RNA sequencing and Illumina 2.5M SNP array data collected from 675 commonly used human cancer cell lines.

Tumor-derived cell lines have served as vital models to advance our understanding of oncogene function and therapeutic response [1]. Although substantial effort has been directed to defining the genomic constitution of cancer cell line panels [2–4], the transcriptome, which represents the active program of a cell, remains understudied. Here, we describe RNA sequencing and SNP array analysis of 675 commonly used human cancer cell lines. We explore numerous transcriptome features, including coding and non-coding gene expression, transcribed mutations, gene fusions, and expression of non-human sequences. Aside from many known aberrations, we find new and surprising characteristics, including more than 2200 unique fusion gene pairs representing a vast, testable repertoire of oncogenic fusions, many of which have analogs found in primary human tumors. We show that a combination of multiple genome and transcriptome features in a novel pathway-based approach enhances prediction of response to various targeted therapeutics. Our results provide valuable new insights into these critical pre-clinical models and provide added context for interpreting the numerous studies that employ these widely used cell lines.

Click on a Dataset ID in the table below to learn more and to find out whom to contact about access to these data.

Dataset ID       Description  Technology           Samples
EGAD00001000725               Illumina HiSeq 2000  675
EGAD00001001013               Illumina HiSeq 2000  30
EGAD00010000951               Illumina 2.5M        668
