Targeted sequencing of genes recurrently mutated in AML

Background

Massively parallel sequencing technology has transformed cancer genomics. It is now feasible, within a clinically relevant time-frame and at a clinically manageable cost, to screen DNA from patient tumours for mutations essentially genome-wide. The challenge for personalised medicine will be to increase the sample size to thousands or tens of thousands of well-characterised cases in order to attain sufficient statistical power to stratify patients accurately across the complexity and genomic heterogeneity expected for most of the common tumour types. Currently, whole genome sequencing on this scale is not feasible, and targeted sequencing of relevant portions of the genome will be required.

Pilot data

We have developed protocols for large-scale, multiplexed sequencing of 100-200 genes in thousands of samples. Essentially, using robotic technology, genomic DNA from the cancer specimen is processed into sequencing libraries with unique DNA barcodes, thereby allowing sequencing reads to be attributed to the sample they derive from. Currently, these sequencing libraries can be generated in a 96-well format using fully automated protocols, and we are exploring methods to expand this to a 384-well format. The sequencing libraries are pooled and hybridized to custom sets of RNA baits representing the genomic regions of interest. The pulled-down libraries are sequenced in pools of 48-96 samples per lane of an Illumina HiSeq. This protocol is already implemented at the Sanger Institute. We have published proof that somatic mutations in novel cancer genes can be identified from exome-wide sequencing. In unpublished pilot data, we have established the feasibility of robotic library production, custom pull-down, and multiplexed sequencing of barcoded libraries for 100 known myeloid cancer genes across 760 myelodysplasia samples.
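The barcode step described above, in which each read is attributed to its sample of origin, can be illustrated with a minimal sketch. The source does not state the barcode length, the mismatch tolerance, or the software used, so the barcodes, sample names, `max_mismatches` parameter and function names below are all hypothetical; in practice this step is normally handled by the sequencing platform's own tools.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(read_barcode, barcode_to_sample, max_mismatches=1):
    """Return the sample for a read barcode, or None if unmatched or ambiguous."""
    hits = [
        sample
        for barcode, sample in barcode_to_sample.items()
        if len(barcode) == len(read_barcode)
        and hamming(barcode, read_barcode) <= max_mismatches
    ]
    # Accept the read only when exactly one known barcode is close enough.
    return hits[0] if len(hits) == 1 else None


def demultiplex(reads, barcode_to_sample, max_mismatches=1):
    """Group (barcode, sequence) pairs into per-sample lists of sequences."""
    bins = defaultdict(list)
    for barcode, sequence in reads:
        sample = assign_sample(barcode, barcode_to_sample, max_mismatches)
        bins[sample if sample is not None else "unassigned"].append(sequence)
    return dict(bins)


# Hypothetical barcodes and reads, for illustration only.
BARCODES = {"ACGTAC": "sample_A", "TGCATG": "sample_B"}
READS = [
    ("ACGTAC", "AAA"),  # exact match to sample_A
    ("ACGTAA", "CCC"),  # one mismatch from sample_A's barcode
    ("TGCATG", "GGG"),  # exact match to sample_B
    ("GGGGGG", "TTT"),  # too far from both barcodes
]

if __name__ == "__main__":
    for sample, sequences in sorted(demultiplex(READS, BARCODES).items()):
        print(sample, sequences)
```

Tolerating a single mismatch is only safe when every pair of barcodes differs at several positions, which is why the sketch rejects a read that is close to more than one barcode.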
Highlights of the data analysed thus far reveal that coverage is remarkably even between samples; when 96 samples are run, average coverage per lane of sequencing is ~250, with 90-95% of targeted exons covered by >25 reads; known mutations can be discovered in the data set; and the protocol is amenable to whole genome amplified DNA. The bioinformatic algorithms for identification of substitutions and indels in pull-down data are well-established; we have pilot data proving that copy number changes, LOH and genomic rearrangements in specific regions of interest can also be identified by tiling of baits across the relevant loci.

Proposal

We propose to apply this methodology to 10,000 samples from patients with AML enrolled in clinical trials over the last 10-20 years. Oncogenic point mutations and potentially genomic rearrangements will be identified, and linked to clinical outcome data, with a view to undertaking the following sorts of analyses:

- Identification of co-occurrence, mutual exclusivity and clusters of driver mutations.
- Correlation of prognosis with driver mutations and potentially gene-gene interactions.
- Exploration of genomic markers of drug response.

Ultimately, we would like to be in a position to release the mutation data together with matched clinical outcome data to genuine medical researchers via a controlled access approach, possibly within the COSMIC framework (www.sanger.ac.uk/genetics/CGP/cosmic/). The vision here is to generate a portal whereby a clinician faced with an AML patient and his/her mutational profile can obtain a "personalised" prediction of outcome, together with a fair assessment of the uncertainty of the estimate. With a sufficient sample size, there would also be the potential to develop decision support algorithms for therapeutic choices based on such data.
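One common way to approach the co-occurrence and mutual exclusivity analysis proposed above is a pairwise Fisher exact test on gene-by-gene 2x2 tables. The sketch below is an assumption about how such an analysis could look, not the method the study used: the function names are hypothetical, the mutation calls are invented toy data, and a real analysis would also correct for multiple testing.

```python
from itertools import combinations
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, total = a + b, c + d, a + c, a + b + c + d
    denominator = comb(total, col1)

    def prob(x):
        # Hypergeometric probability that x samples carry both mutations.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum over every table that is no more likely than the observed one.
    tail = sum(
        p for p in map(prob, range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )
    return min(1.0, tail)


def pairwise_tests(mutations):
    """mutations maps sample ID -> set of mutated genes (hypothetical input).

    Returns (gene1, gene2, odds_ratio, p_value) for every gene pair.
    An odds ratio above 1 suggests co-occurrence, below 1 mutual exclusivity.
    """
    genes = sorted(set().union(*mutations.values()))
    results = []
    for g1, g2 in combinations(genes, 2):
        a = sum(1 for m in mutations.values() if g1 in m and g2 in m)
        b = sum(1 for m in mutations.values() if g1 in m and g2 not in m)
        c = sum(1 for m in mutations.values() if g1 not in m and g2 in m)
        d = len(mutations) - a - b - c
        # Adding 0.5 to each cell keeps the odds ratio finite with zero counts.
        odds = ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))
        results.append((g1, g2, odds, fisher_exact_two_sided(a, b, c, d)))
    return results


# Invented toy calls, for illustration only (not from the dataset).
TOY_CALLS = {
    "S1": {"NPM1", "FLT3"},
    "S2": {"NPM1", "FLT3"},
    "S3": {"NPM1", "FLT3"},
    "S4": {"TP53"},
    "S5": {"TP53"},
    "S6": set(),
}

if __name__ == "__main__":
    for g1, g2, odds, p in pairwise_tests(TOY_CALLS):
        print(f"{g1}-{g2}: odds ratio {odds:.2f}, p = {p:.3f}")
```

With only six toy samples none of the p-values is meaningful; the proposal's emphasis on thousands of cases reflects the sample size needed for tests like this to have power across many gene pairs.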

26/05/2015
38 samples
DAC: EGAC00001000000
Technology: Illumina MiSeq

Archive: European Genome-phenome Archive (EGA)

DUO:0000019
version: 2021-02-23

publication required

This data use modifier indicates that the requestor agrees to make results of studies using the data available to the larger scientific community.

DUO:0000026
version: 2021-02-23

user specific restriction

This data use modifier indicates that use is limited to use by approved users.

DUO:0000028
version: 2021-02-23

institution specific restriction

This data use modifier indicates that use is limited to use within an approved institution.

DUO:0000042
version: 2021-02-23

general research use

This data use permission indicates that use is allowed for general research use for any research purpose.

Wellcome Trust Sanger Institute Cancer Genome Group Data Sharing Policy

Studies are experimental investigations of a particular phenomenon, e.g., case-control studies on a particular trait or cancer research projects reporting matching cancer normal genomes from patients.

Study ID	Study Title	Study Type
EGAS00001000408	Targeted_sequencing_of_genes_recurrently_mutated_in_AML	Cancer Genomics

This table displays only public information pertaining to the files in the dataset. If you wish to access this dataset, please submit a request. If you already have access to these data files, please consult the download documentation.

ID	File Type	Size	Quality Report
EGAF00000224486	bam	96.6 MB	Report
EGAF00000224487	bam	94.9 MB	Report
EGAF00000224488	bam	90.0 MB	Report
EGAF00000224489	bam	98.6 MB	Report
EGAF00000224490	bam	82.4 MB	Report
EGAF00000224491	bam	94.6 MB	Report
EGAF00000224492	bam	90.9 MB	Report
EGAF00000224493	bam	97.8 MB	Report
EGAF00000224494	bam	95.6 MB	Report
EGAF00000224495	bam	83.7 MB	Report
EGAF00000224496	bam	85.0 MB	Report
EGAF00000224497	bam	92.5 MB	Report
EGAF00000224498	bam	96.4 MB	Report
EGAF00000224499	bam	88.8 MB	Report
EGAF00000224500	bam	86.3 MB	Report
EGAF00000224501	bam	94.2 MB	Report
EGAF00000224502	bam	226.9 MB	Report
EGAF00000224503	bam	248.3 MB	Report
EGAF00000224504	bam	258.4 MB	Report
EGAF00000224505	bam	199.1 MB	Report
EGAF00000224506	bam	209.2 MB	Report
EGAF00000224507	bam	211.0 MB	Report
EGAF00000224508	bam	77.9 MB	Report
EGAF00000224509	bam	84.9 MB	Report
EGAF00000224510	bam	69.9 MB	Report
EGAF00000224511	bam	73.7 MB	Report
EGAF00000224512	bam	79.8 MB	Report
EGAF00000224513	bam	83.5 MB	Report
EGAF00000224514	bam	72.3 MB	Report
EGAF00000224515	bam	72.5 MB	Report
EGAF00000224516	bam	87.8 MB	Report
EGAF00000224517	bam	86.1 MB	Report
EGAF00000224518	bam	89.8 MB	Report
EGAF00000224519	bam	86.4 MB	Report
EGAF00000224520	bam	79.3 MB	Report
EGAF00000224521	bam	94.5 MB	Report
EGAF00000224522	bam	62.7 MB	Report
EGAF00000224523	bam	73.1 MB	Report
EGAF00000224524	bam	95.6 MB	Report
EGAF00000224525	bam	94.3 MB	Report
EGAF00000224526	bam	90.5 MB	Report
EGAF00000224527	bam	97.1 MB	Report
EGAF00000224528	bam	81.6 MB	Report
EGAF00000224529	bam	93.7 MB	Report
EGAF00000224530	bam	90.7 MB	Report
EGAF00000224531	bam	96.6 MB	Report
EGAF00000224532	bam	94.6 MB	Report
EGAF00000224533	bam	83.6 MB	Report
EGAF00000224534	bam	84.7 MB	Report
EGAF00000224535	bam	91.4 MB	Report
EGAF00000224536	bam	95.1 MB	Report
EGAF00000224537	bam	88.7 MB	Report
EGAF00000224538	bam	85.6 MB	Report
EGAF00000224539	bam	93.4 MB	Report
EGAF00000224540	bam	224.7 MB	Report
EGAF00000224541	bam	246.3 MB	Report
EGAF00000224542	bam	256.6 MB	Report
EGAF00000224543	bam	198.3 MB	Report
EGAF00000224544	bam	209.1 MB	Report
EGAF00000224545	bam	208.6 MB	Report
EGAF00000224546	bam	76.9 MB	Report
EGAF00000224547	bam	83.1 MB	Report
EGAF00000224548	bam	68.8 MB	Report
EGAF00000224549	bam	72.5 MB	Report
EGAF00000224550	bam	78.6 MB	Report
EGAF00000224551	bam	82.6 MB	Report
EGAF00000224552	bam	71.8 MB	Report
EGAF00000224553	bam	71.9 MB	Report
EGAF00000224554	bam	86.6 MB	Report
EGAF00000224555	bam	85.5 MB	Report
EGAF00000224556	bam	88.6 MB	Report
EGAF00000224557	bam	85.7 MB	Report
EGAF00000224558	bam	77.6 MB	Report
EGAF00000224559	bam	93.5 MB	Report
EGAF00000224560	bam	61.6 MB	Report
EGAF00000224561	bam	72.3 MB	Report
76 Files (8.2 GB)