
Quantification of chromosomal copy number aberrations by shallow whole-genome sequencing

Detection of DNA copy number aberrations by shallow whole-genome sequencing (WGS) faces many challenges, including incompleteness of and errors in the human reference genome, repetitive sequences, polymorphisms, variable sample quality, and biases in the sequencing procedures. Formalin-fixed paraffin-embedded (FFPE) archival material, the analysis of which is important for studies of cancer, presents particular analytical difficulties due to degradation of the DNA and the frequent lack of matched reference samples. We present a robust, cost-effective WGS method for DNA copy number analysis that addresses these challenges more successfully than currently available procedures. In practice, very useful profiles can be obtained with 0.1-fold genome coverage. We improve on previous methods by, first, implementing a combined correction for sequence mappability and GC content, and second, applying this procedure to sequence data from the 1000 Genomes Project in order to develop a blacklist of problematic genome regions. Only a small subset of these blacklisted regions had previously been identified by ENCODE; the vast majority are novel, previously unappreciated problematic regions. Our procedures are implemented in a pipeline called QDNAseq. We have analyzed over 1,000 samples, most of which were obtained from the fixed tissue archives of over 25 institutions. We demonstrate that for most samples our sequencing and analysis procedures yield genome profiles with noise levels near the statistical limit imposed by read counting. The described procedures also provide better correction of artifacts introduced by low DNA quality than prior approaches, and better copy number data than high-resolution microarrays at substantially lower cost.
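The three ideas in the abstract (a joint GC/mappability correction of binned read counts, a blacklist derived from normal reference genomes, and a noise floor set by read counting) can be illustrated with a small sketch. This is not QDNAseq, which is an R/Bioconductor package; the Python below is a hypothetical, simplified stand-in. It assumes each genomic bin already has a read count, a GC fraction, and a mappability score in [0, 1]. It replaces the pipeline's own fitting procedure with a plain stratified-median correction. The function names, the 10×10 stratification, and the 0.5 log2 blacklist threshold are all illustrative assumptions. The noise floor treats per-bin counts as Poisson, so the expected coefficient of variation is 1/sqrt(mean reads per bin).

```python
import math
import statistics


def correct_gc_mappability(counts, gc, mappability, n_strata=10):
    """Hypothetical joint correction: divide each bin's count by the median
    count of all bins sharing its (GC, mappability) stratum."""
    keys, strata = [], {}
    for c, g, m in zip(counts, gc, mappability):
        # Assign the bin to a 2D grid cell; clamp values of 1.0 into the last cell.
        key = (min(int(g * n_strata), n_strata - 1),
               min(int(m * n_strata), n_strata - 1))
        keys.append(key)
        strata.setdefault(key, []).append(c)
    medians = {k: statistics.median(v) for k, v in strata.items()}
    # Ratio to the stratum median; NaN where the stratum has no reads at all.
    return [c / medians[k] if medians[k] > 0 else float("nan")
            for c, k in zip(counts, keys)]


def blacklist_bins(reference_profiles, max_abs_log2=0.5):
    """Hypothetical blacklist: flag bins whose median corrected ratio across
    normal reference samples deviates from 1 by more than max_abs_log2."""
    flagged = set()
    for i in range(len(reference_profiles[0])):
        med = statistics.median(p[i] for p in reference_profiles)
        # `not med > 0` also catches NaN, since NaN > 0 is False.
        if not med > 0 or abs(math.log2(med)) > max_abs_log2:
            flagged.add(i)
    return flagged


def expected_reads_per_bin(coverage, bin_size_bp, read_length_bp):
    """Genome size cancels: (coverage * G / L) reads spread over G / bin_size bins."""
    return coverage * bin_size_bp / read_length_bp


def poisson_noise_floor(mean_reads_per_bin):
    """Coefficient of variation of a Poisson count: sqrt(mu) / mu."""
    return 1.0 / math.sqrt(mean_reads_per_bin)


if __name__ == "__main__":
    # Two GC strata with a 2x GC-driven count difference but identical relative pattern.
    counts = [10, 12, 8, 20, 24, 16]
    gc = [0.35, 0.35, 0.35, 0.55, 0.55, 0.55]
    mapp = [0.95] * 6
    print(correct_gc_mappability(counts, gc, mapp))
    mu = expected_reads_per_bin(0.1, 15_000, 50)  # assumed 15-kb bins, 50-bp reads
    print(mu, poisson_noise_floor(mu))
```

Under those assumptions (0.1-fold coverage, 50-bp reads, 15-kb bins), each bin receives about 30 reads. The read-counting noise floor is then roughly 1/sqrt(30) ≈ 0.18, so an observed profile noise close to that value is the best this coverage permits. Larger bins lower the floor at the cost of resolution.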

Type: Other
Archive: European Genome-phenome Archive (EGA)

1 Dataset, 1 Publication


Dataset ID	Description	Technology	Samples
EGAD00001000780	None	Illumina HiSeq 2000	18

Publications	Citations
DNA copy number analysis of fresh and formalin-fixed specimens by shallow whole-genome sequencing with identification and exclusion of problematic regions in the genome assembly. Scheinin I, Sie D, Bengtsson H, van de Wiel MA, Olshen AB, van Thuijl HF, van Essen HF, Eijk PP, Rustenburg F, Meijer GA, Reijneveld JC, Wesseling P, Pinkel D, Albertson DG, Ylstra B. Genome Res 24:2022-2032 (2014).	432