Study

A comprehensive assessment of somatic mutation detection in cancer using whole genome sequencing

Study ID: EGAS00001001539
Alternative Stable ID: (none listed)
Type: Other

Study Description

As whole-genome sequencing for cancer genome analysis becomes a clinical tool, a full understanding of the variables affecting sequencing analysis output is required. Using tumor-normal sample pairs from two different types of cancer, chronic lymphocytic leukemia and medulloblastoma, we conducted a benchmarking exercise within the context of the International Cancer Genome Consortium. We compared sequencing methods, analysis pipelines, and validation methods. We show that using PCR-free methods and increasing sequencing depth to ~100x are both beneficial, as long as the tumor:control coverage ratio remains balanced. We observed widely varying mutation call rates and low concordance among analysis pipelines, reflecting the artifact-prone nature of the raw data and the lack of standards for dealing with the artifacts. However, using the benchmark mutation set we created, we show that many of these issues are easy to remedy and that fixing them has an immediate positive impact on mutation detection accuracy.
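To make "concordance among analysis pipelines" and accuracy against a benchmark mutation set concrete, the following is a minimal sketch and not the study's actual evaluation code. It assumes each pipeline's somatic calls are reduced to a set of (chromosome, position, ref, alt) tuples. The function names, the pipeline labels, and the choice of the Jaccard index as the concordance measure are all illustrative assumptions.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

# Hypothetical representation of one somatic call: (chromosome, position, ref, alt).
Mutation = Tuple[str, int, str, str]


def jaccard(a: Set[Mutation], b: Set[Mutation]) -> float:
    """Share of calls common to both sets out of all calls made by either."""
    union = a | b
    # Two empty call sets are treated as fully concordant (an assumption).
    return len(a & b) / len(union) if union else 1.0


def pairwise_concordance(
    calls: Dict[str, Set[Mutation]],
) -> Dict[Tuple[str, str], float]:
    """Jaccard concordance for every pair of pipelines, keyed by sorted name pair."""
    return {
        (p, q): jaccard(calls[p], calls[q])
        for p, q in combinations(sorted(calls), 2)
    }


def precision_recall(
    called: Set[Mutation], benchmark: Set[Mutation]
) -> Tuple[float, float]:
    """Precision and recall of one call set against a benchmark mutation set."""
    true_positives = len(called & benchmark)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(benchmark) if benchmark else 0.0
    return precision, recall
```

Calling `pairwise_concordance` on a dictionary that maps pipeline names to call sets gives one score per pipeline pair, and `precision_recall` scores a single pipeline against a curated benchmark set. Exact tuple matching is a simplification: real comparisons usually need variant normalization first, particularly for indels.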

Study Datasets (2 datasets)


Dataset ID: EGAD00001001858
Description: Raw FASTQ files from WGS sequencing of CLL and matching blood normal for the ICGC Techval Benchmark1 study. Sequence data was provided to multiple centers for independent analysis and comparison.
Technology: Illumina HiSeq 2500
Samples: 2

Dataset ID: EGAD00001001859
Description: Raw FASTQ files for sequence data generated at 5 sequencing centers from a medulloblastoma sample and matching blood normal control.
Technology: Illumina HiSeq 2500
Samples: 2
