Somatic mutation and selection at epidemiological scale - Sanger_NanoSeq_RandD

Bottleneck sequencing of human tissue including neurons, cord blood, sperm.

This data is part of a pre-publication release. For information on the proper use of pre-publication data shared by the Wellcome Trust Sanger Institute (including details of any publication moratoria), please see http://www.sanger.ac.uk/datasharing/.

As we age, many tissues become colonised by microscopic clones carrying somatic driver mutations. Some of these clones represent a first step towards cancer whereas others may contribute to ageing and other diseases. However, our understanding of the clonal landscapes of human tissues, and their impact on cancer risk, ageing and disease, remains limited due to the challenge of detecting somatic mutations present in small numbers of cells. Here, we introduce a new version of nanorate sequencing (NanoSeq), a duplex sequencing method with error rates of less than 5 per billion base pairs, which is compatible with whole-exome and targeted gene sequencing. Deep sequencing of polyclonal samples with single-molecule sensitivity enables the simultaneous detection of mutations in large numbers of clones, yielding accurate somatic mutation rates, mutational signatures and driver mutation frequencies in any tissue. Applying targeted NanoSeq to 1,042 non-invasive samples of oral epithelium and 371 samples of blood from a twin cohort, we found an unprecedentedly rich landscape of selection, with 46 genes under positive selection driving clonal expansions in the oral epithelium, over 62,000 driver mutations, and evidence of negative selection in some genes. The high number of positively selected mutations in multiple genes provides high-resolution maps of selection across coding and non-coding sites, a form of in vivo saturation mutagenesis. Multivariate regression models enable mutational epidemiology studies on how carcinogenic exposures and cancer risk factors, such as age, tobacco or alcohol, alter the acquisition and selection of somatic mutations. Accurate single-molecule sequencing has the potential to unveil the polyclonal landscape of any tissue, providing a powerful tool to study early carcinogenesis, cancer prevention and the role of somatic mutations in ageing and disease.

Wellcome Trust Sanger Institute Data Sharing Policy

Studies are experimental investigations of a particular phenomenon, e.g., case-control studies on a particular trait or cancer research projects reporting matching cancer normal genomes from patients.

Study ID Study Title Study Type
EGAS00001004066 Cancer Genomics

This table displays only public information pertaining to the files in the dataset. If you wish to access this dataset, please submit a request. If you already have access to these data files, please consult the download documentation.

ID File Type Size
EGAF00008736986 cram 14.8 GB
EGAF00008736987 cram 15.5 GB
EGAF00008736988 cram 16.3 GB
EGAF00008736989 cram 12.5 GB
EGAF00008736990 cram 14.2 GB
EGAF00008736991 cram 17.5 GB
EGAF00008736992 cram 19.9 GB
EGAF00008736993 cram 14.5 GB
EGAF00008736994 cram 14.8 GB
EGAF00008736995 cram 14.8 GB
EGAF00008736996 cram 7.5 GB
EGAF00008736997 cram 8.9 GB
EGAF00008736998 cram 8.8 GB
EGAF00008736999 cram 6.8 GB
EGAF00008737000 cram 7.4 GB
EGAF00008737001 cram 8.3 GB
EGAF00008737002 cram 6.2 GB
EGAF00008737003 cram 6.1 GB
EGAF00008737004 cram 5.0 GB
EGAF00008737005 cram 6.0 GB
EGAF00008737006 cram 8.0 GB
EGAF00008737007 cram 7.3 GB
EGAF00008737008 cram 8.5 GB
EGAF00008737009 cram 6.3 GB
EGAF00008737010 cram 7.0 GB
EGAF00008737011 cram 7.3 GB
EGAF00008737012 cram 7.6 GB
EGAF00008737013 cram 6.2 GB
EGAF00008737014 cram 6.7 GB
EGAF00008737015 cram 6.5 GB
EGAF00008737016 cram 6.1 GB
EGAF00008737017 cram 5.7 GB
32 Files (308.7 GB)
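
As a rough aid for planning storage before a download, the listing above can be tallied with a few lines of Python. This is only a sketch: the `LISTING` sample holds just the first three rows copied from the table, and the names `ROW`, `parse_listing` and `total_gb` are illustrative helpers, not part of any EGA tool.

```python
import re

# Sample rows copied from the file listing above (first three files only).
# Paste the full listing here to tally the whole dataset.
LISTING = """\
EGAF00008736986 cram 14.8 GB
EGAF00008736987 cram 15.5 GB
EGAF00008736988 cram 16.3 GB
"""

# One row: file accession, file type, size in GB (as displayed on the page).
ROW = re.compile(r"^(EGAF\d{11})\s+(\w+)\s+([\d.]+)\s+GB$")


def parse_listing(text):
    """Return a list of (accession, file_type, size_gb) tuples."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:  # skip headers, blank lines and the summary line
            rows.append((match.group(1), match.group(2), float(match.group(3))))
    return rows


def total_gb(rows):
    """Sum the displayed sizes, rounded to one decimal place."""
    return round(sum(size for _, _, size in rows), 1)


if __name__ == "__main__":
    rows = parse_listing(LISTING)
    print(len(rows), "files,", total_gb(rows), "GB")
```

Because each size is displayed to one decimal place, a total computed this way may differ slightly from the summary figure given on the page.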