
The UK10K project proposes a series of complementary genetic approaches to find new low-frequency/rare variants contributing to disease phenotypes. These will be based on obtaining the genome-wide sequence of 4000 samples from the TwinsUK and ALSPAC cohorts (at 6x sequence coverage), and the exome sequence (protein-coding regions and related conserved sequence) of 6000 samples selected for extreme phenotypes. Our studies will focus primarily on cardiovascular-related quantitative traits, obesity and related metabolic traits, neurodevelopmental disorders and a limited number of extreme clinical phenotypes that will provide proof-of-concept for future familial trait sequencing. We will directly analyse quantitative traits in the cohorts and the selected traits in the extreme samples, and also use imputation down to 0.1% allele frequency to extend the analyses to further sample sets with genome-wide genotype data. In each case we will investigate indels and larger structural variants as well as SNPs, and use statistical methods that combine rare variants in a locus or pathway as well as single-variant approaches.

The Avon Longitudinal Study of Parents and Children (ALSPAC) is a two-generation prospective study. Pregnant women living in one of three health districts in the former county of Avon with an expected delivery date between April 1991 and December 1992 were eligible to be enrolled in the study, and this formed the initial point of contact for the development of a large, family-based resource. Information has been collected on the children and the mothers through retrieval of biological materials (e.g. antenatal blood samples, placentas), biological sampling (e.g. collection of cord blood, umbilical cord, milk teeth, hair, toenails, blood and urine), self-administered questionnaires, data extraction from medical notes, linkage to routine information systems, and repeat research clinics.

Click on a Dataset ID in the table below to learn more, and to find out who to contact about access to these data

Dataset ID       Description  Technology           Samples
EGAD00001000195               Illumina HiSeq 2000  740
EGAD00001000740               Illumina HiSeq 2000  2307
EGAD00001000789                                    1927

Publications
Detecting identity by descent and estimating genotype error rates in sequence data.
Am J Hum Genet 93: 2013 840-851
A rare functional cardioprotective APOC3 variant has risen in frequency in distinct population isolates.
Nat Commun 4: 2013 2872
Estimating genome-wide significance for whole-genome sequencing studies.
Genet Epidemiol 38: 2014 281-290
Whole-genome sequence-based analysis of thyroid function.
Nat Commun 6: 2015 5681
Common variation at 1q24.1 (ALDH9A1) is a potential risk factor for renal cancer.
PLoS One 10: 2015 e0122589
An interactive genome browser of association results from the UK10K cohorts project.
Bioinformatics 31: 2015 4029-4031
Whole-genome sequencing identifies EN1 as a determinant of bone density and fracture.
Nature 526: 2015 112-117
The UK10K project identifies rare variants in health and disease.
Nature 526: 2015 82-90
Improved imputation of low-frequency and rare variants using the UK10K haplotype reference panel.
Nat Commun 6: 2015 8111
Genome-wide association study identifies multiple susceptibility loci for glioma.
Nat Commun 6: 2015 8559
The 9p21.3 risk of childhood acute lymphoblastic leukaemia is explained by a rare high-impact variant in CDKN2A.
Sci Rep 5: 2015 15065
A Protein Domain and Family Based Approach to Rare Variant Association Analysis.
PLoS One 11: 2016 e0153803
Genome-wide association study identifies multiple susceptibility loci for multiple myeloma.
Nat Commun 7: 2016 12050
Discovery and refinement of genetic loci associated with cardiometabolic risk using dense imputation maps.
Nat Genet 48: 2016 1303-1312
A genome-wide association study identifies risk loci for childhood acute lymphoblastic leukemia at 10q26.13 and 12q23.1.
Leukemia 31: 2017 573-579
Dominant KCNA2 mutation causes episodic ataxia and pharmacoresponsive epilepsy.
Neurology 87: 2016 1975-1984
Function-driven discovery of disease genes in zebrafish using an integrated genomics big data resource.
Nucleic Acids Res 44: 2016 9611-9623
A combined reference panel from the 1000 Genomes and UK10K projects improved rare variant imputation in European and Chinese samples.
Sci Rep 6: 2016 39313
Genome-wide association analysis implicates dysregulation of immunity genes in chronic lymphocytic leukaemia.
Nat Commun 8: 2017 14175
Genome-wide association study of glioma subtypes identifies specific differences in genetic susceptibility to glioblastoma and non-glioblastoma tumors.
Nat Genet 49: 2017 789-794
GeneImp: Fast Imputation to Large Reference Panels Using Genotype Likelihoods from Ultralow Coverage Sequencing.
Genetics 206: 2017 91-104
Using Mendelian randomization to determine causal effects of maternal pregnancy (intrauterine) exposures on offspring outcomes: Sources of bias and methods for assessing them.
Wellcome Open Res 2: 2017 11
Quantifying the mapping precision of genome-wide association studies using whole-genome sequencing data.
Genome Biol 18: 2017 86
Whole-Genome Sequencing Coupled to Imputation Discovers Genetic Signals for Anthropometric Traits.
Am J Hum Genet 100: 2017 865-884
The impact of rare variation on gene expression across tissues.
Nature 550: 2017 239-243
Independent impacts of aging on mitochondrial DNA quantity and quality in humans.
BMC Genomics 18: 2017 890
Genome-wide association study of classical Hodgkin lymphoma identifies key regulators of disease susceptibility.
Nat Commun 8: 2017 1892
Analysis of Human Sequence Data Reveals Two Pulses of Archaic Denisovan Admixture.
Cell 173: 2018 53-61.e9
Genome-wide association study identifies susceptibility loci for B-cell childhood acute lymphoblastic leukemia.
Nat Commun 9: 2018 1340
Combined linkage and association analysis of classical Hodgkin lymphoma.
Oncotarget 9: 2018 20377-20385
Low-frequency variation in TP53 has large effects on head circumference and intracranial volume.
Nat Commun 10: 2019 357
GARFIELD classifies disease-relevant genomic features through integration of functional annotations with association signals.
Nat Genet 51: 2019 343-353
Examining the Impact of Imputation Errors on Fine-Mapping Using DNA Methylation QTL as a Model Trait.
Genetics 212: 2019 577-586
An estimator of first coalescent time reveals selection on young variants and large heterogeneity in rare allele ages among human populations.
PLoS Genet 15: 2019 e1008340
Identification of four novel associations for B-cell acute lymphoblastic leukaemia risk.
Nat Commun 10: 2019 5348
Common variation at 16p11.2 is associated with glycosuria in pregnancy: findings from a genome-wide association study in European women.
Hum Mol Genet 29: 2020 2098-2106
Composite trait Mendelian randomization reveals distinct metabolic and lifestyle consequences of differences in body shape.
Commun Biol 4: 2021 1064
Privacy-preserving genotype imputation with fully homomorphic encryption.
Cell Syst 13: 2022 173-182.e3
Improved analyses of GWAS summary statistics by reducing data heterogeneity and errors.
Nat Commun 12: 2021 7117
Simultaneous estimation of bi-directional causal effects and heritable confounding from GWAS summary statistics.
Nat Commun 12: 2021 7274
Quantifying the role of transcript levels in mediating DNA methylation effects on complex traits and diseases.
Nat Commun 13: 2022 7559