EGAD00001001333
Whole exome sequencing BAM files for samples from the BRIDGE Consortium with pathogenic or likely pathogenic variants in genes linked to bleeding or platelet disorders.
Illumina HiSeq 2000
28
EGAD00001002070
Whole genome sequencing CRAM files for four samples from the BRIDGE Consortium (SPEED project) with pathogenic variants in a gene associated with a movement disorder.
Illumina HiSeq 2000
4
EGAD00001002656
Whole exome sequencing BAM files and whole genome sequencing CRAM files for 722 individuals from the NIHR-BioResource Rare Diseases Consortium (SPEED project) with inherited retinal disease.
Illumina HiSeq 2000
707
EGAD00001002730
SPEED - childhood dystonia KMT2B dataset
Illumina HiSeq 2000
5
EGAD00001003423
Pulmonary arterial hypertension (PAH) is a rare disorder with a poor prognosis. Deleterious variation within genes encoding components of the transforming growth factor-β pathway underlies the majority of heritable forms of PAH. Identifying the missing genetic contribution is challenging, even for genes of large effect size, because it likely involves mutations in genes confined to small numbers of PAH cases. In this study, we performed whole genome sequencing, comparing 1038 PAH index cases to 6385 subjects with other rare diseases. Rare variant analysis identified mutations in novel causal genes, namely ATP13A3, AQP1 and SOX17, and provided independent validation of a critical role for GDF2 in PAH. We detected mutations predicted to disrupt function in most, but not all, previously reported PAH genes. Taken together, these findings provide new insights into the molecular basis of PAH and support a central role for endothelial dysregulation in disease pathogenesis.
Illumina HiSeq 2000
149
EGAD00001003809
This dataset includes 186 whole genome sequencing samples forming 93 pairs. Each pair consists of two sequencing experiments carried out on the same donor to the NIHR BioResource Rare Disease cohort. These samples were used to validate Telomerecat, a method for estimating telomere length from whole genome sequencing data.
Illumina HiSeq 2000
52
EGAD00001004088
Multiple primary tumors (MPT) affect a substantial proportion of cancer survivors and may result from various causes, including inherited predisposition. Currently, germline genetic testing of MPT cases for cancer predisposition gene (CPG) variants is mostly targeted by tumor type. We ascertained pre-assessed MPT cases from genetics centers (defined as ≥2 primaries by age 60 years or ≥3 by age 70) and performed whole genome sequencing (WGS) on 460 individuals from 440 families. Despite previous negative genetic assessments or molecular investigations, pathogenic variants in moderate- and high-risk CPGs were detected in 67/440 (15.2%) of probands. WGS detected variants that would not be (or were not) detected by targeted resequencing strategies, including structural variants, which were found at low frequency (6/440 (1.4%) of probands). In most individuals with a germline variant assessed as pathogenic or likely pathogenic (P/LP), at least one of their tumor types was characteristic of variants in the relevant CPG. However, in 29 probands (42.2% of those with a P/LP variant) the tumor phenotype appeared discordant. The frequency of individuals with truncating or splice site CPG variants and at least one discordant tumor type was significantly higher than in a control population (χ² = 43.642, P < 0.0001). Two of the 67 probands with P/LP variants (3%) had evidence of multiple inherited neoplasia allele syndrome (MINAS), with deleterious variants in two CPGs. Combined with variant detection rates from a similarly ascertained previous MPT case series, the present results suggest that first-line comprehensive CPG analysis in a clinical genetics referral-based MPT cohort would detect a deleterious variant in about a third of cases.
Illumina HiSeq 2000
81
EGAD00001004357
Whole genome sequencing of sick children in neonatal and paediatric intensive care units. Datasets EGAD00001007780 (GRCh37) and EGAD00001007868 (GRCh38) are extensions of this dataset.
Illumina HiSeq 2000
219
EGAD00001004456
This dataset contains short-read whole-genome sequencing data for individuals with neurodevelopmental disorders and their relatives from the NIHR-BioResource Rare Disease Consortium.
Illumina HiSeq 2000
4
EGAD00001004513
Short read whole genome sequencing (WGS) CRAM files for the NIHR BioResource Rare Diseases WGS project – Participants from the Cerebral Small Vessel Disease (CSVD) Rare Disease domain
Illumina HiSeq 2000
-
EGAD00001004514
Short read whole genome sequencing (WGS) CRAM files for the NIHR BioResource Rare Diseases WGS project – Participants from the Hypertrophic Cardiomyopathy (HCM) Rare Disease domain
Illumina HiSeq 2000
-
EGAD00001004515
Short read whole genome sequencing (WGS) CRAM files for the NIHR BioResource Rare Diseases WGS project – Participants from the Intrahepatic Cholestasis of Pregnancy (ICP) Rare Disease domain
Illumina HiSeq 2000
2
EGAD00001004516
Short read whole genome sequencing (WGS) CRAM files for the NIHR BioResource Rare Diseases WGS project – Participants from the Neuropathic Pain Disorders (NPD) Rare Disease domain
Illumina HiSeq 2000
-
EGAD00001004517
Short read whole genome sequencing (WGS) CRAM files for the NIHR BioResource Rare Diseases WGS project – Participants from the Primary Membranoproliferative Glomerulonephritis (PMG) Rare Disease domain
Illumina HiSeq 2000
-
EGAD00001004518
Short read whole genome sequencing (WGS) CRAM files for the NIHR BioResource Rare Diseases WGS project – Participants from the Steroid Resistant Nephrotic Syndrome (SRNS) Rare Disease domain
Illumina HiSeq 2000
-
EGAD00001004519
Short read whole genome sequencing (WGS) CRAM files for the NIHR BioResource Rare Diseases WGS project – Participants from the Bleeding, Thrombotic and Platelet Disorders (BPD) Rare Disease domain
Illumina HiSeq 2000
1
EGAD00001004520
Short read whole genome sequencing (WGS) CRAM files for the NIHR BioResource Rare Diseases WGS project – Participants from the Inherited Retinal Disorders (IRD) Rare Disease domain
Illumina HiSeq 2000
-
EGAD00001004521
Short read whole genome sequencing (WGS) CRAM files for the NIHR BioResource Rare Diseases WGS project – Participants from the Multiple Primary Malignant Tumours (MPMT) Rare Disease domain
Illumina HiSeq 2000
-
EGAD00001004522
Short read whole genome sequencing (WGS) CRAM files for the NIHR BioResource Rare Diseases WGS project – Participants from the Neurological and Developmental Disorders (NDD) Rare Disease domain
Illumina HiSeq 2000
-
EGAD00001004523
Short read whole genome sequencing (WGS) CRAM files for the NIHR BioResource Rare Diseases WGS project – Participants from the Primary Immune Disorders (PID) Rare Disease domain
Illumina HiSeq 2000
-
EGAD00001004524
Short read whole genome sequencing (WGS) CRAM files for the NIHR BioResource Rare Diseases WGS project – Participants from the Stem cell and Myeloid Disorders (SMD) Rare Disease domain
Illumina HiSeq 2000
-
EGAD00001004525
Short read whole genome sequencing (WGS) CRAM files for the NIHR BioResource Rare Diseases WGS project – Participants from the Pulmonary Arterial Hypertension (PAH) Rare Disease domain
Illumina HiSeq 2000
-
EGAD00001005023
The COMPARE study enrolled 29,066 British blood donors between February 2016 and March 2017; its aim was to find the optimal technology for haemoglobin screening (ISRCTN 90871183). All participants were active blood donors at the time of recruitment. The 4,796 participants in this dataset have consented to join the NIHR BioResource. Genotyping data were produced using the Thermo Fisher Scientific Axiom Genotyping platform with the UK Biobank version 2 array design, to which content has been added to allow accurate DNA-based identification of human blood group antigens.
-
EGAD00001005026
The Donor InSight III study, undertaken by Sanquin Research, recruited 3,046 Dutch blood donors between 2015 and 2016. The purpose of the study was to gain more insight into the characteristics, motivations and health of donors. All participants were active blood donors at the time of recruitment. Genotyping data were produced using the Thermo Fisher Scientific Axiom Genotyping platform with the UK Biobank version 2 array design, to which content has been added to allow accurate DNA-based identification of human blood group antigens.
-
EGAD00001005107
To identify novel causes of hereditary thrombocytopenia, we performed a genetic association analysis of whole-genome sequencing (WGS) data from 13,037 individuals enrolled in the NIHR BioResource, including 233 cases with isolated thrombocytopenia. We found an association between rare variants in the transcription factor (TF)-encoding gene IKZF5 and thrombocytopenia. We report five causal missense variants in or near IKZF5 zinc fingers (Znfs), of which two occurred de novo and three co-segregated in three pedigrees. A canonical DNA-Znf binding model predicts that three of the variants alter DNA recognition. Expression studies showed that chromatin binding was disrupted in mutant compared to wild-type (WT) IKZF5, and electron microscopy (EM) revealed a reduced quantity of alpha granules in normally sized platelets. Proplatelet formation (PPF) was reduced in megakaryocytes (MKs) from seven cases relative to six controls. Comparison of RNA-seq data from platelets, monocytes, neutrophils and CD4+ T-cells from three cases and 14 healthy controls showed 1,194 differentially expressed genes (DEGs) in platelets but only four DEGs in each of the other blood cell types. In conclusion, IKZF5 is a novel transcriptional regulator of megakaryopoiesis and the eighth transcription factor associated with dominant thrombocytopenia in humans.
Illumina HiSeq 4000
51
EGAD00001005122
Short read whole genome sequencing (WGS) CRAM files for the NIHR BioResource Rare Diseases WGS project – Participants from the Leber Hereditary Optic Neuropathy (LHON) Rare Disease domain
Illumina HiSeq 2000
-
EGAD00001005123
Short read whole genome sequencing (WGS) CRAM files for the NIHR BioResource Rare Diseases WGS project – Participants from the Ehlers-Danlos (ED) and ED-like Syndromes (EDS) Rare Disease domain
Illumina HiSeq 2000
-
EGAD00001005950
Gray Platelet Syndrome (GPS) is a rare recessive bleeding disorder resulting from biallelic variants in NBEAL2. As part of a comprehensive evaluation of the phenotype and genotype in 47 patients with GPS, four different blood cell types (platelets, neutrophils, monocytes and CD4 lymphocytes) were evaluated using bulk RNA-seq in five patients and five controls. These data are deposited in this archive in FASTQ format.
Illumina HiSeq 4000
40
EGAD00001006065
Most patients with rare diseases do not receive a molecular diagnosis, and the aetiological variants and mediating genes for more than half of such disorders remain to be discovered. We implemented whole-genome sequencing (WGS) in a national healthcare system to streamline diagnosis and to discover unknown aetiological variants in the coding and non-coding regions of the genome. In a pilot study for the 100,000 Genomes Project, we generated WGS data for 13,037 participants, of whom 9,802 had a rare disease, and provided a genetic diagnosis to 1,138 of the 7,065 patients with detailed phenotypic data. We identified 95 Mendelian associations between genes and rare diseases, of which 11 have been discovered since 2015 and at least 79 are confirmed as aetiological. Using WGS of UK Biobank, we showed that rare alleles can explain the presence of some individuals in the tails of a quantitative red blood cell (RBC) trait. Finally, we reported four novel non-coding variants that cause disease by disrupting transcription of ARPC1B, GATA1, LRBA and MPL. Our study demonstrates the synergy of using WGS for both diagnosis and aetiological discovery in routine healthcare.
Illumina HiSeq 4000
1
EGAD00001007776
This dataset contains whole blood transcriptome data generated from 93 patients with COVID-19 across a range of severities and 23 healthy controls. All patients were PCR positive for SARS-CoV-2, and disease severity ranged from asymptomatic to severe disease requiring ventilation. Individuals without symptoms or with mild symptoms were recruited from routine screening of healthcare workers, while COVID-19 patients were recruited at or soon after admission to Addenbrooke’s or Royal Papworth hospitals. Blood samples were taken at recruitment and again four weeks later. Further details of the cohort and the generation of the RNA-sequencing data can be found in Bergamaschi, L. et al. Longitudinal analysis reveals that delayed bystander CD8+ T cell activation and early immune pathology distinguish severe COVID-19 from mild disease. Immunity 54, 1257-1275.e8 (2021).
Illumina HiSeq 4000
768
EGAD00001007777
This dataset contains multiplexed FASTQ files of raw B cell receptor (BCR) repertoire data.
Illumina MiSeq
11
EGAD00001007780
Whole genome sequencing of sick children in neonatal and paediatric intensive care units, aligned to reference assembly GRCh37.
Illumina HiSeq 2000
-
EGAD00001007868
Whole genome sequencing of sick children in neonatal and paediatric intensive care units, aligned to reference assembly GRCh38.
Illumina HiSeq 2000
449
EGAD00001007885
Short read whole genome sequencing (WGS) VCF files for the NIHR BioResource Rare Diseases WGS project – Participants from the Hypertrophic Cardiomyopathy (HCM) Rare Disease domain
-
EGAD00001008368
None
Illumina MiSeq
56
EGAD00010002059
NIHR BioResource Common Disease Patients 2016. The dataset includes 13,489 samples from blood donors. The donors were not screened for any particular disease and are therefore representative of the general population. Genomic data include 845,487 SNPs collected using the UK Biobank v1 Affymetrix array. Phenotypic data include gender, age, ethnicity and disease. According to our internal quality check, there are 81 duplicates in this dataset.
Genotyped using the UK Biobank Axiom Array (Applied Biosystems/Thermo Fisher), read on the GeneTitan Multi-Channel System (Affymetrix/Thermo Fisher) and analysed with the Axiom Analysis Suite (Applied Biosystems/Thermo Fisher)
13490