The NHGRI GREGoR (Genomics Research to Elucidate the Genetics of Rare Disease) Consortium was established in June 2021 with the goal of developing novel tools and approaches to advance the discovery of the genetic basis of rare conditions. Numerous types of molecular data are generated in GREGoR and available on the AnVIL cloud platform via dbGaP application, including short- and long-read genome and exome sequencing, transcriptomics, metabolomics, methylomics, and proteomics. De-identified clinical and demographic data are obtained, with a focus on standardized ontologies. Visit the GREGoR Consortium data webpage for summary information about the GREGoR Dataset, including numbers of participants and data types, methods documentation, and Release Notes. The Consortium comprises five Research Centers (RCs; see below), a Data Coordinating Center (DCC), and various partner members and external collaborators.

Baylor College of Medicine Research Center (BCM-GREGoR)
The Baylor College of Medicine GREGoR program, which is part of the GREGoR Consortium, enrolls individuals, families, and cohorts with suspected rare disease across a range of syndromic and non-syndromic phenotypes. Subjects are enrolled through referrals from national and international collaborating physicians. Subjects provide written informed consent for future re-contact. Data generated and shared include family structure, detailed phenotypes, exome or short-read genome data, and in some cases long-read genome or RNA-sequencing data; these are shared upon completion of standard quality control checks and annotation.

Broad Institute (Broad)
The Broad Center for Mendelian Genomics, part of the GREGoR Consortium, uses next-generation sequencing (exome, genome, transcriptome, and long-read sequencing), computational approaches, and functional studies to discover the variants and genes that underlie Mendelian conditions, with a particular focus on neuromuscular, neurodevelopmental, and syndromic phenotypes.
Samples come from collaborators and direct enrollment through the Rare Genomes Project, and we are committed to rapid data sharing without an embargo period.

University of California, Irvine (UCI-GREGoR)
To accelerate the pace of Mendelian disease gene discovery and clinical implementation, we propose a Mendelian Genomics Research Center, part of the GREGoR Consortium, leveraging the broad pediatric and adult clinical and research expertise at Children's National Hospital and University of California, Irvine. Our goal is to develop best practices to increase the diagnostic yield of rare diseases, engage the community to reduce health disparities for complex diagnoses, and create a dataset accessible to all. Our Center will unite world-class experts, combining basic and translational research with innovative approaches to phenotyping, variant identification, and functional investigation of both coding and non-coding sequence changes. The goals are to discover novel Mendelian gene variations, identify variants not detected by current sequencing pipelines, disambiguate uncertain variants into disease-causing versus benign categories, and share information by working collaboratively with the GREGoR community.

GREGoR Stanford Site (GSS)
The goal of the GREGoR Stanford Site (GSS) is to provide a platform for functional genomics research and validation to improve diagnosis in Mendelian disease. Participants included individuals with undiagnosed suspected Mendelian disease who had non-diagnostic exome sequencing, and their immediate family members. Participants and their family members provided written informed consent and biological samples from which DNA, RNA, plasma, fibroblasts, PBMCs, and other cell types were generated and stored.
Samples from research participants and their immediate family members may have undergone short- and long-read genome sequencing, transcriptome sequencing, metabolomics and/or lipidomics profiling, methyl-capture sequencing, and ATAC-sequencing. De-identified clinical data extracted from participant medical records are linked to the samples.

University of Washington Center for Rare Disease Research (UW-CRDR)
The goals of the University of Washington Center for Rare Disease Research are to: (1) maximize novel gene discovery for Mendelian conditions by recruitment, short- and long-read whole genome sequencing, transcriptome sequencing, and analysis of families with rare conditions for which the gene is either unknown or the gene is known but no pathogenic variant can be identified via clinical testing; (2) develop new strategies for gene discovery for Mendelian conditions caused by variants that are difficult to detect using conventional testing strategies, that are of unknown functional effect (e.g., regulatory, structural variants), or that have unusual modes of inheritance; and (3) implement high-throughput screening and targeted follow-up functional studies to prioritize and validate candidate non-coding variants. De-identified data and phenotypic information are shared via MyGene2, ClinVar, and AnVIL.
Exome sequencing for 26 patients with matched blood RNA-seq for 41 patients
WES: 48 samples (5 blood samples from 6 patient data): 22 Tumour cores and 26 normal/benign cores (HiSeq)
RNA-seq was performed in triplicate on the CDX, a CDX-derived cell line, and the LNCaP cell line.
Intellectual Disability (ID) is one of the most common global disorders. However, for ~30% of patients with ID no cause is identified, despite a clinical diagnostic odyssey that includes genome-wide clinical microarray (CMA). We carefully selected 8 trios, each composed of a child with ID and a structural brain defect and both normal parents, for whole genome sequencing (WGS). We conducted WGS on each trio, filtered for de novo single nucleotide variants (SNVs), and then used a pathway-based refinement to select candidates. We confirmed a de novo pathogenic SNV in ARID1B, and likely pathogenic SNVs in CACNB3, SPRY4, PHF6, SQSTM1 and UPF1, in 5 of the 8 children. All genes except ARID1B and PHF6 are previously unreported for ID. We analysed our WGS data using 4 independent algorithms for copy number and structural variation, including de novo whole genome assembly. We confirmed one likely contributory 165 kb de novo CNV (missed by CMA). We conducted a validation study of over 2000 exomes for our novel candidate genes and also present a strategy to analyze the non-coding sequence space. This study provides an extensive analysis of WGS in the context of ID and yielded causative variants in >60% of cases.
The study examined WES of der(1;7)(q10;p10) myeloid neoplasm cases. BAM files from WES of 26 myeloid neoplasm patients with der(1;7)(q10;p10) were used to identify key driver genes in these patients. This is one of the largest WES studies of der(1;7)(q10;p10)(+) myeloid neoplasm cases.
The goal of this project was to perform long-read RNA sequencing (LR-seq, PacBio) in combination with short-read RNA-seq for systematic characterization of the isoform diversity in primary breast tumor samples. We sequenced the full-length transcriptomes of 26 breast tumors and 4 normal breast samples.
Exome sequencing files from 25 alopecia areata samples from Spain.