CADD/GADD centers on Antisocial Drug Dependence
CADD (Center for Antisocial Drug Dependence): Funded through NIDA 011015 to study genetic influences on, and treatment of, antisocial drug dependence, studying both clinical probands and their families, and community samples of matched controls, twins, and participants in an ongoing longitudinal adoption study. A collaboration between three organizations at two campuses of the University of Colorado. Longitudinal with three waves of data collection completed. GADD (Genetics of Adolescent Antisocial Drug Dependence): Funded originally through NIDA 012845, a multisite collaboration including adolescent subjects at high risk for antisocial drug dependence and their siblings, recruited in Denver, CO and San Diego, CA. Longitudinal with two waves of data collection completed, one in progress as of May 2018.
Study
phs001841
Extreme phenotypes define epigenetic and metabolic signatures in cardiovascular diseases
Improving the understanding of cardiometabolic syndrome pathophysiology and its relationship with thrombosis are ongoing healthcare challenges. Using plasma biomarkers analysis coupled with the transcriptional and epigenetic characterisation of cell types involved in thrombosis, obtained from two extreme phenotype groups (obese and lipodystrophy) and comparing these to lean individuals and blood donors, the present study identifies the molecular mechanisms at play, highlighting patterns of abnormal activation in innate immune phagocytic cells and shows that extreme phenotype groups could be distinguished from lean individuals, and from each other, across all data layers. The characterisation of the same obese group, six months after bariatric surgery shows the loss of the patterns of abnormal activation of innate immune cells previously observed. However, rather than reverting to the gene expression landscape of lean individuals, this occurs via the establishment of novel gene expression landscapes. Netosis and its control mechanisms emerge amongst the pathways that show an improvement after surgical intervention. Taken together, by integrating across data layers, the observed molecular and metabolic differences form a disease signature that is able to discriminate, amongst the blood donors, those individuals with a higher likelihood of having cardiometabolic syndrome, even when not presenting with the classic features.
Study
EGAS00001003780
MARS-seq dataset of five obese human subjects and a lean human subject
Biopsies from visceral adipose tissue from the omental depot (OAT) were obtained from five obese individuals and one lean donor with participant informed consent obtained after the nature and possible consequences of the studies were explained under protocols approved by the Institutional Review Boards of the Perelman School of Medicine at the University of Pennsylvania, the Children’s Hospital of Philadelphia, or the Tel Aviv Sourasky Medical Center. The obese donors underwent bariatric surgery, the lean donor underwent cholecystectomy. OAT samples were placed in 1 mL of DMEM, and finely minced under sterile conditions before digestion in 50 mL of DMEM with 3 mg/1 mL collagenase IV (Gibco). Samples were incubated at 37°C in a rotating oven for 20-60 min. Adipocyte and stromal vascular fractions (SVF) were separated by centrifugation, and red blood cells (RBCs) were removed from the SVF by histopaque gradient (Sigma). Single-cell RNA-sequencing libraries were prepared using the MARS-seq pipeline, and sequenced on the MiSeq 500 or HiSeq 2500 Sequencing System (Illumina).
Dataset
EGAD00001005100
Identification of putative multiple myeloma (MM) susceptibility genes
We sought to identify novel MM susceptibility genes using a collection of families with multiple cases of MM/MGUS, including 189 affected individuals from 40 families, and index cases from an additional 88 families, along with 170 early-onset (EO) MM cases (≤ 55 years). We analyzed a total of 347 affected individuals using whole exome (N=321) and whole genome (N=26) sequencing. Samples were identified and collected through nation-wide efforts in France, Sweden and Greece. We focused on rare (MAF<0.5%) germline protein truncating and likely deleterious missense variants in genes harboring variants in at least two families showing variant-disease segregation, and in additional index (≥2) and/or early-onset (≥2) cases.
Study
EGAS50000001259
300-Obese: clinical cohort of obese individuals, Nijmegen, the Netherlands
The 300-Obese cohort was recruited at the Radboud University Medical Center (RUMC), Nijmegen, the Netherlands. The cohort comprises 377 participants included according to the following criteria: age > 55 years, BMI > 27 kg/m2. The cohort data include gut microbiome, NMR serum metabolomics, deep cardiovascular phenotyping and a broad range of phenotypic information.
Study
EGAS00001003508
TMM whole genome analysis of 4566 Japanese individuals
Tohoku University Tohoku Medical Megabank Organization (ToMMo) and Iwate Tohoku Medical Megabank Organization (IMM) were founded to establish an advanced medical system to foster the reconstruction from the Great East Japan Earthquake. These organizations are developing a biobank that includes medical and genome information for supporting health and welfare in the Tohoku area. In the first stage, part of our mission was to sequence 4,000 individuals to construct a Japanese whole-genome reference panel.
Study
JGAS000239
The Thrifty Microbiome: The Role of the Gut Microbiota in Obesity in the Amish
Emerging evidence that the gut microbiota may contribute in important ways to human health and disease has led us and others to hypothesize that both symbiotic and pathological relationships between gut microbes and their host may be key contributors to obesity and the metabolic complications of obesity. Our "Thrifty Microbiome Hypothesis" poses that gut microbiota play a key role in human energy homeostasis. Specifically, constituents of the gut microbial community may introduce a survival advantage to its host in times of nutrient scarcity, promoting positive energy balance by increasing efficiency of nutrient absorption and improving metabolic efficiency and energy storage. However, in the presence of excess nutrients, fat accretion and obesity may result, and in genetically predisposed individuals, increased fat mass may result in preferential abdominal obesity, ectopic fat deposition (liver, muscle), and metabolic complications of obesity (insulin resistance, hypertension, hyperlipidemia). Furthermore, in the presence of excess nutrients, a pathological transition of the gut microbial community may occur, causing leakage of bacterial products into the intestinal lymphatics and portal circulation, thereby inducing an inflammatory state, further aggravating metabolic syndrome traits and accelerating atherosclerosis. This pathological transition and the extent to which antimicrobial leakage occurs and causes inflammatory and other maladaptive sequelae of obesity may also be influenced by host factors, including genetics. In the proposed study, we will directly test the Thrifty Microbiome Hypothesis by performing detailed genomic and functional assessment of gut microbial communities in intensively phenotyped and genotyped human subjects before and after intentional manipulation of the gut microbiome. 
To address these hypotheses, five specific aims are proposed: (1) enroll three age- and sex-matched groups from the Old Order Amish: (i) 50 obese subjects (BMI > 30 kg/m2) with metabolic syndrome, (ii) 50 obese subjects (BMI > 30 kg/m2) without metabolic syndrome, and (iii) 50 non-obese subjects (BMI < 25 kg/m2) without metabolic syndrome, and characterize the architecture of the gut microbiota from the subjects enrolled in this study by high-throughput sequencing of 16S rRNA genes; (2) characterize the gene content (metagenome) to assess the metabolic potential of the gut microbiota in 75 subjects to determine whether particular genes or pathways are correlated with disease phenotype; (3) characterize the transcriptome in 75 subjects to determine whether differences in gene expression in the gut microbiota are correlated with disease phenotype; (4) determine the effect of manipulation of the gut microbiota with antibiotics on energy homeostasis, inflammation markers, and metabolic syndrome traits in 50 obese subjects with metabolic syndrome; and (5) study the relationship between gut microbiota and metabolic and cardiovascular disease traits, weight change, and host genomics in 1,000 Amish already characterized for these traits and in whom 500K Affymetrix SNP chips have already been completed. These studies will provide our deepest understanding to date of the role of gut microbes in terms of 'who's there?', 'what are they doing?', and 'how are they influencing host energy homeostasis, obesity and its metabolic complications?'. PUBLIC HEALTH RELEVANCE: This study aims to unravel the contribution of the bacteria that normally inhabit the human gastrointestinal tract to the development of obesity, and its more severe metabolic consequences including cardiovascular disease, insulin resistance and Type II diabetes.
We will take a multidisciplinary approach to study changes in the structure and function of gut microbial communities in three sets of Old Order Amish patients from Lancaster, Pennsylvania: obese patients, obese patients with metabolic syndrome and non-obese individuals. The Old Order Amish are a genetically closed homogeneous Caucasian population of Central European ancestry ideal for genetic studies. These works have the potential to provide new mechanistic insights into the role of gut microflora in obesity and metabolic syndrome, a disease that is responsible for significant morbidity in the adult population, and may ultimately lead to novel approaches for prevention and treatment of this disorder.
Study
phs000258
MOSAIC - Multi-Omics Spatial Atlas In Cancer
MOSAIC is a collaborative initiative founded by Owkin, Lausanne University Hospital (CHUV), Charité Universitätsmedizin Berlin, University Hospital Erlangen (UKER), Gustave Roussy Institute in Paris, and University of Pittsburgh. The goal of MOSAIC is to build the largest collection of spatial omics data in cancer. By integrating comprehensive high quality clinical annotations with advanced deep profiling techniques, MOSAIC aims to uncover novel cancer subtypes and identify key drug targets and biomarkers within them.
Study
EGAS50000000689
Extreme phenotypes define epigenetic and metabolic signatures in cardiovascular diseases
Improving the understanding of cardiometabolic syndrome pathophysiology and its relationship with thrombosis are ongoing healthcare challenges. Using plasma biomarkers analysis coupled with the transcriptional and epigenetic characterisation of cell types involved in thrombosis, obtained from two extreme phenotype groups (obese and lipodystrophy) and comparing these to lean individuals and blood donors, the present study identifies the molecular mechanisms at play, highlighting patterns of abnormal activation in innate immune phagocytic cells and shows that extreme phenotype groups could be distinguished from lean individuals, and from each other, across all data layers. The characterisation of the same obese group, six months after bariatric surgery shows the loss of the patterns of abnormal activation of innate immune cells previously observed. However, rather than reverting to the gene expression landscape of lean individuals, this occurs via the establishment of novel gene expression landscapes. Netosis and its control mechanisms emerge amongst the pathways that show an improvement after surgical intervention. Taken together, by integrating across data layers, the observed molecular and metabolic differences form a disease signature that is able to discriminate, amongst the blood donors, those individuals with a higher likelihood of having cardiometabolic syndrome, even when not presenting with the classic features.
Dataset
EGAD00001005197
Human liver NPCs single cell project
Independent of their inflammatory phenotype, macrophages are key orchestrators of hepatic metabolism. Non-alcoholic fatty liver disease (NAFLD) often occurs in obese individuals and is among the most common causes of cirrhosis, the terminal chronic liver disease that may necessitate liver transplantation. While multiple populations of macrophages have been described in the human liver, their function and turnover in obese patients at high risk of developing NAFLD and cirrhosis is currently unknown. Herein we identified a specific human population of resident liver myeloid cells that protects against the metabolic impairment associated with obesity. By studying the turnover of liver myeloid cells in individuals undergoing liver transplantation using markers of donor-recipient mismatch, we made the novel discovery that liver myeloid cell turnover differs between humans and mice. Using single cell techniques and flow cytometry we determined that the proportion of the protective resident liver myeloid cells, denoted liver myeloid cells 2 (LM2), decreases during obesity. Functional validation approaches using human 2D and 3D cultures revealed that the presence of LM2 ameliorates the oxidative stress associated with obese conditions. Our study indicates that resident myeloid cells could be a therapeutic target to decrease the oxidative stress associated with NAFLD.
Study
EGAS00001007194
This dataset contains fastq and BAM data from female adipose tissue.
The dataset provides the corresponding fastq and BAM files for 64 samples.
The study group consisted of 17 obese women with normal glucose tolerance and 15 obese women with T2DM classified according to WHO standards. The groups were matched for age, BMI and waist circumference. All the women had been morbidly obese (BMI>40 kg/m2) for at least five years.
Dataset
EGAD00001002202
Phylogenetic Analyses of Melanoma Reveal Complex Patterns of Metastatic Dissemination
Subpopulations of cells in a primary melanoma can disseminate and establish metastases. Still, the precise ancestral relationship between primary tumors and their metastases is not well understood. Using whole-exome sequencing (for discovery) and targeted sequencing (for validation), we analyzed mutation patterns of primary melanomas and two or more metastases in each of 8 patients to determine their phylogenetic relationships, profiling a total of 31 tumors. The resulting data show that in 6 of 8 cases, genetically unique cell populations in the primary metastasized in parallel to distinct anatomic sites, rather than sequentially. These data also indicate that individual metastases were sometimes founded by multiple cell populations of the primary that were genetically distinct.
Study
phs000941
Evaluation of Nuclear DNA from Rootless Hairs for Forensic Purposes
The study aims to overcome current limitations in the recovery of DNA from small, difficult forensic samples. Particularly, our goal is to produce a robust laboratory protocol and the accompanying software to accelerate adoption, as well as to evaluate the reliability and robustness of both the laboratory and computational aspects of generating genotype files from minute and/or degraded DNA samples, such as single, rootless hairs. The data accompanying this study include raw, paired-end reads from high-throughput sequencing of two panels of saliva, head hair, and pubic hair samples collected from anonymous volunteers at the University of California, Santa Cruz. The smaller set (Hair1.0) comprises 8 individuals, while the larger (Hair2.0) comprises 50 individuals, with 3 overlapping individuals between the two panels identified in post-collection analysis. We did not collect phenotype data or personally identifying information from the participants. For the Hair2.0 panel, only a subset of volunteers provided pubic hairs for DNA extraction and sequencing. Also included are saliva-derived genotype array data for all 8 individuals in the Hair1.0 panel and 44 of 50 individuals in the Hair2.0 panel.
Study
phs002979
Paired exome analysis in urothelial carcinoma
In this study we characterized genomic alterations in two to five metachronous tumors from 29 patients initially diagnosed with early stage disease. Fourteen patients (32 tumors) had non-progressive disease (NPD) and 15 patients (34 tumors) had progressive disease (PD). Whole exome sequencing (WES, ~50x mean read depth), ultra-deep targeted sequencing (~6,809x mean read depth) and whole transcriptome RNA-seq were performed for all samples. In addition, multiregional WES was performed on 8 adjacent regions from a single tumor.
Study
EGAS00001001686
NHLBI GO-ESP: Early-Onset Myocardial Infarction (Broad EOMI)
The NHLBI "Grand Opportunity" Exome Sequencing Project (GO-ESP), a signature project of the NHLBI Recovery Act investment, was designed to identify genetic variants in coding regions (exons) of the human genome (the "exome") that are associated with heart, lung and blood diseases. These and related diseases that are of high impact to public health and individuals from diverse racial and ethnic groups will be studied. These data may help researchers understand the causes of disease, contributing to better ways to prevent, diagnose, and treat diseases, as well as determine whether to tailor prevention and treatments to specific populations. This could lead to more effective treatments and reduce the likelihood of side effects. GO-ESP is comprised of five collaborative components: 3 cohort consortia - HeartGO, LungGO, and WHISP - and 2 sequencing centers - BroadGO and SeattleGO. In the Grand Opportunities Exome Sequencing Program Early MI Project (GO ESP - EOMI), we are sequencing cases with extremely early-onset MI drawn from 8 cohorts. These cohorts include five hospital or community-based studies that ascertained individuals based on MI status. These include PennCATH, Cleveland Clinic Genebank, Massachusetts General Hospital Premature Coronary Artery Disease Study (MGH-PCAD), Heart Attack Risk in Puget Sound (HARPS), and Translational Research Investigating Underlying Disparities in Myocardial Infarction Patients' Health Status (TRIUMPH). Cases were selected based on MI occurring in men aged ≤50 years and women aged ≤60 years. In addition, early-MI cases are being drawn from three population-cohort studies including the Framingham Heart Study, the Women's Health Initiative, and the Atherosclerosis Risk in Communities Study. MI-free controls are being drawn from five population-based cohort studies including the Framingham Heart Study, the Women's Health Initiative, Atherosclerosis Risk in Communities Study, Cardiovascular Health Study, and the Jackson Heart Study. 
Controls were selected based on two factors: (1) highest predicted risk for MI based on Framingham risk score; and (2) absence of prevalent or incident MI despite a high predicted risk.
Study
phs000279
OCD Collaborative Genetic Association Study (OCGAS)
The OCD Collaborative Genetics Association Study Group (OCGAS) was funded by NIMH to conduct a genome-wide association study to identify disease susceptibility loci of early-onset obsessive-compulsive disorder (OCD). Collaborators at Johns Hopkins, Brown, Columbia, MGH, UCLA, and NIMH evaluated 2,000 individuals with OCD and collected DNA from these individuals and both their parents. The genotyping and analyses were performed in two stages. In the first stage, 1,065 families (comprising 1,406 patients with OCD and 2,895 individuals in total) were genotyped on the Illumina OmniExpress GWA (SNPs) panel at the JHU SNP Center.
Study
phs000903
Ecological Stressors, PTSD, and Drug Use in Detroit: The Detroit Neighborhood Health Study (DNHS)
The Detroit Neighborhood Health Study (DNHS) is a prospective, representative longitudinal cohort study of predominantly African American adults living in Detroit, Michigan. The overall goal of the DNHS is to identify how genetic variation, lifetime experience of stressful and traumatic events, and features of the neighborhood environment predict psychopathology and behavior. Cohort participants were selected with a dual-frame probability design, using telephone numbers obtained from the U.S. Postal Service Delivery Sequence Files as well as a list-assisted random-digit-dial frame. Individuals without listed landlines or telephones and individuals with only a cell phone listed were invited to participate through a postal mail effort. Participants completed a 40-minute, structured telephone interview annually between 2008 and 2012 to assess perceptions of participants' neighborhoods, mental and physical health status, social support, exposure to traumatic events, and alcohol and tobacco use; each participant was compensated $25 USD. All survey participants were offered the opportunity to provide a specimen (venipuncture, blood spot, or saliva) for immune and inflammatory marker testing as well as genetic testing of DNA. Participants received an additional $25 USD if they elected to give a sample. Informed consent was obtained at the beginning of each interview and again at specimen collection. The Institutional Review Board of the University of Michigan reviewed and approved the study protocol. The DNHS submission to dbGaP includes phenotype data from all five survey waves (n = 856), all available GWAS data for participants who completed wave 4 (n = 507), and methylation data for wave 1, wave 2, wave 4, and wave 5 participants (n = 456).
Study
phs000560
National Institute of Neurological Disorders and Stroke (NINDS), Family Study of Essential Tremor (FASET), Identification of Susceptibility Genes for Essential Tremor
The Familial Study of Essential Tremor (FASET) was designed to identify susceptibility genes for Essential Tremor (ET). ET is among the most common neurological diseases, with a prevalence (age > 40 years) estimated to be 4.0% and a prevalence in advanced age (> 90 years) exceeding 20%. ET, often referred to as "familial tremor", is generally regarded as a highly genetic disorder: families often have affected members over multiple generations, and twin studies show high concordance among monozygotic twins. Probands (affected with ET) and relatives were enrolled in a family study of ET at Columbia University, New York between 2011 and 2014. The study was approved by the Institutional Review Board at Columbia University and written informed consent was obtained from all enrollees. The criteria for enrollment were: 1) the proband had early-onset ET with age at onset < 50 years, 2) the proband had a diagnosis of definite or probable ET, 3) in addition to the proband, there were at least two affected relatives in the family, 4) additional affected and unaffected family members were willing to participate in the study, and 5) the families contained more than two affected individuals in different generations. Blood samples were also collected for genetic research. For the genetic analyses, we excluded enrollees that we or others had diagnosed with Parkinson's disease (PD) or dystonia. The final sample includes 52 families (52 probands [affected with ET] and 155 relatives). The number of affected individuals enrolled per family ranged from 3 to 7 (mean = 4.1). Genetic samples from FASET were analyzed with whole genome SNP genotyping (for linkage analyses) and whole exome sequencing. It is hoped that this resource will help researchers to better understand the genetic causes of ET and underlying disease pathogenesis.
Study
phs000966
Esophageal Adenocarcinoma Organoid Genomics
Single cell RNA-seq (scRNA-seq) of esophageal adenocarcinoma organoids to benchmark variant calling from 10X Genomics scRNA-seq data. Five EAC organoids were subjected to scRNA-seq and two included matched exome data.
Study
EGAS00001005224
RUNX1 mutated families show phenotype heterogeneity and a somatic mutation profile unique to germline predisposed AML
We present the clinical phenotypes and genetic mutations detected in 10 novel RUNX1 mutated FPD-MM families. Genomic analyses on these families detected two partial gene deletions, three novel mutations and five recurrent mutations as the germline RUNX1 alterations leading to FPD-MM. On 15 individuals, across the 10 families, we performed additional whole exome or myeloid panel sequencing on blood or bone marrow to determine somatic mutations that co-exist with the germline RUNX1 mutation in tumour and pre-leukaemic states.
Study
EGAS00001004273
Clinical exome profiling of 7 members of a family with cases of familial Alzheimer's disease
This dataset includes "clinical exome" profiling (approximately 4000 genes related to diseases) on individuals (n=7) from a family with a familial history of Alzheimer's disease. Two affected cases and five cases without dementia are included.
Dataset
EGAD00001005320
Kibbutzim Family Study
The Kibbutzim Family Study (KFS) was established in 1992 to investigate the environmental and genetic determinants of cardiometabolic risk factors and their change over time. The participants belong to large families living in close-knit communities, called “Kibbutzim”, in Northern Israel. The Kibbutz has been a communal settlement, which has created a relatively homogeneous environment for its members. Kibbutz members are mostly of Ashkenazi Jewish ancestry, with the remaining members belonging to other Jewish subgroups. Participants were recruited in two phases from six Kibbutzim. In the first recruitment phase of the study (1992–1993) 500 individuals from 80 extended families (range 2 to 43) were examined. During the second phase (1999–2000), all participants from the first phase were invited for repeat examinations (80% response rate) and additional new participants were recruited, giving a total of 922 individuals from 150 extended families (range 2 to 55). Families were invited to participate if they consisted of at least four individuals who (i) lived in the Kibbutz, (ii) spanned at least two generations, and (iii) were at least 15 years old. Families were retained if at least two family members consented to participate in the study. Overall, 1033 participants were recruited; 111 were examined only in the first phase, 533 only in the second phase, and 389 were included in both. 901 individuals were successfully genotyped using Illumina HumanCoreExome BeadChip.
Study
EGAS00001002782
UK10K_NEURO_ASD_FI
In the UK10K project we propose a series of complementary genetic approaches to find new low frequency/rare variants contributing to disease phenotypes. These will be based on obtaining the genome wide sequence of 4000 samples from the TwinsUK and ALSPAC cohorts (at 6x sequence coverage), and the exome sequence (protein coding regions and related conserved sequence) of 6000 samples selected for extreme phenotypes. Our studies will focus primarily on cardiovascular-related quantitative traits, obesity and related metabolic traits, neurodevelopmental disorders and a limited number of extreme clinical phenotypes that will provide proof-of-concept for future familial trait sequencing. We will analyse directly quantitative traits in the cohorts and the selected traits in the extreme samples, and also use imputation down to 0.1% allele frequency to extend the analyses to further sample sets with genome wide genotype data. In each case we will investigate indels and larger structural variants as well as SNPs, and use statistical methods that combine rare variants in a locus or pathway as well as single-variant approaches. These samples are a subset of a nationwide collection of Finnish autism spectrum disorder (ASD) samples. The samples have been collected from Central Hospitals across Finland in collaboration with the University of Helsinki. The samples consist of 93 individuals with a diagnosis of autistic disorder or Asperger syndrome from 36 families with at least two affected individuals. Of these individuals, 16 can be genealogically connected to form two large pedigrees originating from Central Finland, suggesting possible genetic risk factors shared identical by descent within the pedigrees. All diagnoses are based on ICD-10 and DSM-IV diagnostic criteria for ASDs. Additional phenotypic data is available for a subset of the individuals. For further information with regard to this cohort please contact Aarno Palotie (Aarno.palotie@helsinki.fi).
Study
EGAS00001000110
Whole-genome sequencing of bladder cancers of various stages and grades to search for driver mutations, chromosome-scale somatic changes, mutation signatures and clonal structures.
This discovery set of cancers with whole-genome sequence data comprised 14 bladder cancers, paired with peripheral blood, that had been collected from unrelated individuals presenting to the Urology Department, Royal Hallamshire Hospital, Sheffield between June 2008 and September 2011. Four cancers were of low-grade papillary morphology (pTaG1-2); five were high grade invading the lamina propria (pT1G3, with two subsequently becoming muscle-invasive); and five were muscle-invasive (pT2-pT3). All tumours were sampled at transurethral resection or cystectomy and had not previously received any other therapy. The presence of a majority of cancer cells in the tumour specimens was confirmed by routine histological assessment. Genomic DNA was extracted from each tumour and paired blood sample using standard methods.
Study
EGAS00001000738
NIH Human Microbiome Project - Core Microbiome Sampling Protocol A (HMP-A)
This first clinical study of the Human Microbiome Project (HMP) addresses whether individuals share a core human microbiome. It involves broad determination of the microbiota found in five anatomical sites: the oral cavity, skin, nasal cavity, gastrointestinal tract and vagina. This study will enroll approximately 300 healthy male and female adults, 18-40 years old, from two geographic regions of the US: Houston, TX and St. Louis, MO. The participation of healthy individuals will create a baseline for discovery of the core microbiota typically found in various areas of the human body. The information from this initial study can then be used to help assess the changes in the complement of microbiota found on or within diseased individuals.
Study
phs000228
ARRA - NHLBI Lung Cohorts Sequencing Project: Genetic modifiers of
The major goal of this project is to apply second generation resequencing technology to identify disease causing variants influencing pediatric and adult lung diseases in a collection of two longitudinal population cohorts of cystic fibrosis patients that have been well characterized for a comprehensive set of clinical traits. In Phase I, exome sequencing was performed on 43 cystic fibrosis patients with early Pa infection and 48 cystic fibrosis patients with late Pa infection to identify variants influencing the time to onset of Pa infection. In Phase II, additional exomes were added to the study, to reach a total of 91 individuals with early Pa infection and 96 with late Pa infection. The majority of the 340 subjects of Phase II do not have a Pa infection phenotype, but instead have a pulmonary function phenotype (121 severe vs. 124 mild impairment) as determined by the survival corrected Kulich FEV percentile of Corey et al. A small minority have intermediate phenotypes and/or show severe decline in lung function during childhood.
Study
phs000254
mapped Bam files from whole transcriptome RNA-seq
In this study we characterized genomic alterations in two to five metachronous bladder tumors from 29 patients initially diagnosed with early stage disease. Fourteen patients (32 tumors) had non-progressive disease (NPD) and 15 patients (34 tumors) had progressive disease (PD). Whole exome sequencing (WES, ~50x mean read depth) and whole transcriptome RNA-seq were performed (RNA was not available for 4 tumors).
Data provided here consist of 71 mapped BAM files from whole transcriptome RNA-seq.
Dataset
EGAD00001002718
unmapped BAM files from whole transcriptome RNA-seq
In this study we characterized genomic alterations in two to five metachronous bladder tumors from 29 patients initially diagnosed with early-stage disease. Fourteen patients (32 tumors) had non-progressive disease (NPD) and 15 patients (34 tumors) had progressive disease (PD). Whole exome sequencing (WES, ~50x mean read depth) and whole transcriptome RNA-seq were performed (RNA was not available for 4 tumors).
Data provided here consist of 71 unmapped BAM files from whole transcriptome RNA-seq.
Dataset
EGAD00001002717
503 genotypes from Inner Asia used in 'Close inbreeding and low genetic diversity in Inner Asian human populations despite geographical exogamy' publication
Inner Asia is particularly interesting to understand human history and evolution, as two groups presenting contrasting cultural traits (notably their language, their social organisation and matrimonial system) cohabit. We sampled 503 individuals from these two groups, belonging to 17 populations from 11 distinct ethnic groups (AltaiKizhi, Kazakh, Khakas, Kyrgyz, Mongolian, Shore, Tajik, Telengit, Tubalar, Turkmen and Uzbek). The samples were then genotyped with 5 different DNA-arrays and, after quality-control, the 253,532 autosomal SNPs present in all the arrays were merged together in the present dataset.
Study
EGAS00001002951
WGS on patients with syndromic neurosensory disorder combining deafness and cataract
Disease: Severe congenital deafness, early onset cataracts and various neurological features
Family: 3 affected individuals originating from the same small village (Amarat) in the Kayseri region of Turkey and belonging to the same large extended consanguineous family.
Dataset: 5 BAM files. Whole-genome sequencing (WGS) was applied to the three affected individuals (II.2, II.4 and II.7) and two healthy individuals (II.1 and II.3).
Dataset
EGAD00001005417
Exome Sequencing of a family with thrombocytopenia, red cell macrocytosis, and lymphoblastic leukemia predisposition
A family with a history of bleeding, variable thrombocytopenia, red cell macrocytosis and two cases of pre B-cell acute lymphoblastic leukemia was studied in a single visit. The family was assessed for bleeding history using a bleeding questionnaire. Additionally, complete blood counts were measured and whole blood was collected from five affected individuals and three unaffected individuals for DNA extraction and whole exome sequencing. The goal of this study is to determine the genetic cause of thrombocytopenia, red cell macrocytosis, and predisposition to leukemia in a family. It is hoped that the information obtained from this study will help researchers understand the genetic and molecular basis of platelet and red cell production, as well as leukemia predisposition.
Study
phs000873
BAM files from whole exome sequencing (WES, ~50x mean read depth) of metachronous bladder tumors
In this study we characterized genomic alterations in two to five metachronous bladder tumors from 29 patients initially diagnosed with early-stage disease. Fourteen patients (32 tumors) had non-progressive disease (NPD) and 15 patients (34 tumors) had progressive disease (PD). Whole exome sequencing (WES, ~50x mean read depth) and whole transcriptome RNA-seq were performed (RNA was not available for 4 tumors).
Data provided here consist of 122 BAM files for WES (83 tumors and 39 blood).
Dataset
EGAD00001002716
A whole-genome sequencing study for evolutionary history of Tibetans and their genetic adaptations to high altitude
Due to its unique adaptation to high altitude, the Tibetan Plateau population has been the subject of much research interest. In this study, we conducted whole genome sequencing of 27 Tibetan individuals. Through our analysis, we inferred a detailed demographic history and revealed signals of natural selection in the Tibetan population. We provided evidence of genetic separation between the Han and Tibetan subpopulations as early as 44 to 58 thousand years ago, replicated previously reported high altitude adaptation genes, including EPAS1 and EGLN1, and reported three new candidate genes: PTGIS, VDR, and KCTD12.
Study
phs001338
Organ_maturation_in_preparation_for_birth__Peds_RFA__to_develop_a_tissue__resource_and_a_single_cell_atlas_of_organ_development_and_maturation_for__dissemination_among_the_scientific_and_clinical_community__RNA
Knowledge about abnormal organ development is important to understand pathology and to develop novel treatment approaches for individuals with congenital and acquired disease. Most of our current understanding is based on examination of tissues from the embryo and early foetus, collected from women undergoing termination of pregnancy in the first trimester (first third) of pregnancy. Very little is known about normal and abnormal organ development during the crucial last two-thirds of pregnancy, when much remodelling of foetal tissues occurs. This study will generate a single-cell atlas of late-foetal lungs, blood, heart, bone and immune organs.
Study
EGAS00001008256
SUM-seq data for Macrophage polarisation to M1 and M2 phenotypes experiment
We stimulated iPSC-derived M0 macrophages with LPS and IFN-γ to induce M1 polarization or IL-4 to induce M2 polarization. To discern early and sustained responses at chromatin accessibility and gene expression levels, we collected samples at five time points along the two polarization trajectories: prior to stimulation (M0) and at 1-hour, 6-hour, 10-hour, and 24-hour intervals. Each was sampled in duplicate, totaling 18 samples, and subjected to SUM-seq library preparation. Sequenced files for both data modalities are provided as demultiplexed fastq files.
Dataset
EGAD50000001206
Transgenerational Transmission of Post-Zygotic Mutations Suggests Symmetric Contribution of First Two Blastomeres to Human Germline
Skin and blood samples were collected from four phenotypically normal individuals (mother, father, and children) from a randomly selected family. Whole genomes of the samples from these individuals were sequenced to a depth of 30X-200X. Sequencing data for the mother was previously analyzed and is available at NDA collection #2961. Sequenced reads from the children and father were aligned to the human reference genome (hg19) generating BAM files for each individual. We used post-zygotic mutations in the mother to trace cell lineages across human generations. Analysis of this family demonstrated that different cell lineages were transmitted to offspring. Coupled with analysis of publicly available data, this result revealed a fundamental difference between soma and germline in lineage allocation and suggested a 50:50 contribution of the first two blastomeres to the germline.
Study
phs003781
Whole-exome sequencing of matched blood, primary GBM tumours, and patient-derived organoids
Whole-exome sequencing (~250X coverage) of primary GBM tumours and matched patient-derived organoids and normal blood. Samples from two spatially distinct regions of seven tumours from five patients (five primary, two recurrent).
Dataset
EGAD00001007935
Whole exome sequencing data of germline and two independent primary leukemias of five patients
The contribution of genetic predisposing factors to the development of pediatric acute lymphoblastic leukemia (ALL), the most frequently diagnosed cancer in childhood, has not been fully elucidated. Children presenting with multiple de novo leukemias are more likely to suffer from genetic predisposition. Here, we selected five of these patients and analyzed the mutational spectrum of normal and malignant tissues. In two patients, we identified germline mutations in TYK2, a member of the JAK tyrosine kinase family. These mutations were located in two adjacent codons of the pseudokinase domain (p.Pro760Leu and p.Gly761Val). In silico modeling revealed that both mutations affect the conformation of this auto-regulatory domain. Consistent with this notion, both germline mutations promote TYK2 autophosphorylation and activate downstream STAT family members, which could be blocked with the JAK kinase inhibitor I. These data indicate that germline activating TYK2 mutations predispose to the development of ALL.
Study
EGAS00001001889
Exome sequencing reads
Exome sequencing reads of two UFM individuals and their family members (11 individuals in total) belonging to two different Fragile X families. Alignment files in BAM format are provided.
Dataset
EGAD00001002276
Summary statistics of meta-analysis using two genome-wide association studies of inflammatory bowel disease in Koreans.
Inflammatory bowel disease (IBD), a chronic inflammatory disorder of the gastrointestinal tract, is thought to develop due to dysregulated mucosal immune responses to gut flora in genetically susceptible individuals. Crohn’s disease (CD) and ulcerative colitis (UC) are the two major subtypes of IBD. To identify additional susceptibility loci for IBD in Asians, we performed meta-analyses using two genome-wide association studies.
Study
EGAS00001005026
Whole genome sequencing of ASD quartet families
Autism spectrum disorder (ASD) is genetically heterogeneous with >100 susceptibility genes known. We used whole-genome sequencing (WGS) of 85 quartet families (two parents and two ASD-affected siblings) to comprehensively examine mutation characteristics. Our results emphasize using WGS to maximize the detection of all classes of mutations potentially involved in autism, and to enable the interpretation of that data in confirmatory and predictive diagnosis in different individuals in a family.
Study
EGAS00001001023
Genomic profiling of subcutaneous patient derived xenograft models of solid childhood cancer
The pediatric cancer cohort in this study included 70 PDX models from 65 different individuals. This cohort included a total of 16 different pediatric solid tumor subtypes, including fourteen Wilms tumors, thirteen hepatoblastomas, thirteen osteosarcomas, ten germ cell tumors, four neuroblastomas, three clear cell sarcomas, two adrenal cortical carcinomas, two Leydig cell tumors, two medulloblastomas, one embryonal rhabdomyosarcoma (ERMS), one Ewing sarcoma, one pleomorphic sarcoma, one adenocarcinoma, one glioblastoma, one mesothelioma and one ovarian tumor. Notably, we have five cases with multiple PDX models from the same patient, including two cases with duplicates (564 and 564-Dup, 1796 and 1796-Dup), one case with two different metastases (560-SM, 560-LM), one case with two blocks from the same tumor (1939 and 1939-Dup), and one case with different primary tumors from the same patient (2264 and 1932). We have a total of 353 sequencing data, including 82 RNA sequencing data (RNA-seq), 138 whole-exome sequencing (WES) and 135 low-pass whole-genome sequencing (WGS). For RNA-seq data, we have 61 PDXs and 21 PTs; for WES, we have 67 PDXs, 30 PTs and 40 matched normal germlines; for WGS, we have 64 PDXs, 30 PTs and 40 matched normal germlines. Of these, 19 PT-PDX pairs with RNA-seq and 28 PT-PDX pairs with WES and WGS were included.
Dataset
EGAD00001009863
NHLBI TOPMed: The Vanderbilt Atrial Fibrillation Registry (VU_AF)
The Vanderbilt Atrial Fibrillation (AF) Registry was founded in 2001. Patients with AF and family members are prospectively enrolled. At enrollment a detailed past medical history is obtained along with an AF symptom severity assessment. Blood samples are obtained for DNA extraction. Patients are followed longitudinally along with serial collection of AF symptom severity assessments.
Study
phs001032
Targeted sequencing of candidate regions on chromosome 22q predisposing to multiple schwannomas
Constitutional LZTR1 or SMARCB1 pathogenic variants have been found in ~86% of familial and ~40% of sporadic schwannomatosis cases. Hence, we performed massively parallel sequencing of the entire LZTR1, SMARCB1 and NF2 genomic loci in 35 individuals with schwannomas who were negative for constitutional first-hit pathogenic variants in the LZTR1/SMARCB1/NF2 coding sequences but had a 22q deletion and/or a different NF2 pathogenic variant in each tumor, including six cases with only one tumor available. Furthermore, we verified whether any other LZTR1/SMARCB1/NF2 (likely) pathogenic variant(s) could be found in 16 cases carrying a SMARCB1 constitutional variant in the 3’ untranslated region (3’-UTR) c.*17C>T, c.*70C>T or c.*82C>T. As no additional variants were found, functional studies were performed to clarify the effect of these 3’-UTR variants on the transcript. The 3’-UTR variants c.*17C>T and c.*82C>T showed pathogenicity by negatively affecting SMARCB1 transcript level. Two novel deep intronic SMARCB1 variants, c.500+883T>G and c.500+887G>A, resulting in out-of-frame missplicing of intron 4, were identified in two unrelated individuals. Further resequencing of the entire repeat-masked genomic sequence of chromosome 22q in individuals negative for pathogenic variants in the SMARCB1/LZTR1/NF2 coding and non-coding regions revealed five potential schwannomatosis-predisposing candidate genes, i.e. MYO18B, NEFH, SGSM1, SGSM3 and SBF1, pending further verification.
Study
EGAS00001005680
Genetic Analysis of Limb Malformation Disorders: Miller Syndrome Sequencing Study (LMD-MS)
The overall goal of this project is to investigate the etiology and pathogenesis of malformations (i.e., birth defects) of the limb, concentrating on abnormalities of limb patterning such as limb deficiency/duplications and multiple congenital contractures. The exome sequences of two siblings and two unrelated individuals were obtained by massively parallel DNA sequencing. The four individuals were affected with Miller syndrome (OMIM: 263750). Additionally, the whole-genome sequences of a family of four were obtained with the method of Complete Genomics Incorporated (CGI). The two offspring were both affected with Miller syndrome and are the same sibling pair mentioned previously from whom exome sequences were also obtained.
Study
phs000244
Next-Generation Sequencing of AV Nodal Reentry Tachycardia patients
Atrioventricular nodal reentry tachycardia (AVNRT) is the most common form of regular paroxysmal supraventricular tachycardia. This arrhythmia affects women twice as frequently as men, and is often diagnosed in patients below 40 years of age. Familial clustering, early onset of symptoms, and lack of structural anomaly indicate involvement of genetic factors in AVNRT pathophysiology. We hypothesized that AVNRT patients have a high prevalence of variants in genes that are highly expressed in the atrioventricular conduction axis of the heart and potentially involved in arrhythmic diseases. Next-generation sequencing of 67 genes was applied to the DNA profile of 298 AVNRT patients and 10 AVNRT family members using the HaloPlex Target Enrichment System. In total, we identified 229 variants in 60 genes: 215 missense variants, four frameshifts, four codon deletions, three missense and splice-site variants, two stop-gain variants, and one start-lost variant. Sixty-five of these were not present in the Exome Aggregation Consortium (ExAC) database. Furthermore, we report two AVNRT families with co-segregating variants. Seventy-five of 284 AVNRT patients (26.4%) and three family members of different AVNRT probands had one or more variants in genes affecting sodium handling. Fifty-four of 284 AVNRT patients (19.0%) had variants in genes affecting the calcium handling of the heart. We furthermore find a large proportion of variants in the HCN1-4 genes. We did not detect a significant enrichment of rare variants in the tested genes. This could be an indication that AVNRT might be an electrical arrhythmic disease with abnormal sodium and calcium handling.
Study
EGAS00001002745
Parallel Detections of Somatic Gene Mutations in Surgically Resected Tumor tissues and Matched Plasma Specimens in Early-Stages of Primary Breast Cancer
We successively developed two panels, containing 68 and 136 genes respectively. In combination with ultrasound or mammography, they could be used for early detection of breast cancer and could help avoid unnecessary surgery or other invasive testing.
Study
EGAS00001003075
Transgenerational transmission of reproductive and metabolic dysfunction in the male progeny of polycystic ovary syndrome
The transgenerational maternal effects of PCOS in female progeny have been revealed. As there is evidence that a male equivalent of PCOS may exist, we asked whether sons born to mothers with PCOS (PCOS-sons) transmit reproductive and metabolic phenotypes to their male progeny. Here, in a Swedish nationwide register-based cohort and a clinical case-control study from Chile, we found that PCOS-sons are more often obese and dyslipidemic. Their serum miRNAs are found to potentially regulate PCOS-risk genes. Our prenatally androgenized PCOS-like mouse model with or without diet-induced obesity confirmed that reproductive and metabolic dysfunctions in F1 male offspring are passed down to F3. Small non-coding RNA (sncRNA) sequencing of F1-F3 sperm revealed distinct differentially expressed (DE) sncRNAs across generations in the androgenized, obese, and obese and androgenized lineages, respectively. Notably, common targets between transgenerational DEsncRNAs in mouse sperm and in PCOS-sons serum indicate similar effects of maternal hyperandrogenism. These findings strengthen the translational relevance, highlighting a previously underappreciated risk of reproductive and metabolic dysfunction via male germline transmission and potential molecular markers to study in future generations.
Study
EGAS00001007079
Analyzing Somatic Mutagenesis in Systemic Sclerosis
The study was undertaken with the aim of estimating differences in the mutational burden in lung tissues from healthy individuals vs systemic sclerosis (SSc) patients. SSc is an autoimmune disease, with patients often developing cancer. However, the etiology of this phenomenon is poorly understood. As such, we sought to uncover potential molecular mechanisms prevalent within SSc tissues that might be potentiating cancer. Samples were randomly collected from individuals from two broad ethnic groups, European (EA) and African American (AA). In total, samples were obtained from 7 individuals belonging to the EA group and 4 individuals from the AA group. For this study, ethnicity was not included as a confounding variable, given that the data set has a roughly comparable number of individuals from both ethnicities for the healthy and SSc cohorts. Study participants ranged in age from 37 to 63 years old. The median age of healthy individuals (52 years old) was comparable to that of the SSc cohort (~48 years old). Participants were self-described as current/former smokers or non-smokers. Lung tissues were obtained under a protocol approved by the University of Pittsburgh Institutional Review Board. Written informed consent statements were provided to the University of Pittsburgh IRB by the study participants/patients. Sample sizes were determined by the ability of bulk lung fibroblast tissues to stably propagate in culture until single cell clones could be obtained. Samples for which sufficient cell density could not be achieved for downstream high-depth sequencing analysis were excluded from the study and are not reported. Five or more samples were collected from each cohort (healthy and SSc). Subjects were allocated to two basic cohorts based on diagnosis at the time of explant.
Participants with no reported lung anomalies were allocated to the "healthy" cohort, whereas individuals with reported lung abnormalities, primarily SSc with pulmonary fibrosis and/or hypertension were allocated to the "SSc" cohort. All tissue samples were cultured, propagated and processed in identical fashion without requiring individual-specific protocols. Additionally, equivalent amounts of DNA were used for preparing sequencing libraries from both cohorts, and all samples were sequenced using Illumina whole genome sequencing (WGS) and analyzed using identical computational pipelines. Our study primarily discovered that there is a significantly higher burden of somatic mutations in SSc samples, compared to healthy samples. We observed an overall increase in a range of mutations in genomes of SSc patients, including single base substitutions, indels, structural variants and copy number alterations. Additionally, we uncovered a novel mutation signature associated with SSc samples, hinting at the activity of POLH in these samples, and highlighting the potential nexus between inflammation, DNA damage, and cancer.
Study
phs003700
Long-read whole-genome sequencing-based concurrent haplotype phasing and aneuploidy profiling of single cells
This dataset contains long-read whole-genome sequencing (lrWGS) data from 14 samples. Five lrWGS data are from single-cell (sc, sc_2, sc_3), multi-cell (mc, 10 cells), and bulk samples of HG002. The remaining nine lrWGS data are from two preimplantation genetic testing (PGT) families, including four from blood bulk DNA of the parental pairs and five from trophectoderm biopsies of two embryos from one family and three embryos from another family. The data are provided in raw FASTQ format and were generated using the PromethION device from Oxford Nanopore Technologies.
Dataset
EGAD50000000787
Samples obtained within X-pand project
This dataset contains five bone marrow samples and three mobilised peripheral blood ones from different healthy individuals.
The three bone marrow samples have been sequenced with the 10x multiome kit using Illumina sequencers.
The three mobilised peripheral blood samples have been split into two aliquots, one for 10x multiome and the other for 10x CITE-seq. Both library types have been sequenced with an Illumina sequencer.
Raw data have been processed with cell-ranger and mapped to the human genome (GRCh38) to obtain the BAM files.
Dataset
EGAD50000001108
Lung cancer Early Molecular Assessment
This study aimed to validate the potential utility of a clinically accessible, highly sensitive tumor-informed circulating tumor DNA (ctDNA) assay. The study considers a ‘landmark’ time period (between two weeks and four months after definitive treatment), as well as longitudinal sampling (>2 weeks) and compares two independent, retrospectively collected real-world cohorts.
Study
EGAS50000000896
GWAS data (Illumina 2.5 M SNPs) in Cuban cohorts of dengue disease
We will have 274 individuals typed for the Illumina Human Omni 2.5 chip. The individuals are from two locations in Cuba (Havana and Guantanamo) and from four phenotype classes (asymptomatic, control, dengue fever and dengue hemorrhagic fever).
Study
EGAS00001002276
Identification of potential blood biomarkers for early diagnosis of Alzheimer's disease through immune landscape analysis
Mild cognitive impairment (MCI) is a clinical precursor of Alzheimer's disease (AD). Recent genetic studies have reported on associations between AD risk genes and immunity. Here, we obtained samples and data from 317 AD, 432 MCI, and 107 cognitively normal (CN) subjects and investigated immune-cell type composition and immune clonal diversity of T-cell receptor (TRA, TRB, TRG, and TRD) and B-cell receptor (IGH, IGK, and IGL) repertoires through bulk RNA sequencing. Our prognosis prediction model using the potential blood-based biomarkers for early AD diagnosis, which combined two immune repertoires (IGK and TRA), WDR37, and clinical information, successfully classified MCI patients into two groups, low and high, in terms of risk of MCI-to-AD conversion.
Study
JGAS000532
Proteogenomics reveals two distinct biological pilocytic astrocytoma subgroups
Pilocytic astrocytoma (PA) is the most common pediatric brain tumor and is driven by aberrant MAPK signaling, typically mediated by BRAF alterations. While five-year overall survival rates exceed 95%, tumor recurrence constitutes a major clinical challenge in incompletely resected tumors despite chemotherapeutic or radiation-based therapies. Therefore, we used proteogenomics to discern the biological heterogeneity of PA to improve classification of this tumor entity and identify novel therapeutic targets.
Our proteogenomics approach integrates RNA sequencing and LC/MS-based proteomic profiling data from a cohort of 58 confirmed, primary PA samples. An integrative genomics approach was conducted to discern the biological heterogeneity of PA and to identify aberrant pathway activation in these biological subgroups.
In summary, pilocytic astrocytomas segregate into two groups, with younger patients significantly associated with Group 1. Importantly, we validate the two distinct biological subgroups in two non-overlapping cohorts. The biological heterogeneity seen here may improve biological classification and reveal novel therapeutic targets specifically useful for non-resectable tumors with high risk of recurrent or progressive disease.
Study
EGAS00001006402
"BaTwa" populations from Zambia retain ancestry of past hunter-gatherer groups
We genotyped and analyzed over two million genome-wide SNPs from two BaTwa populations (40 individuals per population), and from three comparative farming populations (10 to 21 individuals per population) in order to: (i) determine if the Zambian BaTwa carry any genetic material linking them to past hunter-gatherer groups, and (ii) characterize the genetic affinities of past Zambian hunter-gatherer groups. We found that both BaTwa populations harbor a hunter-gatherer-like genetic component, representing ∼19% (Bangweulu) and ∼31% (Kafue) of their genetic ancestry, while the rest of their ancestry is similar to western Africans.
Study
EGAS50000000378
Unraveling mutagenic processes influencing the tumor mutational patterns of individuals with Constitutional Mismatch Repair Deficiency
Constitutional mismatch repair deficiency (CMMRD), caused by bi-allelic germline variants in one of the mismatch repair (MMR) genes, is a childhood cancer predisposition syndrome that often results in the development of multiple tumors early in life. A better understanding of mutational processes driving subsequent CMMRD tumors could advance optimal treatment. Therefore, we performed a genomic characterization of 41 tumors from 17 individuals with CMMRD. Mutational patterns in these tumors were found to be influenced by multiple factors, including the affected MMR genes, tumor types and somatic polymerase proofreading mutations. Temozolomide treatment left prominent mutational signatures in two second primary hematologic malignancies. Furthermore, a novel indel signature was found in 54% of the tumors, characterized by one base pair cytosine insertions in cytosine homopolymers. In conclusion, the analysis of sequential tumors in individuals with CMMRD reveals diverse mutational patterns influenced by the underlying mutated MMR gene, tumor type and treatment history.
Study
EGAS00001007660
Singapore Adult Metabolism Study - Phase 2 (SAMS2)
Singapore Adult Metabolism Study - Phase 2 (SAMS2) was an interventional study where hundreds of donors aged 21-45 were recruited to participate in a 16-week weight loss program. Study individuals selected (see below for more detailed selection standards) were sedentary (exercising 1 or fewer times a week), obese or overweight with a body fat mass greater than 24% and a BMI of 23-35 kg/m2. For this study, we adjusted the BMI definition for the Asian population, based on the WHO Consultation 2002: the BMI cut-off is 23 kg/m2 for overweight and 27.5 kg/m2 for obese. The weight loss program included a combination of dietary interventions, structured exercise sessions, and additional physical activity performed in participants' own time. Energy and protein requirements were calculated based on each participant's weight, height, and physical activity level, with the goal of achieving a 40% calorie deficit. Participants' calorie intake was tracked using food recalls and questionnaires. Additionally, subjects attended structured exercise sessions at least three times per week, supervised by a coach. Each session consisted of 90 minutes of aerobic and strength training exercises, designed to burn approximately 500 kcal per session. To monitor daily physical activity, participants wore pedometers throughout the study. In total, the exercise sessions (500 kcal per session) and daily physical activity (targeting an additional 500 kcal) were aimed at achieving a total caloric expenditure of 2000 kcal per week. We collected clinical data and skeletal muscle biopsies from 54 overweight/obese Asian individuals before and after a 16-week lifestyle intervention, which resulted in an average ~10% weight loss, accompanied by a ~30% increase in insulin-stimulated glucose uptake. Improvements were observed in 118 of 252 clinical traits and six blood lipids.
Transcriptomic analysis of paired skeletal muscle biopsies identified 505 differentially expressed genes enriched in mitochondrial function and insulin sensitivity. Thousands of muscle-specific e/sQTLs were detected pre- and post- intervention, including hundreds of lifestyle-responsive e/sQTLs. Notably, approximately 4.2% of eQTLs and 7.3% of sQTLs showed Asian specificity. Joint analysis with GWAS identified 16 putative metabolic risk genes. Our study reveals gene-by-lifestyle interactions and how lifestyle modulates gene regulation in skeletal muscle.
Study
phs004078
Natural history of clonal haematopoiesis (2017-09-04)
The incidence of acute myeloid leukemia (AML) increases with age and mortality exceeds 90% when diagnosed after age 60. Only 10-15% of cases evolve from a pre-existing myeloproliferative or myelodysplastic disorder; the remaining cases arise de novo without a detectable prodrome and are diagnosed upon development of bone marrow failure. Analysis of diagnostic blood samples has demonstrated that de novo AML is preceded by the accumulation of somatic mutations in pre-leukemic hematopoietic stem and progenitor cells (preL-HSPCs) that subsequently undergo clonal expansion. If individuals in this pre-leukemic phase could be identified, methods for determination of risk and monitoring for progression to overt AML could be developed. However, recurrent AML mutations also accumulate during aging in healthy individuals who never develop AML, referred to as age-related clonal hematopoiesis (ARCH). To distinguish individuals with preL-HSPCs at high risk of developing AML from those with ARCH, we undertook deep targeted sequencing of genes recurrently mutated in AML in blood samples from 133 individuals in the European Prospective Investigation into Cancer and Nutrition (EPIC) study taken on average 6 years before they developed AML (pre-AML group), together with 683 matched healthy individuals (Control group). Pre-AML cases displayed accelerated age-correlated accumulation of somatic mutations. The identity, number and variant allele frequency (VAF) of mutations differed between the two groups, and were incorporated into a computational model of AML risk prediction that accurately distinguished pre-AML cases from controls on average 7 years prior to AML development. Our findings provide proof of concept that early prediction of AML development is feasible in high-risk populations, paving the way for early disease detection, monitoring, and potentially prevention.
Dataset
EGAD00001003703
A CCG expansion in ABCD3 causes oculopharyngodistal myopathy in individuals of European ancestry
In this study we describe the identification of CCG expansions in ABCD3 in affected individuals across eight unrelated OPDM families of European ancestry. In two large Australian OPDM families, using a combination of linkage studies, short-read WGS and targeted ONT sequencing, we identified CCG expansions in the 5’UTR of ABCD3. Independently, the ABCD3 CCG expansion was identified through the Genomics England 100,000 Genomes Project in three individuals from two unrelated UK families diagnosed with OPDM.
Study
EGAS50000000298
The National Institute of Arthritis and Musculoskeletal and Skin Diseases (NIAMS) Consortium for the Longitudinal Evaluation of African-Americans with Early Rheumatoid Arthritis (CLEAR)
The Consortium for the Longitudinal Evaluation of African-Americans with Early Rheumatoid Arthritis (CLEAR) Registry and Repository, supported by the National Institute of Arthritis and Musculoskeletal and Skin Diseases (NIAMS), was established in 2000 to provide a resource for the scientific community to explore genetic and non-genetic factors affecting rheumatoid arthritis (RA) occurrence and outcomes in African Americans. The long-term objective is a database and a repository of 1,100 RA and 550 matched healthy African-American subjects. This CLEAR Registry and Repository has two arms: a longitudinal arm for subjects with early RA (enrollment from 2000 to 2005) and a cross-sectional arm for subjects with any disease duration (enrollment from 2006 to 2012). CLEAR has two components: a database and a repository. The database contains extensive demographic, socioeconomic, clinical and radiographic (radiographs of hands and feet) information and bone mineral density data from DEXA scans. The repository contains genomic DNA, plasma and serum on most of the participants. Participants in CLEAR II had RNA isolated from peripheral blood cells.
Study
phs001360
Transcriptomic profiling of granulosa cells from IVF patients at different ages
Two major factors contributing to reduced fertility are the use of exogenous hormones and old age. Here, we sequenced RNA from granulosa cell samples from human patients undergoing in vitro fertilization to test how hormonally stimulated cells' transcriptional profiles change with age. All patients were normal responders referred for male infertility and were grouped into two age groups. The gene expression differences in aging were compared to mouse data (previous experiments, see ArrayExpress E-MTAB-13479) and the results are published in the article "Granulosa cell transcription is similarly impacted by superovulation and aging and predicts early embryonic trajectories". Corresponding count tables are available in ArrayExpress (E-MTAB-13496).
Study
EGAS50000000824
The multifaceted genomic history of Ashaninka from Amazonian Peru
Genome-wide profiles of Ashaninkas from Peruvian Amazonia reveal two genetic subgroups.
Ashaninka ancestors derived from a south-north migration from southeastern South America.
The Ashaninka1 subgroup testifies to an Arawakan-Caribbean genetic link in the Early Ceramic Age.
The Ashaninka2 subgroup had recent interactions with Andean and Pacific Indigenous populations.
Study
EGAS00001006958
WTCCC2 Visceral Leishmaniasis (VL) samples
A WTCCC2 project genome-wide association study for visceral leishmaniasis (VL) in individuals from India, Brazil and Sudan, genotyped on the custom Illumina 670k array. The WTCCC2 analysis of the Brazilian and Indian samples is described in Fakiola et al. [Nat Genet. 2013 Feb;45(2):208-13]. It should be noted that, due to expected family structure in the data, normal analyses of these data should include an estimation of the relatedness between the samples. For more details about sample collection for the project please refer to the Methods section of the paper above. The samples from India were all collected from Bihar state in northeastern India. The Brazilian samples were collected from a wide area of northern Brazil, and managed at two laboratories in Natal and Belem. For more details on the geographic distribution of the Brazil samples please see Jamieson et al. [Genes Immun. 2007 Jan;8(1):84-90]. We have not generated qc_passed or info files for the Sudanese data; a preliminary scan of the data indicated that there may be DNA quality issues with these samples.
Study
EGAS00001000773
Genome-wide chromatin accessibility profiling of primary human glomerular and kidney cortex tubular outgrowth cultures
We generated primary cultures from mechanically isolated kidney glomeruli (filtration unit of the nephron) which are composed of podocytes and mesangial cells. In parallel, we generated primary kidney cortex tubule cell cultures, which are composed primarily of proximal tubule cells. Early passage cultures of these two cell types were subjected to chromatin accessibility profiling (DNase-Seq) and gene expression profiling (RNA-Seq). We found thousands of dynamically regulated enhancers in both cell types that potentially regulate nearby and distal target genes that are differentially expressed. These data will be useful for understanding the epigenomic regulation of gene transcription in key kidney cell types.
Study
phs001720
NIDDK IBD Genetics Consortium Crohn's Disease Genome-Wide Association Study
This dataset contains data from a genome-wide association study performed with 968 Inflammatory Bowel Disease (IBD) affected cases and 995 unrelated controls using the Illumina HumanHap300 Genotyping BeadChip. Cases were selected to have Crohn's disease with ileal involvement, and controls were matched to cases based on sex and year of birth. Subjects were drawn from two cohorts: (1) persons with non-Jewish, European ancestry (561 cases and 563 controls), and (2) persons with Jewish ancestry (407 cases and 432 controls). Genotyping was performed at the Feinstein Institute for Medical Research. Seven-hundred fifty-four of the samples (468 cases and 286 controls) were taken from the NIDDK IBD Genetics Consortium cell line repository. These samples are identified in the IBD_Sample file. The subject IDs for these individuals may be used to request corresponding samples for follow-up research through the repository. In addition, complete phenotype data for these individuals are available, together with the Consortium's phenotyping manual and the forms used to collect the data. The remaining 1,209 samples were obtained from pre-existing collections ascertained through Cedars-Sinai Medical Center, Johns Hopkins University, University of Chicago, University of Montreal, University of Pittsburgh, University of Toronto, and the New York Health project (controls only). For these samples, only sex, cohort (Jewish vs. non-Jewish), and age at diagnosis (cases only) are available. Two-hundred three individuals from among the pre-existing samples did not provide consent to release their genotype data (designated as consent group 2 in the file IBD_Subject). Thus, individual genotype data are only provided for 1,760 samples. To compensate for this, we have provided summary results for each SNP. These are based on a stratified analysis testing case/control association. 
Fifty-one samples had a call rate less than 93% and were therefore excluded from this analysis, leaving an overall sample size of 1,963 - 51 = 1,912. X Chromosome Heterozygosity: Nine samples have X chromosome heterozygosity that is neither consistent nor inconsistent with their phenotypic sex. One of these samples was found to have Turner Syndrome. The remaining 8 samples have heterozygosity ranging from 35-76%.
Study
phs000130
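The sample counts quoted in the NIDDK IBD Genetics Consortium entry above can be cross-checked with a few lines of arithmetic. The following is a minimal Python sketch that only recomputes totals from the figures given in the description; the variable names are illustrative assumptions and do not correspond to any file or field in the dbGaP release.

```python
# Counts copied from the study description; all names here are hypothetical.
cohorts = {
    "non_jewish_european": {"cases": 561, "controls": 563},
    "jewish": {"cases": 407, "controls": 432},
}

total_cases = sum(c["cases"] for c in cohorts.values())
total_controls = sum(c["controls"] for c in cohorts.values())
total_samples = total_cases + total_controls

# Samples taken from the Consortium cell line repository (cases + controls).
repository_samples = 468 + 286
# Everything else came from pre-existing collections.
preexisting_samples = total_samples - repository_samples

no_consent = 203      # individuals who did not consent to genotype release
low_call_rate = 51    # samples with call rate below 93%

released_genotypes = total_samples - no_consent
analysed_samples = total_samples - low_call_rate

print("cases:", total_cases, "controls:", total_controls)
print("repository:", repository_samples, "pre-existing:", preexisting_samples)
print("released genotypes:", released_genotypes)
print("analysed after call-rate filter:", analysed_samples)
```

Each derived value agrees with the corresponding number stated in the description (968 cases, 995 controls, 754 repository samples, 1,209 pre-existing samples, 1,760 released genotypes, 1,912 analysed samples).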
Whole exome sequencing data of germline and two independent primary leukemias of five patients
The contribution of genetic predisposing factors to the development of pediatric acute lymphoblastic leukemia (ALL), the most frequently diagnosed cancer in childhood, has not been fully elucidated. Children presenting with multiple de novo leukemias are more likely to suffer from genetic predisposition. Here, we selected five of these patients and analyzed the mutational spectrum of normal and malignant tissues.
Dataset
EGAD00001002266
Deciphering RBP - alternative splicing networks in ALS iPSC-MN: TDP-43, NOVA1, NOVA2 and RBFOX2 eCLIP-seq
We produced enhanced cross-linking immunoprecipitation (eCLIP) sequencing data of the RNA binding proteins (RBP) TDP-43, NOVA1, NOVA2 and RBFOX2 in iPSC motor neurons of two healthy control individuals.
Study
EGAS00001005880
Pakistan_Damgaard2018
Genotype data for 140 present-day individuals from five populations in Pakistan, from "The first horse herders and the impact of early Bronze Age steppe expansions into Asia" (DOI: 10.1126/science.aar7711). Sampling details are presented in supplementary section S2.1 Data generation
Dataset
EGAD00010001678
SureTypeSC - accurate genotyping of single-cell SNP array data
We used a collection of single cells from two Coriell cell lines to train and validate a machine learning method for quality assessment of single-cell genotypes.
Study
EGAS00001004621
Genomic profiling of fragile X syndrome unmethylated full mutation carriers
Whole genome sequencing of two unrelated FMR1 unmethylated full mutation (UFM) carriers. The existence of rare UFM carriers, who carry the mutation that causes fragile X syndrome but do not gain methylation at the locus, offers a rare opportunity to study the mechanisms underlying fragile X syndrome. This study aims to investigate the genomic variants in these individuals alongside transcriptomic data to elucidate the genomic variants behind transcriptomic differences observed in UFM individuals.
Study
EGAS50000000648
Origin of second malignancies in children
Pediatric cancers are rare diseases, and children without known germline predisposing conditions who develop a second malignancy during developmental ages are extremely rare. We present four such clinical cases. Through whole-genome and error-correcting ultra-deep duplex sequencing of tumor and normal samples, we explored the origin of the second malignancy in these four children, uncovering different routes of development. The exposure to cytotoxic therapies was linked to the emergence of a secondary AML. A common somatic mutation acquired early during embryonic development was the driver of two solid malignancies in another child. In two cases, the two tumors developed from completely independent clones diverging during embryogenesis. Importantly, we demonstrate that platinum-based therapies contributed at least one order of magnitude more mutations per day of exposure than aging to normal tissues in these children.
Study
EGAS50000000167
Circulating cell-free and extracellular vesicles-derived microRNA as prognostic biomarkers in patients with early-stage NSCLC: results from RESTING study
Lung cancer, especially non-small cell lung cancer (NSCLC), has high mortality rates. Early-stage NSCLC (ES-NSCLC) shows a 50% relapse rate post-surgery, with most recurrences occurring within two years and a 5-year survival of only 30%. This study focuses on the prognostic value of circulating miRNAs in surgically treated ES-NSCLC patients, aiming to assess their predictive accuracy and compare them to standard prognostic factors.
Study
EGAS50000001032
End motifs analysis of circulating DNA from the plasma of patients with stage II-III breast cancer (n=50), stage I-III non-small cell lung cancer (n=56), metastatic colorectal cancer (mCRC) (n=15) and healthy individuals (n=37).
We have developed an algorithm designed to discriminate cancer patients and healthy individuals based on cirDNA fragment end motif analysis assisted by machine learning, using data obtained from shallow whole genome sequencing (a method we call EMA). We applied EMA to cirDNA from the plasma of patients with stage II-III breast cancer, stage I-III non-small cell lung cancer, and metastatic colorectal cancer (mCRC). CirDNA from 158 individuals was prepared following the conventional double-stranded DNA library preparation (DSP). We also performed a single-stranded DNA library preparation (SSP) using mCRC patients and healthy control cirDNA, which allowed us to make the first ever end motif analysis in the literature which compares the use of DSP and SSP.
Study
EGAS50000001319
miRNA expression data from primary tumors, metastasis and matched normals.
MicroRNAs (miRs) have been recognized as promising biomarkers. It is unknown to what extent tumor-derived miRs are differentially expressed between primary colorectal cancers (pCRCs) and metastatic lesions, and to what extent the expression profiles of tumor tissue differ from the surrounding normal tissue. Next-generation sequencing (NGS) of 220 fresh-frozen samples, including paired primary and metastatic tumor tissue and non-tumorous tissue from 38 patients, revealed expression of 2245 known unique mature miRs and 515 novel candidate miRs. Unsupervised clustering of miR expression profiles of pCRC tissue with paired metastases did not separate the two entities, whereas unsupervised clustering of miR expression profiles of pCRC with normal colorectal mucosa demonstrated complete separation of the tumor samples from their paired normal mucosa. Two hundred and twenty-two miRs differentiated both pCRC and metastases from normal tissue samples (false discovery rate (FDR) <0.05). The highest expressed tumor-specific miRs were miR-21 and miR-92a, both previously described to be involved in CRC with potential as circulating biomarker for early detection. Only eight miRs, 0.5% of the analysed miR transcriptome, were differentially expressed between pCRC and the corresponding metastases (FDR <0.1), consisting of five known miRs (miR-320b, miR-320d, miR-3117, miR-1246 and miR-663b) and three novel candidate miRs (chr 1-2552-5p, chr 8-20656-5p and chr 10-25333-3p). These results indicate that previously unrecognized candidate miRs expressed in advanced CRC were identified using NGS. In addition, miR expression profiles of pCRC and metastatic lesions are highly comparable and may be of similar predictive value for prognosis or response to treatment in patients with advanced CRC.
Dataset
EGAD00001001644
Biallelic variants in the non-protein coding minor spliceosome components RNU4ATAC and RNU6ATAC cause syndromic monogenic autoimmune diabetes
Non-protein coding genes are emerging as critical contributors to the aetiology of rare diseases, providing key insights into human biology and uncovering novel disease mechanisms. We identified 7 individuals from 4 families with early-onset diabetes (diagnosed <5 years) and immune dysregulatory features caused by biallelic variants in RNU6ATAC. RNU6ATAC encodes a small nuclear RNA (snRNA) acting as a catalytic component of the minor spliceosome, a protein-RNA complex mediating splicing of ~700 genes containing U12/minor-type introns. Variant screening of the other 64 minor spliceosome genes in 276 infants with diabetes identified 12 unrelated individuals with biallelic disease-causing variants in RNU4ATAC. Biallelic pathogenic RNU4ATAC variants are known to cause a variable spectrum of clinical features, which until now did not include diabetes. Clinically, 12/19 RNU6ATAC/RNU4ATAC patients had additional immune dysregulatory features, and 50% of patients tested were islet-autoantibody positive, strongly supporting an autoimmune aetiology for their diabetes. RNA-seq in 3 individuals with biallelic RNU6ATAC variants showed a pattern of intron retention in U12 intron-containing genes similar to that seen in RNU4ATAC-individuals (n=3), supporting a shared disease mechanism. Analysis of patients’ transcriptomic, methylation and immune data revealed impaired B cell development and maturation. We conclude that biallelic RNU6ATAC variants cause a novel syndrome of early-onset autoimmune diabetes and immune dysregulation. We further show that infancy-onset diabetes is a previously unrecognised feature of RNU4ATACopathy. Our work highlights the important role of two snRNAs critical to minor spliceosome function in immune system regulation, providing novel insights into the pathogenesis of autoimmune diabetes.
Study
EGAS50000001565
The Bangladesh Environmental Enteric Dysfunction (BEED) Study
Environmental enteric dysfunction (EED), a sub-acute inflammatory condition of the small intestinal mucosa of unclear etiology, has been associated with a variety of environmental exposures and host factors and is implicated in growth faltering. Linear growth faltering usually occurs within the first two years of life and is irreversible in most cases, so early diagnosis is required for treatment to be successful. The objectives of 'The Bangladesh Environmental Enteric Dysfunction Study (BEED)' are to investigate the role of EED in malnutrition, to examine the biology of EED to identify common biological pathways for potential interventions, to validate a system for histological scoring of EED, and to test the effectiveness of nutritional interventions in improving the growth parameters in children with stunting and/or EED. In Bangladesh, participants are recruited from two age groups: a child cohort and a malnourished adult cohort. In addition, two control groups are recruited for comparison, consisting of a) children who are undergoing endoscopy as a part of their clinical care at the University of Virginia Health System (UVAHS) and b) an adult well-nourished control group recruited from the Gastroenterology Outpatient Department of Dhaka Medical College and Hospital in Bangladesh. The description of the study design and procedures of the study can be obtained from Mahfuz M, Das S, Mazumder RN, et al., 2017, PMID: 28801442.
Study
phs001891
RNA sequencing data from visceral and abdominal subcutaneous adipose tissue from morbidly obese women with normal glucose tolerance or type 2 diabetes
The study group consisted of 17 obese women with normal glucose tolerance and 15 obese women with T2DM. Adipose tissue specimens were taken from the epigastric region of the abdominal wall (SAT) and from the major omentum (VAT). RNA was isolated and RNA sequencing was used to analyse the transcriptome. Dharuri H et al, Diabetologia. 2014;57(11):2384-92.
Study
EGAS00001001872
Centers for Common Disease Genomics (CCDG) - Whole Genome Sequencing in Type 1 Diabetes (T1DGC)
The Type 1 Diabetes Genetics Consortium (T1DGC) was established to collect resources (biological samples and data) and conduct research to better understand the genetic basis of type 1 diabetes (T1D). Collection was initiated by ascertaining affected sib-pair families (both parents, two affected siblings and, when available, an unaffected sibling), collected from five geographic regions through four recruitment networks (Asia-Pacific, Europe, North America, United Kingdom). In addition, the T1DGC collected trio families (both parents and affected child) and cases and controls from low-prevalence populations (African-American, with four grandparents self-reporting as African ancestry; Mexican-American, with four grandparents self-reporting as ancestry from Mexico). The T1DGC also served as a repository for contributed collections from other studies, all meeting the broad data-sharing policy of the T1DGC, for inclusion in the genetic studies. These collections include T1D case samples ascertained from the UK Genetic Resource Investigating Diabetes (UK GRID) cohort, SEARCH for Diabetes in Youth (SEARCH), The Genetics of Kidneys in Diabetes (GoKinD), and control samples obtained from the British 1958 Birth Cohort, the UK National Blood Services collection, CLEAR (Consortium for the Longitudinal Evaluation of African-Americans with Early Rheumatoid Arthritis), the New York Cancer Project (NYCP), and other cohorts. For the NHGRI-funded Centers for Common Disease Genomics (CCDG) project, participants with T1D and ancestry-matched controls were identified through the T1DGC, either through direct ascertainment or by contribution from other sources to the T1DGC. 
As the CCDG has focused initially on non-Caucasian populations for whole genome sequencing, T1DGC participants of African, Mexican and Asian ancestry (targeting ~1200 cases and ~1200 controls in each ancestral group) and a small group of participants of Northern European ancestry (~100 cases, ~100 controls) were to be contributed to the study. Whole genome sequencing of T1DGC samples would be conducted at Washington University McDonnell Genome Institute and based upon matching case-control status within an ancestry group and prioritization by the CCDG.
Study
phs001222
Whole Exome Sequencing of One Nuclear Family with Non-syndromic Sensorineural Hearing Loss
Our goal is to find genes responsible for non-syndromic sensorineural hearing loss. Blood samples were collected from the JS6 family affected with hearing loss. The family is of Caribbean Hispanic ethnicity. Family JS6 consisted of two deaf siblings, JS6.001 (Male) and JS6.002 (Female) and healthy parents, JS6.100 (mother) and JS6.200 (father). The siblings had no other medical findings. Audiometry tests and Rinne and Weber tuning fork tests identified sensorineural hearing loss in the two siblings. We performed whole exome sequencing of the four individuals and identified a recessive mutation, p.(Arg186Trp), in the CIB2 gene in the two affected siblings. Both parents were unaffected carriers.
Study
phs000969
De Novo Characterization of Cell-Free DNA Fragmentation Hotspots in Plasma Whole-Genome Sequencing
The raw sequencing data deposited here are low-coverage (~1X) whole-genome sequencing (WGS) of plasma cell-free DNA (cfDNA). The purpose is to validate the performance of early-stage cancer diagnosis in the Zhou et al. 2020 BioRxiv paper, mainly for two types of cancer: early-stage liver cancer and breast cancer. The cases and controls are matched with age, gender, smoking history, and alcohol usage. We developed a computational method to identify the cfDNA fragmentation hotspots from pooled low-coverage cfDNA WGS data. We found the cfDNA fragmentation hotspots are highly enriched in regulatory regions, such as open chromatin regions. The signals from these regions can help the diagnosis of early-stage cancers and their tissues-of-origin.
Study
phs003062
Single-cell whole-genome sequencing of matched primary GBM tumours and patient-derived organoids
Single-cell whole-genome sequencing of primary GBM tumours and matched patient-derived organoids. Obtained using the 10X Genomics single-cell CNV solution. Samples from two spatially distinct regions of five tumours from three patients (three primary, two recurrent).
Dataset
EGAD00001007937
DNA methylation repeatability in the Lothian Birth Cohorts of 1921 and 1936.
The repeatability of longitudinal measures of whole blood DNA methylation (obtained using the Illumina 450k chip) was assessed in two cohorts of ageing: the Lothian Birth Cohort of 1921 and the Lothian Birth Cohort of 1936. Data were collected at ages 70, 73, and 76 (LBC1936) and 79, 87, and 90 (LBC1921), with 478 participants having two or more measures of methylation.
Study
EGAS00001000910
Smart-seq2 single-cell RNA-seq of human liver non-parenchymal cells
Smart-seq2 single-cell RNA-seq of human liver non-parenchymal cells from lean and obese individuals
Dataset
EGAD00001010301
WGS of off-target analysis of prime editing in organoids
Five edited and two unedited organoid clones, together with one clone prior to editing, were paired-end whole genome sequenced using the Illumina NovaSeq 6000 system. The reads were mapped to the hg38 genome assembly and the data are provided as BAM files.
Dataset
EGAD00001007744
Molecular profiling of MBD4-deficient acute myeloid leukaemia
We describe two cases of MBD4-deficient acute myeloid leukaemia (AML). Clinical samples were used for molecular analysis, including genomic profiling (genome and exomes), methylation analysis (RRBS) and transcriptional profiling (RNA-sequencing). An analysis and methodological description has been published: “MBD4 guards against methylation damage and germline deficiency predisposes to clonal hematopoiesis and early-onset AML” [PMID: 30049810]
Study
EGAS00001002581
Genetic measurement of memory B-cell recall using antibody repertoire sequencing
We developed an improved high throughput sequencing approach to measure the quantities and sequences of the repertoire of antibody heavy chain RNA in a blood sample. Using this approach we analyzed the antibody repertoire in response to yearly vaccinations with influenza vaccines TIV and LAIV in healthy adults in two subsequent years. We determined vaccine response patterns specific to LAIV and TIV and found antibody sequences that were shared between two samples of the same individuals following influenza vaccination in subsequent years, thereby providing a genetic measurement of B-cell memory recall.
Study
phs000656
STAMPEED: Northern Finland Birth Cohort 1966 (NFBC1966)
The Northern Finland Birth Cohorts program (NFBC) was initiated in the 1960s in the two northernmost provinces of Finland to study risk factors involved in pre-term birth and intrauterine growth retardation, and the consequences of these early adverse events on subsequent morbidity and mortality. The uniqueness of the NFBCs is that the data of the cohorts were obtained from early fetal life (including maternal health during pregnancy) to adulthood. The NFBC1966 includes 12,058 live births to mothers in the two northernmost provinces of Finland. Two decades later, a second cohort of 9,432 births was obtained (NFBC1986). In NFBC1966 pregnancies were followed prospectively from the first antenatal contact (10-16th week). After birth, the offspring were examined and then again underwent clinical evaluation at ages 1y, 7y, 14-16y and 31y. At each visit, a wide range of phenotypic, lifestyle and demographic data were gathered by questionnaires and clinical examinations. For the most part, NFBC1986 has undergone similar evaluations to NFBC1966. Linkage to national registries includes hospitalization, deaths, education, medication, pensions, and provides up-to-date demographic and clinical information for members of both cohorts. DNA samples were obtained from 5,923 subjects from NFBC1966 and 6,688 subjects from NFBC1986. Data coverage, 96% of all births in 1966 and 99% in 1986, is highly representative for the whole population. The NFBC program comprises more than 20 different projects coordinated by the Center of Lifecourse Disease studies in Northern Finland (COLD) at Oulu University. The prospective data collected from the NFBCs form a unique resource, allowing the study of disease emergence, and of the importance of genetic, biological, social and behavioral risk factors.
The genome-wide association (GWA) study sponsored through the STAMPEED program of NHLBI employed genomic DNA samples previously collected by the NFBC1966 study and stored in the DNA repository of the National Institute for Health and Welfare, Finland. This NHLBI-sponsored R01 project aimed to identify genetic variants contributing to metabolic and cardiovascular diseases (CVD). In addition to de-identified genome-wide genotypic data, a selected list of phenotypic data related to CVD, including weight, height, BMI, HDL, LDL, total cholesterol, triglyceride, glucose, insulin and fasting status, is also available in dbGaP. A summary of the GWAS for the NFBC1966 cardiovascular risk traits can be found in Sabatti et al., Nature Genetics 41: 35-46, 2009, PMID: 19060910. The version 2 release of this study contains sequence data from seventeen loci associated with levels of triglyceride, HDL-C, LDL-C, total cholesterol, fasting plasma glucose, and fasting plasma insulin (Kathiresan et al. 2008, Willer et al. 2008, Sabatti et al. 2009, Dupuis et al. 2010, Teslovich et al. 2010). At each locus, protein-coding regions and 5' and 3' untranslated regions of genes nearest to single nucleotide polymorphisms showing genome-wide significant association with metabolic syndrome-related traits were sequenced. Targeted Illumina sequencing of 78 genes (~270kb) using 150bp probes was performed on 4943 subjects of the Northern Finland Birth Cohort 1966 (NFBC1966). Whole exome sequencing on the Illumina platform was carried out on 586 of those participants. The sequencing study is part of a larger project that is funded by the National Human Genome Research Institute's Allelic Spectrum in Common Disease Initiative, and comprises sequence data from more than 7000 individuals in two Finnish cohorts: NFBC1966 and the Finland-United States Investigation of NIDDM Genetics (FUSION) study.
Study
phs000276
Identification of genetic etiology of CAMRQ2
We aimed to identify the genetic etiology of cerebellar ataxia, mental retardation, and disequilibrium syndrome (CAMRQ). Targeted sequencing of the entire CAMRQ2 locus, a 7.1 Mb interval on chromosome 17p13, in three affected individuals and two obligate carriers uncovered the mutation associated with CAMRQ2.
Study
EGAS00000000099
Analysis of recent shared ancestry in a familial cohort identifies coding and noncoding autism spectrum disorder variants
Autism spectrum disorder (ASD) is a collection of neurodevelopmental disorders characterized by deficits in social communication and restricted, repetitive patterns of behavior or interests. ASD is highly heritable, but genetically and phenotypically heterogeneous, reducing the power to identify causative genes. We performed whole genome sequencing (WGS) in an ASD cohort of 68 individuals from 22 families enriched for recent shared ancestry. We identified an average of 3.07 million variants per genome, of which an average of 112,512 were rare. We mapped runs of homozygosity (ROHs) in affected individuals and found an average genomic homozygosity of 9.65%, consistent with expectations for multiple generations of consanguineous unions. We identified potentially pathogenic rare exonic or splice site variants in 12 known (including KMT2C, SCN1A, SPTBN1, SYNE1, ZNF292) and 12 candidate (including CHD5, GRB10, PPP1R13B) ASD genes. Furthermore, we annotated noncoding variants in ROHs with brain-specific regulatory elements and identified putative disease-causing variants within brain-specific promoters and enhancers for 5 known ASD and neurodevelopmental disease genes (ACTG1, AUTS2, CTNND2, CNTNAP4, SPTBN4). We also identified copy number variants in two known ASD and neurodevelopmental disease loci in two affected individuals. In total we identified potentially etiological variants in known ASD or neurodevelopmental disease genes for ~61% (14/23) of affected individuals. We combined WGS with homozygosity mapping and regulatory element annotations to identify candidate ASD variants. Our analyses add to the growing number of ASD genes and variants and emphasize the importance of leveraging recent shared ancestry to map disease variants in complex neurodevelopmental disorders.
Study
EGAS00001006058
Genetic Study of Inflammatory Bowel Disease (IBD) in African Americans
Association studies have previously identified 200 genome-wide significant (GWS) IBD susceptibility loci in European ancestry populations. At least thirty-five loci have been identified in Asians and a handful appear Asian specific. African Americans (AAs) have a higher risk for developing disease complications in IBD and worse disease outcome, suggesting that significant differences in pathological and molecular mechanisms may exist in AAs. It was therefore imperative to use more comprehensive genotyping platforms in more highly powered samples to detect AA-specific IBD loci and associations. We conducted two independent GWAS using a total of 2345 AA IBD cases and 5002 population controls, all unrelated, self-identified AA individuals. GWAS 1 included 1258 IBD cases (843 CD, 368 UC, 47 IBD-U) and 1678 controls derived from the dbGaP Health and Retirement Study (phs000428). GWAS 2 included 1087 IBD cases (803 CD, 215 UC, 69 IBD-U) and 3324 controls obtained from the Kaiser RPGEH study (phs000788).
Study
phs001571
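The case and control totals quoted in the African American IBD entry above can be reproduced from the per-GWAS breakdown. The short Python sketch below does only that; the dictionary layout and names are illustrative assumptions, not a description of how the released files are organised.

```python
# Figures copied from the study description; the structure is hypothetical.
gwas = {
    "GWAS 1": {"CD": 843, "UC": 368, "IBD-U": 47, "controls": 1678},
    "GWAS 2": {"CD": 803, "UC": 215, "IBD-U": 69, "controls": 3324},
}

def case_count(study):
    """Sum Crohn's disease, ulcerative colitis and IBD-unclassified cases."""
    return study["CD"] + study["UC"] + study["IBD-U"]

cases_per_gwas = {name: case_count(s) for name, s in gwas.items()}
total_cases = sum(cases_per_gwas.values())
total_controls = sum(s["controls"] for s in gwas.values())

for name, n in cases_per_gwas.items():
    print(name, "cases:", n, "controls:", gwas[name]["controls"])
print("total cases:", total_cases, "total controls:", total_controls)
```

The per-study sums (1258 and 1087 cases) and the overall totals (2345 cases, 5002 controls) match the numbers given in the description.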
Myelodysplastic_syndrome_whole_genomes
Whole-genome libraries will be prepared from at least two serial samples reflecting different stages of disease progression and matched constitutional DNA for 30 Myelodysplastic syndrome patient samples. Five lanes of Illumina HiSeq sequencing will be performed on each of the tumour samples and four lanes for each of the constitutional DNA. Sequencing data will be mapped to build 37 of the human reference genome and analysis will be performed to characterize the spectrum of somatic variation present in these samples including single base pair mutations, insertions, deletions as well as larger structural variants and genomic rearrangements.
Study
EGAS00001000291
Myeloproliferative_Disease_Whole_Genomes
Whole-genome libraries will be prepared from at least two serial samples reflecting different stages of disease progression and matched constitutional DNA for 30 Myeloproliferative Disease samples. Five lanes of Illumina HiSeq sequencing will be performed on each of the tumour samples and four lanes for each of the constitutional DNA. Sequencing data will be mapped to build 37 of the human reference genome and analysis will be performed to characterize the spectrum of somatic variation present in these samples including single base pair mutations, insertions, deletions as well as larger structural variants and genomic rearrangements.
Study
EGAS00001000290
The_genetics_of_thinness_compared_to_obesity
The variation in weight within a shared environment is largely attributable to genetic factors. Whilst many genes/loci confer susceptibility to obesity, little is known about the genetic architecture of thinness. In this study we performed a genome-wide association study of 1,622 persistently thin healthy individuals (STILTS), 1,985 severe childhood onset obesity cases (SCOOP) and 10,433 population based individuals (UKHLS) used as a common set of controls. All participants were genotyped on the Illumina Core Exome array, including 551,839 markers, and imputed to the combined UK10K and 1000G (phase3) reference panel. We contrast the genetic architecture of thinness with that of severe early onset obesity and explore whether the genetic loci influencing thinness are the same as those influencing obesity or whether there are important genetic differences between them.
This data is part of a pre-publication release. For information on the proper use of pre-publication data shared by the Wellcome Trust Sanger Institute (including details of any publication moratoria), please see http://www.sanger.ac.uk/datasharing
Study
EGAS00001002624
Health and Retirement Study (HRS)
Introduction to V2: This data release comprises data from the V1 release combined with approximately 3,000 additional samples, collected during the HRS 2010 field period. The 2010 data include samples from a random half of the new cohort enrolled in 2010 along with a significant expansion of the minority sample. Description: The University of Michigan Health and Retirement Study (HRS) is a longitudinal panel study that surveys a representative sample of approximately 20,000 people in America over the age of 50 every two years. Supported by the National Institute on Aging (NIA U01AG009740) and the Social Security Administration, the HRS explores the changes in labor force participation and the health transitions that individuals undergo toward the end of their work lives and in the years that follow. The study collects information about income, work, assets, pension plans, health insurance, disability, physical health and functioning, cognitive functioning, and health care expenditures. Through its unique and in-depth interviews, the HRS provides an invaluable and growing body of multidisciplinary data that researchers can use to address important questions about the challenges and opportunities of aging. Because of its innovation and importance, the HRS has become the model and hub for a growing network of harmonized longitudinal aging studies around the world. Origins of the HRS. As the population ages it is increasingly important to obtain reliable data about aging and topics that are relevant to a range of policy issues in aging. To address this need, the National Institute on Aging (NIA) established a cooperative agreement with the University of Michigan Institute for Social Research to collect such data. The HRS launched data collection in 1992 and has re-interviewed the original sample of respondents every two years since then.
By adding new cohorts and refreshing the sample, the HRS has grown to become the largest, most representative longitudinal panel study of Americans 50 years and older. HRS Study Design. The target population for the original HRS cohort includes all adults in the contiguous United States born during the years 1931-1941 who reside in households, with a 2:1 oversample of African-American and Hispanic populations. The original sample is refreshed with new birth cohorts (51-56 years of age) every six years. The sample has been expanded over the years to include a broader range of birth cohorts as well. The target population for the AHEAD survey consists of United States household residents who were born in 1923 or earlier. Children of the Depression (CODA) recruits households born 1924-1930, War Babies 1942-47, Early Boomers 1948-53, and Mid-Boomers 1954-59. Data collection includes a mixed mode design combining in-person, telephone, mail, and Internet. For consenting respondents, HRS data are linked at the individual level to administrative records from Social Security and Medicare claims. Genetic Research in the HRS. The HRS has genotyped 2.5 million single nucleotide polymorphisms (SNPs) on respondents using Illumina's Human Omni2.5-Quad (Omni2.5) BeadChip. The genotyping was performed by the NIH Center for Inherited Disease Research (CIDR). Saliva was collected on half of the HRS sample each wave starting in 2006. In 2006, saliva was collected using a mouthwash collection method. From 2008 onward, the data collection method switched to the Oragene kit. Saliva completion rates were 83% in 2006, 84% in 2008, and 80% in 2010 among new cohort enrollees. HRS Phenotypic data. Phenotypic data are available on a variety of dimensions. 
Health measures include physical/psychological self-report, various health conditions, disabilities, cognitive performance, health behaviors (smoking, drinking, exercise), physical performance and anthropometric measures, and biomarkers (HbA1c, Total Cholesterol, HDL, CRP, Cystatin-C). Data are also available on health services including utilization, insurance and out-of-pocket spending with linkage to Medicare records. Economic measures include employment status/history, earnings, disability, retirement, type of work, income by source, wealth by asset type, capital gains/debt, consumption, linkage to pensions, and Social Security earnings/benefit histories. There is also extensive information on family structure, proximity, transfers of money and time, social and psychological characteristics, as well as a wide range of demographics. Performance on a cognitive test combining immediate and delayed word recall was selected as an example trait for the dbGaP data release. In the immediate word recall task the interviewer reads a list of 10 nouns to the respondent and asks the respondent to recall as many words as possible from the list in any order. After approximately five minutes of asking other survey questions, the respondent is asked to recall the nouns previously presented as part of the immediate recall task. The total recall score is the sum of the correct answers to these two tasks, with a range of 0 to 20. Researchers who wish to link to other HRS measures not in dbGaP will be able to apply for access from HRS. A separate Data Use Agreement (DUA) will be required for linkage to the HRS data. See the HRS website (http://hrsonline.isr.umich.edu/gwas) for details.
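The total recall score described above is a simple sum of two bounded counts. A minimal sketch in Python follows; the function name `total_recall_score` and its parameter names are hypothetical and do not correspond to HRS or dbGaP variable names.

```python
def total_recall_score(immediate: int, delayed: int, list_length: int = 10) -> int:
    """Return the sum of correct answers on the immediate and delayed recall tasks.

    Each task uses the same list of `list_length` nouns (10 in the description),
    so each component lies in 0..list_length and the total in 0..2*list_length.
    The function and parameter names are illustrative assumptions.
    """
    for name, value in (("immediate", immediate), ("delayed", delayed)):
        if not 0 <= value <= list_length:
            raise ValueError(f"{name} recall must be between 0 and {list_length}")
    return immediate + delayed


# Example: 6 nouns recalled immediately, 4 after the delay.
print(total_recall_score(6, 4))  # 10
```

Rejecting out-of-range components keeps the total within the 0 to 20 range stated in the description.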
Study
phs000428
Observational study on the immune responses acquired by COVID-19 convalescent individuals
We performed whole exome sequencing and microarray genotyping analysis of two patients with rare disease A. .....
Study
JGAS000563
Single_Cell_Sequencing_of_Sperm__scSperm_
The aim of this project is to genotype and sequence single spermatozoa from two men, one in his twenties and the other in his seventies. The resulting data is used to quantify the mutations that have arisen in the gametes of both individuals in order to better understand the effect of aging on mutation rates and modes. Project Outline. In order to quantify mutations, semen from two individuals is sequenced. 48 single sperm cells are isolated from each individual, and their DNA is extracted. The resulting genomes are amplified using PicoPlex, GenomiPhi MDA, Repli-G MDA, and MALBAC. A QC step is applied to check the quality of the WGA DNA using a standard Sequenom plex (26 SNPs). A subset of 32 amplification products that pass the initial QC are genotyped using Affymetrix SNP6 chips. 12 of the genotyped amplification products are also sequenced. In addition, one multi-cell sample per individual is sequenced as a reference and for validation purposes. Altogether, 12 single cell sperm genomes and two multi-cell genomes are sequenced, coming to a total of 14 genomes. Of the single cell sperm genomes, 2 are sequenced to 50x coverage, and the other 10 to 25x coverage. Both multi-cell genomes are sequenced to 25x coverage.
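The sequencing plan above can be tallied with a few lines of Python. This is a minimal sketch; the tuple layout and variable names are assumptions, and the summed depth is only a rough indication of sequencing effort, not a figure given in the description.

```python
# (sample type, number of genomes, target coverage), copied from the outline above.
genomes = [
    ("single-cell", 2, 50),
    ("single-cell", 10, 25),
    ("multi-cell", 2, 25),
]

single_cell = sum(n for kind, n, _ in genomes if kind == "single-cell")
total_genomes = sum(n for _, n, _ in genomes)
# Summed target depth across all genomes (a derived quantity, not in the source).
summed_depth = sum(n * depth for _, n, depth in genomes)

print(single_cell, total_genomes, summed_depth)  # 12 14 400
```

The counts agree with the 12 single cell genomes and 14 genomes in total stated in the outline.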
Study
EGAS00001000935
RTEL1 mutation as a modifier of Dyskeratosis Congenita in a family with a Telomerase RNA (hTR) template mutation and variant telomeric repeats
Vertebrate telomeres, the sequences protecting the end of linear chromosomes, are composed of conserved hexameric GGTTAG repeats. Here we present a C50>A telomerase RNA template mutation that results in the incorporation of the variant telomeric repeat GTTTAG in a family with dyskeratosis congenita primarily presenting as idiopathic pulmonary fibrosis. The mutant telomerase is characterized by decreased processivity in direct telomerase activity assays and in vivo based on data from Illumina next-generation whole genome sequencing and Oxford Nanopore Technologies long-read telomere sequencing. The latter provided positional mutant repeat information that revealed the inheritance and maintenance of proximal mutant repeats in individuals who did not inherit the template mutation. Additionally, we describe an RTEL1 nonsense mutation that is associated with very short telomeres in progeny harboring wild-type telomerase and inherited mutant telomeric repeats in the absence of overt clinical phenotypes. In contrast, co-inheritance of the RTEL1 nonsense and TERC r.C50>A mutations coincided with severe early-onset DC phenotypes in two individuals.
Study
EGAS50000001644
Two lung cancer cell lines with EGFR mutations, PC-9 and KHM-3S, were either treated with Tarceva for 24 hours or left untreated. The gene expression profiles were examined by RNAseq, and the genome wide binding profiles of total STAT3 and pSTAT3 were characterized by ChIPseq.
Lung cancers harboring activating EGFR mutants show dramatic responses to EGFR TKIs, such as Tarceva. However, nearly all patients relapse within 1 year after initial treatment. To investigate the early signaling switches that are involved in the formation of drug resistance, and the role of STAT3 in those switches, we have treated two lung cancer cell lines with EGFR mutations, PC-9 and KHM-3S, with Tarceva for 24 hours. The RNA samples with and without treatment were collected in triplicate for the RNAseq experiment. The total STAT3 and pSTAT3 chromatin IP were done on samples with and without treatment for ChIPseq.
Study
EGAS00001000793
Small RNA sequencing of human oocytes and early embryos
This study looks at small RNA expression in human oocytes (n=12), zygotes (n=5), and embryos (n=10). The sample preparation and sequencing were done in two batches.
Study
EGAS50000000157