CADD/GADD centers on Antisocial Drug Dependence
CADD (Center for Antisocial Drug Dependence): Funded through NIDA 011015 to study genetic influences on, and treatment of, antisocial drug dependence, studying both clinical probands and their families, and community samples of matched controls, twins, and participants in an ongoing longitudinal adoption study. A collaboration between three organizations at two campuses of the University of Colorado. Longitudinal with three waves of data collection completed. GADD (Genetics of Adolescent Antisocial Drug Dependence): Funded originally through NIDA 012845, a multisite collaboration including adolescent subjects at high risk for antisocial drug dependence and their siblings, recruited in Denver, CO and San Diego, CA. Longitudinal with two waves of data collection completed, one in progress as of May 2018.
Study
phs001841
MOSAIC - Multi-Omics Spatial Atlas In Cancer
MOSAIC is a collaborative initiative founded by Owkin, Lausanne University Hospital (CHUV), Charité Universitätsmedizin Berlin, University Hospital Erlangen (UKER), Gustave Roussy Institute in Paris, and University of Pittsburgh. The goal of MOSAIC is to build the largest collection of spatial omics data in cancer. By integrating comprehensive high quality clinical annotations with advanced deep profiling techniques, MOSAIC aims to uncover novel cancer subtypes and identify key drug targets and biomarkers within them.
Study
EGAS50000000689
The Thrifty Microbiome: The Role of the Gut Microbiota in Obesity in the Amish
Emerging evidence that the gut microbiota may contribute in important ways to human health and disease has led us and others to hypothesize that both symbiotic and pathological relationships between gut microbes and their host may be key contributors to obesity and the metabolic complications of obesity. Our "Thrifty Microbiome Hypothesis" poses that gut microbiota play a key role in human energy homeostasis. Specifically, constituents of the gut microbial community may introduce a survival advantage to its host in times of nutrient scarcity, promoting positive energy balance by increasing efficiency of nutrient absorption and improving metabolic efficiency and energy storage. However, in the presence of excess nutrients, fat accretion and obesity may result, and in genetically predisposed individuals, increased fat mass may result in preferential abdominal obesity, ectopic fat deposition (liver, muscle), and metabolic complications of obesity (insulin resistance, hypertension, hyperlipidemia). Furthermore, in the presence of excess nutrients, a pathological transition of the gut microbial community may occur, causing leakage of bacterial products into the intestinal lymphatics and portal circulation, thereby inducing an inflammatory state, further aggravating metabolic syndrome traits and accelerating atherosclerosis. This pathological transition and the extent to which microbial leakage occurs and causes inflammatory and other maladaptive sequelae of obesity may also be influenced by host factors, including genetics. In the proposed study, we will directly test the Thrifty Microbiome Hypothesis by performing detailed genomic and functional assessment of gut microbial communities in intensively phenotyped and genotyped human subjects before and after intentional manipulation of the gut microbiome.
To address these hypotheses, five specific aims are proposed: (1) enroll three age- and sex-matched groups from the Old Order Amish: (i) 50 obese subjects (BMI > 30 kg/m2) with metabolic syndrome, (ii) 50 obese subjects (BMI > 30 kg/m2) without metabolic syndrome, and (iii) 50 non-obese subjects (BMI < 25 kg/m2) without metabolic syndrome and characterize the architecture of the gut microbiota from the subjects enrolled in this study by high-throughput sequencing of 16S rRNA genes; (2) characterize the gene content (metagenome) to assess the metabolic potential of the gut microbiota in 75 subjects to determine whether particular genes or pathways are correlated with disease phenotype; (3) characterize the transcriptome in 75 subjects to determine whether differences in gene expression in the gut microbiota are correlated with disease phenotype; (4) determine the effect of manipulation of the gut microbiota with antibiotics on energy homeostasis, inflammation markers, and metabolic syndrome traits in 50 obese subjects with metabolic syndrome; and (5) study the relationship between gut microbiota and metabolic and cardiovascular disease traits, weight change, and host genomics in 1,000 Amish already characterized for these traits and in whom 500K Affymetrix SNP chips have already been completed. These studies will provide our deepest understanding to date of the role of gut microbes in terms of 'who's there?', 'what are they doing?', and 'how are they influencing host energy homeostasis, obesity and its metabolic complications?' PUBLIC HEALTH RELEVANCE: This study aims to unravel the contribution of the bacteria that normally inhabit the human gastrointestinal tract to the development of obesity, and its more severe metabolic consequences including cardiovascular disease, insulin resistance and Type II diabetes.
We will take a multidisciplinary approach to study changes in the structure and function of gut microbial communities in three sets of Old Order Amish patients from Lancaster, Pennsylvania: obese patients, obese patients with metabolic syndrome and non-obese individuals. The Old Order Amish are a genetically closed homogeneous Caucasian population of Central European ancestry ideal for genetic studies. This work has the potential to provide new mechanistic insights into the role of gut microflora in obesity and metabolic syndrome, a disease that is responsible for significant morbidity in the adult population, and may ultimately lead to novel approaches for prevention and treatment of this disorder.
Study
phs000258
The National Institute of Arthritis and Musculoskeletal and Skin Diseases (NIAMS) Consortium for the Longitudinal Evaluation of African-Americans with Early Rheumatoid Arthritis (CLEAR)
The Consortium for the Longitudinal Evaluation of African-Americans with Early Rheumatoid Arthritis (CLEAR) Registry and Repository, supported by the National Institute of Arthritis and Musculoskeletal and Skin Diseases (NIAMS), was established in 2000 to provide a resource for the scientific community to explore genetic and non-genetic factors affecting rheumatoid arthritis (RA) occurrence and outcomes in African Americans. The long-term objective is a database and a repository of 1,100 RA and 550 matched healthy African-American subjects. This CLEAR Registry and Repository has two arms: a longitudinal arm for subjects with early RA (enrollment from 2000 to 2005) and a cross-sectional arm for subjects with any disease duration (enrollment from 2006 to 2012). CLEAR has two components: a database and a repository. The database contains extensive demographic, socioeconomic, clinical and radiographic (radiographs of hands and feet) information and bone mineral density data from DEXA scans. The repository contains genomic DNA, plasma and serum on most of the participants. Participants in CLEAR II had RNA isolated from peripheral blood cells.
Study
phs001360
300-Obese: clinical cohort of obese individuals, Nijmegen, the Netherlands
The 300-Obese cohort was recruited at the Radboud University Medical Center (RUMC), Nijmegen, the Netherlands. The cohort comprises 377 participants included by the following criteria: age > 55 years, BMI > 27 kg/m2. The cohort data include gut microbiome, NMR serum metabolomics, deep cardiovascular phenotyping and a broad range of phenotypic information.
Study
EGAS00001003508
Phylogenetic Analyses of Melanoma Reveal Complex Patterns of Metastatic Dissemination
Subpopulations of cells in a primary melanoma can disseminate and establish metastases. Still, the precise ancestral relationship between primary tumors and their metastases is not well understood. Using whole-exome sequencing (for discovery) and targeted sequencing (for validation), we analyzed mutation patterns of primary melanomas and two or more metastases in each of 8 patients to determine their phylogenetic relationships, profiling a total of 31 tumors. The resulting data show that in 6 of 8 cases, genetically unique cell populations in the primary metastasized in parallel to distinct anatomic sites, rather than sequentially. These data also indicate that individual metastases were sometimes founded by multiple cell populations of the primary that were genetically distinct.
Study
phs000941
Human liver NPCs single cell project
Independent of their inflammatory phenotype, macrophages are key orchestrators of hepatic metabolism. Non-alcoholic fatty liver disease (NAFLD) often occurs in obese individuals and is among the most common causes of cirrhosis, the terminal chronic liver disease that may necessitate liver transplantation. While multiple populations of macrophages have been described in the human liver, their function and turnover in obese patients at high risk of developing NAFLD and cirrhosis is currently unknown. Herein we identified a specific human population of resident liver myeloid cells that protects against the metabolic impairment associated with obesity. By studying the turnover of liver myeloid cells in individuals undergoing liver transplantation using markers of donor-recipient mismatch, we made the novel discovery that liver myeloid cell turnover differs between humans and mice. Using single cell techniques and flow cytometry we determined that the proportion of the protective resident liver myeloid cells, denoted liver myeloid cells 2 (LM2), decreases during obesity. Functional validation approaches using human 2D and 3D cultures revealed that the presence of LM2 ameliorates the oxidative stress associated with obese conditions. Our study indicates that resident myeloid cells could be a therapeutic target to decrease the oxidative stress associated with NAFLD.
Study
EGAS00001007194
UK10K_NEURO_ASD_FI
In the UK10K project we propose a series of complementary genetic approaches to find new low frequency/rare variants contributing to disease phenotypes. These will be based on obtaining the genome wide sequence of 4000 samples from the TwinsUK and ALSPAC cohorts (at 6x sequence coverage), and the exome sequence (protein coding regions and related conserved sequence) of 6000 samples selected for extreme phenotypes. Our studies will focus primarily on cardiovascular-related quantitative traits, obesity and related metabolic traits, neurodevelopmental disorders and a limited number of extreme clinical phenotypes that will provide proof-of-concept for future familial trait sequencing. We will analyse directly quantitative traits in the cohorts and the selected traits in the extreme samples, and also use imputation down to 0.1% allele frequency to extend the analyses to further sample sets with genome wide genotype data. In each case we will investigate indels and larger structural variants as well as SNPs, and use statistical methods that combine rare variants in a locus or pathway as well as single-variant approaches. These samples are a subset of a nationwide collection of Finnish autism spectrum disorder (ASD) samples. The samples have been collected from Central Hospitals across Finland in collaboration with the University of Helsinki. The samples consist of 93 individuals with a diagnosis of autistic disorder or Asperger syndrome from 36 families with at least two affected individuals. Of these individuals, 16 can be genealogically connected to form two large pedigrees originating from Central Finland, suggesting possible genetic risk factors shared identical by descent within the pedigrees. All diagnoses are based on ICD-10 and DSM-IV diagnostic criteria for ASDs. Additional phenotypic data is available for a subset of the individuals. For further information with regard to this cohort please contact Aarno Palotie (Aarno.palotie@helsinki.fi).
Study
EGAS00001000110
This dataset contains fastq and BAM data from female adipose tissue.
This dataset provides the fastq and BAM files for 64 samples.
The study group consisted of 17 obese women with normal glucose tolerance and 15 obese women with T2DM classified according to WHO standards. The groups were matched for age, BMI and waist circumference. All the women had been morbidly obese (BMI>40 kg/m2) for at least five years.
Dataset
EGAD00001002202
MARS-seq dataset of five obese human subjects and a lean human subject
Biopsies from visceral adipose tissue from the omental depot (OAT) were obtained from five obese individuals and one lean donor with participant informed consent obtained after the nature and possible consequences of the studies were explained under protocols approved by the Institutional Review Boards of the Perelman School of Medicine at the University of Pennsylvania, the Children’s Hospital of Philadelphia, or the Tel Aviv Sourasky Medical Center. The obese donors underwent bariatric surgery; the lean donor underwent cholecystectomy. OAT samples were placed in 1 mL of DMEM, and finely minced under sterile conditions before digestion in 50 mL of DMEM with 3 mg/mL collagenase IV (Gibco). Samples were incubated at 37°C in a rotating oven for 20-60 min. Adipocyte and stromal vascular fractions (SVF) were separated by centrifugation, and red blood cells (RBCs) were removed from the SVF by histopaque gradient (Sigma). Single-cell RNA-sequencing libraries were prepared using the MARS-seq pipeline, and sequenced on the NextSeq 500 or HiSeq 2500 Sequencing System (Illumina).
Dataset
EGAD00001005100
NHLBI GO-ESP: Early-Onset Myocardial Infarction (Broad EOMI)
The NHLBI "Grand Opportunity" Exome Sequencing Project (GO-ESP), a signature project of the NHLBI Recovery Act investment, was designed to identify genetic variants in coding regions (exons) of the human genome (the "exome") that are associated with heart, lung and blood diseases. These and related diseases that are of high impact to public health and individuals from diverse racial and ethnic groups will be studied. These data may help researchers understand the causes of disease, contributing to better ways to prevent, diagnose, and treat diseases, as well as determine whether to tailor prevention and treatments to specific populations. This could lead to more effective treatments and reduce the likelihood of side effects. GO-ESP is comprised of five collaborative components: 3 cohort consortia - HeartGO, LungGO, and WHISP - and 2 sequencing centers - BroadGO and SeattleGO. In the Grand Opportunities Exome Sequencing Program Early MI Project (GO ESP - EOMI), we are sequencing cases with extremely early-onset MI drawn from 8 cohorts. These cohorts include five hospital or community-based studies that ascertained individuals based on MI status. These include PennCATH, Cleveland Clinic Genebank, Massachusetts General Hospital Premature Coronary Artery Disease Study (MGH-PCAD), Heart Attack Risk in Puget Sound (HARPS), and Translational Research Investigating Underlying Disparities in Myocardial Infarction Patients' Health Status (TRIUMPH). Cases were selected based on MI occurring in men aged ≤50 years and women aged ≤60 years. In addition, early-MI cases are being drawn from three population-cohort studies including the Framingham Heart Study, the Women's Health Initiative, and the Atherosclerosis Risk in Communities Study. MI-free controls are being drawn from five population-based cohort studies including the Framingham Heart Study, the Women's Health Initiative, Atherosclerosis Risk in Communities Study, Cardiovascular Health Study, and the Jackson Heart Study. 
Controls were selected based on two factors: (1) highest predicted risk for MI based on Framingham risk score; and (2) absence of prevalent or incident MI despite a high predicted risk.
Study
phs000279
National Institute on Aging - Late Onset Alzheimer's Disease Family Study: Genome-Wide Association Study for Susceptibility Loci
Alzheimer disease is the most common neurodegenerative disorder of the elderly affecting an estimated five million Americans. Genetic factors contribute to the risk for disease with heritability estimates ranging from 57% to 79%. More than a decade ago, the ε4 variant of APOE was identified and remains the most consistently replicated genetic variant influencing the risk of late onset Alzheimer disease. A segregation analysis suggests there may be four additional genes influencing the age-at-onset of Alzheimer disease. In 2007 there were 968 association studies in 398 candidate genes reported, but none replicated consistently. There are many reasons for the lack of consistency, but one important reason for the lack of progress is the paucity of a sufficient number of well characterized families and patients available to the entire scientific community. The extensive effort and expense required to ascertain such a population has been addressed by the NIA-LOAD Family Study. Its goal is to identify and recruit families with two or more siblings with the late-onset form of Alzheimer's disease and a cohort of unrelated, non-demented controls similar in age and ethnic background, and to make the samples, the clinical and genotyping data and preliminary analyses available to qualified investigators world-wide. Genotyping by the Center for Inherited Disease Research (CIDR) was performed using the Illumina Infinium II assay protocol with hybridization to Illumina Human 610Quadv1_B Beadchips. This genotyping represents the largest collection of families ever assembled with Alzheimer's disease combining the NIA-LOAD Genetics Initiative Multiplex Family Study, the National Cell Repository for Alzheimer's Disease (NCRAD) with additional controls from the University of Kentucky. These genotyping results will serve as a focal point for future research that will identify all of the remaining genetic variants in Alzheimer's disease.
Study
phs000168
SureTypeSC - accurate genotyping of single-cell SNP array data
We used a collection of single cells from two Coriell cell lines to train and validate a machine learning method for quality assessment of single-cell genotypes.
Study
EGAS00001004621
National Heart, Lung, and Blood Institute (NHLBI) Heart Healthy Lenoir (HHL) Genomics Study
The HHL genomics study uses a systems approach to develop models integrating clinical and genomic data. Previously we developed and tested an approach known as the SAMARA (Supporting A Multidisciplinary Approach to Research in Atherosclerosis) project that applied recent advances in biomedical and computational sciences at The University of North Carolina at Chapel Hill to develop a deeper understanding of human cardiovascular disease (CVD). The Heart-Healthy Lenoir Project expands these studies into the community, using this methodology to: 1) determine the prevalence of genomic risk signatures in high-risk community populations using genome-wide Single Nucleotide Polymorphism (SNP) analysis; 2) develop novel genomic models incorporating high-risk features in this population; and 3) determine whether genomic signatures can be used to predict responsiveness to interventions that underlie CVD disparities. DNA was obtained from participants enrolled in two of the HHL clinical trials, 1) Improving Care for Patients With High Blood Pressure (NCT01425515) or 2) Heart-Healthy Lenoir Lifestyle Study (NCT01433484). Participants could enroll in both trials concurrently.
Study
phs001471
Esophageal Adenocarcinoma Organoid Genomics
Single cell RNA-seq (scRNA-seq) of esophageal adenocarcinoma organoids to benchmark variant calling from 10X Genomics scRNA-seq data. Five EAC organoids were subjected to scRNA-seq and two included matched exome data.
Study
EGAS00001005224
National Institute of Neurological Disorders and Stroke (NINDS), Family Study of Essential Tremor (FASET), Identification of Susceptibility Genes for Essential Tremor
The Familial Study of Essential Tremor (FASET) was designed to identify susceptibility genes for Essential Tremor (ET). ET is among the most common neurological diseases with a prevalence (age > 40 years) estimated to be 4.0% and prevalence in advanced age (> 90 years) exceeding 20%. ET, often referred to as "familial tremor", is generally regarded as a highly genetic disorder: many families have affected members over multiple generations, and twin studies show high concordance among monozygotic twins. Probands (affected with ET) and relatives were enrolled in a family study of ET at Columbia University, New York between 2011 and 2014. The study was approved by the Institutional Review Board at Columbia University and written informed consent was obtained from all enrollees. The criteria for enrollment were: 1) the proband had early-onset ET with age at onset < 50 years, 2) the proband had a diagnosis of definite or probable ET, 3) in addition to the proband, there were at least two affected relatives in the family, 4) additional affected and unaffected family members were willing to participate in the study, and 5) the families contained more than two affected individuals in different generations. Blood samples were also collected for genetic research. For the genetic analyses, we excluded enrollees that we or others had diagnosed with Parkinson's disease (PD) or dystonia. The final sample includes 52 families (52 probands [affected with ET] and 155 relatives). The number of affected individuals enrolled per family ranged from 3 to 7 (mean = 4.1). Genetic samples from FASET were analyzed with whole genome SNP genotyping (for linkage analyses) and whole exome sequencing. It is hoped that this resource will help researchers to better understand the genetic causes of ET and underlying disease pathogenesis.
Study
phs000966
Centers for Common Disease Genomics (CCDG) - Whole Genome Sequencing in Type 1 Diabetes (T1DGC)
The Type 1 Diabetes Genetics Consortium (T1DGC) was established to collect resources (biological samples and data) and conduct research to better understand the genetic basis of type 1 diabetes (T1D). Collection was initiated by ascertaining affected sib-pair families (both parents, two affected siblings and, when available, an unaffected sibling), collected from five geographic regions through four recruitment networks (Asia-Pacific, Europe, North America, United Kingdom). In addition, the T1DGC collected trio families (both parents and affected child) and cases and controls from low-prevalence populations (African-American, with four grandparents self-reporting as African ancestry; Mexican-American, with four grandparents self-reporting as ancestry from Mexico). The T1DGC also served as a repository for contributed collections from other studies, all meeting the broad data-sharing policy of the T1DGC, for inclusion in the genetic studies. These collections include T1D case samples ascertained from the UK Genetic Resource Investigating Diabetes (UK GRID) cohort, SEARCH for Diabetes in Youth (SEARCH), The Genetics of Kidneys in Diabetes (GoKinD), and control samples obtained from the British 1958 Birth Cohort, the UK National Blood Services collection, CLEAR (Consortium for the Longitudinal Evaluation of African-Americans with Early Rheumatoid Arthritis), the New York Cancer Project (NYCP), and other cohorts. For the NHGRI-funded Centers for Common Disease Genomics (CCDG) project, participants with T1D and ancestry-matched controls were identified through the T1DGC, either through direct ascertainment or by contribution from other sources to the T1DGC. 
As the CCDG has focused initially on non-Caucasian populations for whole genome sequencing, T1DGC participants of African, Mexican and Asian ancestry (targeting ~1200 cases and ~1200 controls in each ancestral group) and a small group of participants of Northern European ancestry (~100 cases, ~100 controls) were to be contributed to the study. Whole genome sequencing of T1DGC samples would be conducted at Washington University McDonnell Genome Institute and based upon matching case-control status within an ancestry group and prioritization by the CCDG.
Study
phs001222
Coronary Artery Risk Development in Young Adults (CARDIA) Study - Gene Environment Association Studies Initiative (GENEVA)
For the GENEVA CARDIA project, three genotype call sets were generated from a single set of array scans as a consequence of DNA sample quality problems. These call sets are designated "Birdsuite-1", "Birdsuite-2" and "Beaglecall". ("Beaglecall" used both Birdseed and BEAGLECALL calling algorithms.) An analysis-ready genotypic data set is provided in PLINK format for the "Beaglecall" set only, because it performs very well in QC analyses. Only raw CHP and ALLELE_SUMMARY files are provided for the two Birdsuite call sets because they have significant quality issues. Use of the Beaglecall set is highly recommended. Users of the other two call sets should proceed with caution. More details are given in the genotypic QC report. The CARDIA study, sponsored by the National Heart, Lung and Blood Institute (NHLBI), is a prospective, multi-center investigation of the natural history and etiology of cardiovascular disease in African-Americans and Whites 18-30 years of age at the time of initial examination. The initial examination included 5,115 participants selectively recruited to represent proportionate racial, gender, age, and education groups from 4 communities: Birmingham, AL; Chicago, IL; Minneapolis, MN; and Oakland, CA. Participants from the Birmingham, Chicago, and Minneapolis centers were recruited from the total community or from selected census tracts. Participants from the Oakland center were randomly recruited from the Kaiser-Permanente health plan membership. From the time of initiation of the study in 1985-1986, six follow-up examinations have been conducted at years 2, 5, 7, 10, 15, and 20. The Year 25 examination is scheduled to begin in 2010. This study is part of the Gene Environment Association Studies initiative (GENEVA, http://www.genevastudy.org) funded by the trans-NIH Genes, Environment, and Health Initiative (GEI).
The overarching goal is to identify novel genetic factors associated with variation in longitudinal blood pressure profiles during the critical transition period from young adulthood to early middle-age; and to characterize their interactions with relevant environmental factors, such as body weight profiles. Genotyping was performed at the Broad Institute of MIT and Harvard, a GENEVA genotyping center. Data cleaning and harmonization were performed at the GEI-funded GENEVA Coordinating Center at the University of Washington.
Study
phs000309
Paired exome analysis in urothelial carcinoma
In this study we characterized genomic alterations in two to five metachronous tumors from 29 patients initially diagnosed with early stage disease. Fourteen patients (32 tumors) had non-progressive disease (NPD) and 15 patients (34 tumors) had progressive disease (PD). Whole exome sequencing (WES, ~50x mean read depth), ultra-deep targeted sequencing (~6,809x mean read depth) and whole transcriptome RNA-seq were performed for all samples. In addition, multiregional WES was performed on 8 adjacent regions from a single tumor.
Study
EGAS00001001686
Transgenerational transmission of reproductive and metabolic dysfunction in the male progeny of polycystic ovary syndrome
The transgenerational maternal effects of PCOS in female progeny have been revealed. As there is evidence that a male equivalent of PCOS may exist, we asked whether sons born to mothers with PCOS (PCOS-sons) transmit reproductive and metabolic phenotypes to their male progeny. Here, in a Swedish nationwide register-based cohort and a clinical case-control study from Chile, we found that PCOS-sons are more often obese and dyslipidemic. Their serum miRNAs are found to potentially regulate PCOS-risk genes. Our prenatally androgenized PCOS-like mouse model with or without diet-induced obesity confirmed that reproductive and metabolic dysfunctions in F1 male offspring are passed down to F3. Small non-coding RNA (sncRNA) sequencing of F1-F3 sperm revealed distinct differentially expressed (DE) sncRNAs across generations in the androgenized, obese, and obese and androgenized lineages, respectively. Notably, common targets between transgenerational DE sncRNAs in mouse sperm and in PCOS-sons' serum indicate similar effects of maternal hyperandrogenism. These findings strengthen the translational relevance, highlighting a previously underappreciated risk of reproductive and metabolic dysfunction via male germline transmission and potential molecular markers to study in future generations.
Study
EGAS00001007079
Evaluation of Nuclear DNA from Rootless Hairs for Forensic Purposes
The study aims to overcome current limitations in the recovery of DNA from small, difficult forensic samples. Particularly, our goal is to produce a robust laboratory protocol and the accompanying software to accelerate adoption, as well as to evaluate the reliability and robustness of both the laboratory and computational aspects of generating genotype files from minute and/or degraded DNA samples, such as single, rootless hairs. The data accompanying this study include raw, paired-end reads from high-throughput sequencing of two panels of saliva, head hair, and pubic hair samples collected from anonymous volunteers at the University of California, Santa Cruz. The smaller set (Hair1.0) comprises 8 individuals, while the larger (Hair2.0) comprises 50 individuals, with 3 overlapping individuals between the two panels identified in post-collection analysis. We did not collect phenotype data or personally identifying information from the participants. For the Hair2.0 panel, only a subset of volunteers provided pubic hairs for DNA extraction and sequencing. Also included are saliva-derived genotype array data for all 8 individuals in the Hair1.0 panel and 44 of 50 individuals in the Hair2.0 panel.
Study
phs002979
Ecological Stressors, PTSD, and Drug Use in Detroit: The Detroit Neighborhood Health Study (DNHS)
The Detroit Neighborhood Health Study (DNHS) is a prospective, representative longitudinal cohort study of predominantly African American adults living in Detroit, Michigan. The overall goal of the DNHS is to identify how genetic variation, lifetime experience of stressful and traumatic events, and features of the neighborhood environment predict psychopathology and behavior. Cohort participants were selected with a dual-frame probability design, using telephone numbers obtained from the U.S. Postal Service Delivery Sequence Files as well as a list-assisted random-digit-dial frame. Individuals without listed landlines or telephones and individuals with only a cell phone listed were invited to participate through a postal mail effort. Participants completed a 40-minute, structured telephone interview annually between 2008 and 2012 to assess perceptions of participants' neighborhoods, mental and physical health status, social support, exposure to traumatic events, and alcohol and tobacco use; each participant was compensated $25 USD. All survey participants were offered the opportunity to provide a specimen (venipuncture, blood spot, or saliva) for immune and inflammatory marker testing as well as genetic testing of DNA. Participants received an additional $25 USD if they elected to give a sample. Informed consent was obtained at the beginning of each interview and again at specimen collection. The Institutional Review Board of the University of Michigan reviewed and approved the study protocol. The DNHS submission to dbGaP includes phenotype data from all five survey waves (n = 856), all available GWAS data for participants who completed wave 4 (n = 507), and methylation data for wave 1, wave 2, wave 4, and wave 5 participants (n = 456).
Study
phs000560
Characterizing microbiome-directed fibre snacks in gnotobiotic mice and humans
Knowledge of the interrelationships between what we eat and the configurations of our gut microbial communities is providing important insights into how food components that are not directly metabolized by human enzymes are linked to our physiology and health status. Changing food preferences brought about by Westernization that have deleterious health effects, plus rapid population expansion, ongoing challenges to sustainable agriculture, and other forces contributing to increased food insecurity, are catalyzing efforts to identify more nutritious and affordable foods. The gut microbial community is complex, dynamic, and exhibits considerable intra- and interpersonal variation in its composition and functions. The massive number of potential interactions between its components makes it challenging to define the mechanisms by which food ingredients affect community properties. There is also a paucity of information about the ‘bioactive’ ingredients of foods that influence the fitness and expressed functions of community members. Here, plant fibres, from different sustainable sources and targeting distinct features of obese human gut microbiomes in gnotobiotic mice, were formulated into snack prototypes and used to supplement controlled diets consumed by overweight and obese adults; the results revealed fibre-specific changes in their microbiomes that were linked to changes in their plasma proteomes indicative of altered physiologic state.
Study
EGAS00001005268
TMM whole genome analysis of 4566 Japanese individuals
Tohoku University Tohoku Medical Megabank Organization (ToMMo) and Iwate Tohoku Medical Megabank Organization (IMM) were founded to establish an advanced medical system to foster reconstruction after the Great East Japan Earthquake. These organizations are developing a biobank that includes medical and genome information to support health and welfare in the Tohoku area. In the first stage, part of our mission was to sequence 4,000 individuals to construct a Japanese whole-genome reference panel.
Study
JGAS000239
Identification of putative multiple myeloma (MM) susceptibility genes
We sought to identify novel MM susceptibility genes using a collection of families with multiple cases of MM/MGUS, including 189 affected individuals from 40 families, and index cases from an additional 88 families, along with 170 early-onset (EO) MM cases (≤ 55 years). We analyzed a total of 347 affected individuals using whole exome (N=321) and whole genome (N=26) sequencing. Samples were identified and collected through nation-wide efforts in France, Sweden and Greece. We focused on rare (MAF<0.5%) germline protein truncating and likely deleterious missense variants in genes harboring variants in at least two families showing variant-disease segregation, and in additional index (≥2) and/or early-onset (≥2) cases.
Study
EGAS50000001259
Collection: Hispanic Community Health Study/Study of Latinos (HCHS/SOL)
The Hispanic Community Health Study / Study of Latinos (HCHS/SOL) is a multi-center epidemiologic study in Hispanic/Latino populations to determine the role of acculturation in the prevalence and development of disease, and to identify risk factors playing a protective or harmful role in Hispanics/Latinos. The study is sponsored by the National Heart, Lung, and Blood Institute (NHLBI) and six other institutes, centers, and offices of the National Institutes of Health (NIH). The goals of the HCHS/SOL include studying the prevalence and development of disease in Hispanics/Latinos, including the role of acculturation, and identifying disease risk factors that play protective or harmful roles in Hispanics/Latinos. A total of 16,415 persons of Cuban, Dominican, Mexican, Puerto Rican, Central American, and South American backgrounds were recruited through four Field Centers affiliated with San Diego State University, Northwestern University in Chicago, Albert Einstein College of Medicine in the Bronx area of New York, and the University of Miami. Seven additional academic centers serve as scientific and logistical support centers. Study participants aged 18-74 years took part in an extensive clinic exam and assessments to ascertain socio-demographic, cultural, environmental and biomedical characteristics. Annual follow-up interviews are conducted to determine a range of health outcomes. To request access to this collection, select phs003650 in dbGaP when submitting a data access request.
Study
phs003650
National Cancer Institute (NCI) Study of Lung Cancer and Smoking Phenotypes in African-American Cases and Controls
This is a two-stage case-control study designed to evaluate the association between common genetic variants and the risk of lung cancer. The stage 1 studies included 1737 cases and 3602 controls from the following studies: MD Anderson Lung Cancer Epidemiology Study, The Multiethnic Cohort Study (MEC), NCI-MD Lung Cancer Case-Control Study, Northern California Lung Cancer Study, Project CHURCH (Creating a Higher Understanding of Cancer Research & Community Health), Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial (PLCO), Southern Community Cohort Study (SCCS), and the Karmanos Cancer Institute at Wayne State University (KCI/WSU). The stage 2 studies included an independent set of 866 cases and 796 controls from the following studies: The Black Women's Health Study (BWHS), The Harvard-MGH Lung Cancer Susceptibility Study (HLCS), MD Anderson Lung Cancer Epidemiology Study, MD Anderson/LBJ Hospital Biorepository, NCI-MD Lung Cancer Case-Control Study, Northern California Lung Cancer Study, Philadelphia Lung Cancer Study on Gene Environment Interactions (Plus-Gene), Southern Community Cohort Study (SCCS), and KCI/WSU.
Study
phs001210
NHLBI TOPMed: The Vanderbilt Atrial Fibrillation Registry (VU_AF)
The Vanderbilt Atrial Fibrillation (AF) Registry was founded in 2001. Patients with AF and family members are prospectively enrolled. At enrollment a detailed past medical history is obtained along with an AF symptom severity assessment. Blood samples are obtained for DNA extraction. Patients are followed longitudinally along with serial collection of AF symptom severity assessments.
Study
phs001032
DNA methylation repeatability in the Lothian Birth Cohorts of 1921 and 1936.
The repeatability of longitudinal measures of whole blood DNA methylation (obtained using the Illumina 450k chip) was assessed in two cohorts of ageing: the Lothian Birth Cohort of 1921 and the Lothian Birth Cohort of 1936. Data were collected at ages 70, 73, and 76 (LBC1936) and 79, 87, and 90 (LBC1921), with 478 participants having two or more measures of methylation.
Study
EGAS00001000910
X chromosomal genetic variants are associated with childhood obesity
Current genetic association studies usually focus on autosomal variants only, and the sex chromosomes are often neglected. In recent years, a number of statistical techniques and strategies have been widely described that make it much easier to overcome X-chromosome technical hurdles and to include this region in genetic studies. Tenomodulin (TNMD) is an anti-angiogenic locus on Xq22 that has recently been linked to different obesity-related phenotypes. These results have not been replicated to date. Given these facts, we conducted a genetic association analysis in a population of Spanish children, including seven TNMD SNPs as potential candidate markers for obesity and metabolic dysfunction. Additionally, genotypes for another X-chromosome locus, SLC6A14, have been included in the dataset.
A total of 915 DNA samples from 258 normal weight, 177 overweight and 480 obese Spanish children (438 males, 477 females) were genotyped for seven TNMD SNPs and one SLC6A14 SNP. Associations with anthropometric measurements and glucose metabolism were investigated.
Study
EGAS00001002738
UK10K_OBESITY_SCOOP
In the UK10K project we propose a series of complementary genetic approaches to find new low frequency/rare variants contributing to disease phenotypes. These will be based on obtaining the genome wide sequence of 4000 samples from the TwinsUK and ALSPAC cohorts (at 6x sequence coverage), and the exome sequence (protein coding regions and related conserved sequence) of 6000 samples selected for extreme phenotypes. Our studies will focus primarily on cardiovascular-related quantitative traits, obesity and related metabolic traits, neurodevelopmental disorders and a limited number of extreme clinical phenotypes that will provide proof-of-concept for future familial trait sequencing. We will analyse directly quantitative traits in the cohorts and the selected traits in the extreme samples, and also use imputation down to 0.1% allele frequency to extend the analyses to further sample sets with genome wide genotype data. In each case we will investigate indels and larger structural variants as well as SNPs, and use statistical methods that combine rare variants in a locus or pathway as well as single-variant approaches. The SCOOP samples are part of the Obesity group and will undergo exome sequencing. Severe Childhood Onset Obesity Project (SCOOP) is a sub-cohort of the Genetics Of Obesity Study (GOOS) cohort established by Sadaf Farooqi and Steve O’Rahilly at the University of Cambridge over the last 12 years. The GOOS cohort contains >4,000 patients of diverse geographic origin, many of whom have monogenic and syndromic forms of obesity, and includes patients that are offspring of consanguineous union. SCOOP is a subset of >1500 UK Caucasian patients with severe, early onset obesity (all patients have a BMI Standard Deviation Score (SDS) > 3 and obesity onset before the age of 10 years), in whom all known monogenic causes of obesity have been excluded. GWAS data on the SCOOP cohort will be available (WTCCC2 independent study) at the time of the start of this study.
Data from this cohort have demonstrated that the prevalence of the common obesity risk alleles in FTO, MC4R and NEGR1 is amongst the highest within SCOOP, demonstrating its value also in the study of genetic variants with an impact on more common obesity. For further information with regard to this cohort please contact Sadaf Farooqi (isf20@medschl.cam.ac.uk).
Study
EGAS00001000124
HeartShare - Extant Datasets - Harmonized Clinical Trials Collection
Heart failure with preserved ejection fraction (HFpEF) prevalence is rising with an aging population and contributing comorbidities, such as obesity and diabetes. The Get With The Guidelines - Heart Failure (GWTG-HF) study showed that the proportion of hospitalized HFpEF patients rose from 2005 to 2010, and this trajectory continued to increase through 2020. HFpEF patients have a poor 5-year outcome with a 75% mortality rate. Taken together, these data suggest that HFpEF may be the dominant HF subtype in the future, affecting approximately 1 in 10 adults during their lifetime. In addition to its increasing prevalence, HFpEF is a heterogeneous syndrome with multiple pathophysiological processes and varied clinical presentations. There is an urgent need to improve understanding of different HFpEF phenotypes to match patients with interventions based on their unique subtype. Phenomapping studies have identified between two and six HFpEF clusters/phenogroups with shared features. Of these, three overlapping phenotypes repeatedly arise: an 'older, vascular aging' phenotype, a 'metabolic, obese' phenotype, and a 'relatively younger, natriuretic peptide (NP) deficiency' phenotype. Continued refinement of these HFpEF subtypes is needed to identify distinct characteristics and underlying biologic and molecular mechanisms towards therapeutic development and future precision trials with improved diagnostic criteria and treatment including prevention. The HeartShare extant datasets collection consists of selected completed trials on HFpEF which have been harmonized for the purpose of identifying and defining subtypes. The goal is to identify HFpEF subtypes that have a common pathophysiology. Through the HeartShare extant datasets collection, users can request access to both the harmonized data and the individual trial datasets. To request access to this study collection, select phs003989 in dbGaP when submitting a data access request.
The following clinical trials and harmonized data are available through this collection:
Study Name | Handle | Accession | Consent Groups
Surgical Treatment for Ischemic Heart Failure | BioLINCC_STICH | phs003493.v1.p1 | GRU
Heart Failure Network Aldosterone Targeted Neurohormonal Combined with Natriuresis Therapy | BioLINCC_HFN_ATHENA | phs003506.v1.p1 | GRU
Heart Failure Network - Effectiveness of Ultrafiltration in Treating People With Acute Decompensated Heart Failure and Cardiorenal Syndrome | BioLINCC_HFN_CARRESS | phs003510.v1.p1 | GRU
Heart Failure Network: Diuretic Optimization Strategies Evaluation in Acute Heart Failure | BioLINCC_HFN_DOSE | phs003524.v1.p1 | GRU
Heart Failure Network - Xanthine Oxidase Inhibition for Hyperuricemic Heart Failure Patients | BioLINCC_EXACT_HF | phs003533.v1.p1 | GRU
Heart Failure Network: Functional Impact of GLP-1 for Heart Failure Treatment | BioLINCC_HFN_FIGHT | phs003542.v1.p1 | GRU
Heart Failure Network - Nitrate's Effect on Activity Tolerance in Heart Failure with Preserved Ejection Fraction | BioLINCC_HFN_NEAT | phs003548.v1.p1 | GRU
Heart Failure Network - Phosphodiesterase-5 Inhibition to Improve Clinical Status and Exercise Capacity in Diastolic Heart Failure | BioLINCC_HFN_RELAX | phs003565.v1.p1 | GRU
Heart Failure Network - Renal Optimization Strategies Evaluation in Acute Heart Failure and Reliable Evaluation of Dyspnea | BioLINCC_HFN_ROSE | phs003589.v1.p1 | GRU
Heart Failure: A Controlled Trial Investigating Outcomes of Exercise Training | BioLINCC_HF_ACTION | phs003599.v1.p1 | HMB
Heart Failure: A Controlled Trial Investigating Outcomes of Exercise Training | BioLINCC_HF_ACTION | phs003599.v1.p1 | HMB-NPU
Guiding Evidence Based Therapy Using Biomarker Intensified Treatment in Heart Failure | BioLINCC_GUIDE_IT | phs003621.v1.p1 | GRU
Sudden Cardiac Death in Heart Failure Trial | BioLINCC_SCD_HeFT | phs003654.v1.p1 | GRU
Treatment of Preserved Cardiac Function Heart Failure With an Aldosterone Antagonist | BioLINCC_TOPCAT | phs003665.v1.p1 | HMB-MDS
Heart Failure Network - Inorganic Nitrite Delivery to Improve Exercise Capacity in HFpEF | BioLINCC_HFN_INDIE | phs003667.v1.p1 | GRU
Studies of Left Ventricular Dysfunction | BioLINCC_SOLVD | phs003668.v1.p1 | GRU
Evaluation Study of Congestive Heart Failure and Pulmonary Artery Catheterization Effectiveness | BioLINCC_ESCAPE | phs003782.v1.p1 | GRU
Study
phs003989
Multiethnic Cohort Adiposity Phenotype Study (MEC-APS)
A total of 1,861 healthy men and postmenopausal women aged 60-77 years with body mass indexes (BMIs) of 17.1-46.2 kg/m² were enrolled into the study. Stratified sampling was conducted based on sex, race/ethnicity, and six BMI categories (18.5-21.9, 22-24.9, 25-26.9, 27-29.9, 30-34.9, and 35-40 kg/m²) to balance the sample size across the wide range of BMI and optimize adjustment for total adiposity in populations with different BMI distributions. Study participants underwent body composition assessment, an in-person interview, and blood collection, and completed a self-administered questionnaire. A genome-wide association study of obesity phenotypes was conducted among five racial/ethnic groups of the Multiethnic Cohort Study.
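As a minimal sketch of how the six BMI categories listed above could be applied to a measured BMI, the following assigns a value to a stratum. The function name, the labels, and the treatment of boundaries (for example, a BMI of 21.95 falling into the 18.5-21.9 stratum, and values outside 18.5-40 returning no stratum) are assumptions made for illustration; the study description does not state how boundary or out-of-range values were handled.

```python
from typing import Optional

# Lower bounds of the six BMI strata named in the description (kg/m^2).
# Labels are copied from the text; boundary handling is an assumption.
BMI_STRATA = [
    (18.5, "18.5-21.9"),
    (22.0, "22-24.9"),
    (25.0, "25-26.9"),
    (27.0, "27-29.9"),
    (30.0, "30-34.9"),
    (35.0, "35-40"),
]


def bmi_stratum(bmi: float) -> Optional[str]:
    """Return the stratum label for a BMI, or None if outside 18.5-40."""
    if not 18.5 <= bmi <= 40.0:
        return None
    label = None
    for lower, name in BMI_STRATA:
        # Keep the last stratum whose lower bound the BMI reaches.
        if bmi >= lower:
            label = name
    return label


if __name__ == "__main__":
    for value in (17.1, 23.4, 40.0, 46.2):
        print(value, bmi_stratum(value))
```

Returning None for out-of-range values keeps the enrolled BMI range (17.1-46.2) distinct from the sampling strata, rather than silently folding extreme values into the nearest category.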
Study
phs001689
DNA Methylation in Prostate Tumor and Paired Benign Tissue for African and European Ancestry Men
We recruited prostate cancer (PCa) patients undergoing robotic-assisted laparoscopic radical prostatectomy at the University of Chicago Medical Center Urology Clinic between 2011 and 2017. Our study includes 76 African American (AA) and 75 European American (EA) men with PCa of Gleason Score (GS) ≥7, recruited by the Epidemiology Research Recruitment Core at the University of Chicago. All patients consented to the collection of questionnaire data, prostate tissue, and medical records. The study was approved by the Institutional Review Board of the University of Chicago. Following prostatectomy, prostate tissue was sent to the Human Tissue Research Center (HTRC) at the University of Chicago. Each sample underwent histological examination and Gleason scoring by University of Chicago genitourinary pathologists. The presence of adenocarcinoma was confirmed by overexpression of alpha-methylacyl-coenzyme-A racemase (AMACR), and areas for DNA extraction were marked. Tissue samples used for DNA extraction were obtained using either a 1 mm biopunch or laser microdissection of a 100 µm² area of tissue. Genome-wide DNA methylation profiles were generated using Illumina's Infinium MethylationEPIC BeadChip array kit. Genome-wide SNP data were generated (from benign tissue DNA) using the Illumina Infinium Multi-Ethnic Global-8 v1.0 array at the University of Chicago Genomics Core Facility. We performed imputation using the Haplotype Reference Consortium (HRC, Version r1.1 2016) panel, which includes all samples from the 1000 Genomes Project (Phase 3), using the Michigan Imputation Server.
The SNP data used for mQTL analysis were imputed from genotypes that were all generated from benign tissue DNA. Methylation data were obtained from both tumor and benign tissue in both AA and EA men, and the analysis identified mQTLs co-occurring with prostate cancer susceptibility loci. Data including SNP genotypes, DNA methylation, tissue type, and participant characteristics will be available in dbGaP.
Study
phs003516
Health and Retirement Study (HRS)
Introduction to V2: This data release comprises data from the V1 release combined with approximately 3,000 additional samples, collected during the HRS 2010 field period. The 2010 data include samples from a random half of the new cohort enrolled in 2010 along with a significant expansion of the minority sample. Description: The University of Michigan Health and Retirement Study (HRS) is a longitudinal panel study that surveys a representative sample of approximately 20,000 people in America over the age of 50 every two years. Supported by the National Institute on Aging (NIA U01AG009740) and the Social Security Administration, the HRS explores the changes in labor force participation and the health transitions that individuals undergo toward the end of their work lives and in the years that follow. The study collects information about income, work, assets, pension plans, health insurance, disability, physical health and functioning, cognitive functioning, and health care expenditures. Through its unique and in-depth interviews, the HRS provides an invaluable and growing body of multidisciplinary data that researchers can use to address important questions about the challenges and opportunities of aging. Because of its innovation and importance, the HRS has become the model and hub for a growing network of harmonized longitudinal aging studies around the world. Origins of the HRS. As the population ages it is increasingly important to obtain reliable data about aging and topics that are relevant to a range of policy issues in aging. To address this need, the National Institute on Aging (NIA) established a cooperative agreement with the University of Michigan Institute for Social Research to collect such data. The HRS launched data collection in 1992 and has re-interviewed the original sample of respondents every two years since then.
By adding new cohorts and refreshing the sample, the HRS has grown to become the largest, most representative longitudinal panel study of Americans 50 years and older. HRS Study Design. The target population for the original HRS cohort includes all adults in the contiguous United States born during the years 1931-1941 who reside in households, with a 2:1 oversample of African-American and Hispanic populations. The original sample is refreshed with new birth cohorts (51-56 years of age) every six years. The sample has been expanded over the years to include a broader range of birth cohorts as well. The target population for the AHEAD survey consists of United States household residents who were born in 1923 or earlier. Children of the Depression (CODA) recruits households born 1924-1930, War Babies 1942-47, Early Boomers 1948-53, and Mid-Boomers 1954-59. Data collection includes a mixed mode design combining in-person, telephone, mail, and Internet. For consenting respondents, HRS data are linked at the individual level to administrative records from Social Security and Medicare claims. Genetic Research in the HRS. The HRS has genotyped 2.5 million single nucleotide polymorphisms (SNPs) on respondents using Illumina's Human Omni2.5-Quad (Omni2.5) BeadChip. The genotyping was performed by the NIH Center for Inherited Disease Research (CIDR). Saliva was collected on half of the HRS sample each wave starting in 2006. In 2006, saliva was collected using a mouthwash collection method. From 2008 onward, the data collection method switched to the Oragene kit. Saliva completion rates were 83% in 2006, 84% in 2008, and 80% in 2010 among new cohort enrollees. HRS Phenotypic data. Phenotypic data are available on a variety of dimensions. 
Health measures include physical/psychological self-report, various health conditions, disabilities, cognitive performance, health behaviors (smoking, drinking, exercise), physical performance and anthropometric measures, and biomarkers (HbA1c, Total Cholesterol, HDL, CRP, Cystatin-C). Data are also available on health services including utilization, insurance and out-of-pocket spending with linkage to Medicare records. Economic measures include employment status/history, earnings, disability, retirement, type of work, income by source, wealth by asset type, capital gains/debt, consumption, linkage to pensions, and Social Security earnings/benefit histories. There is also extensive information on family structure, proximity, transfers (to/from) of money and time, social and psychological characteristics, as well as a wide range of demographics. Performance on a cognitive test combining immediate and delayed word recall was selected as an example trait for the dbGaP data release. In the immediate word recall task the interviewer reads a list of 10 nouns to the respondent and asks the respondent to recall as many words as possible from the list in any order. After approximately five minutes of asking other survey questions, the respondent is asked to recall the nouns previously presented as part of the immediate recall task. The total recall score is the sum of the correct answers to these two tasks, with a range of 0 to 20. Researchers who wish to link to other HRS measures not in dbGaP will be able to apply for access from HRS. A separate Data Use Agreement (DUA) will be required for linkage to the HRS data. See the HRS website (http://hrsonline.isr.umich.edu/gwas) for details.
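The total recall score described above can be illustrated with a short sketch. It assumes each task is scored as the number of correctly recalled nouns out of the 10 read aloud; the function and parameter names are hypothetical and are not HRS variable names.

```python
WORDS_PER_LIST = 10  # the interviewer reads a list of 10 nouns


def total_recall_score(immediate_correct: int, delayed_correct: int) -> int:
    """Sum the immediate and delayed recall counts (range 0 to 20)."""
    for task, count in (("immediate", immediate_correct),
                        ("delayed", delayed_correct)):
        # Each task can yield between 0 and 10 correct answers.
        if not 0 <= count <= WORDS_PER_LIST:
            raise ValueError(f"{task} recall count out of range: {count}")
    return immediate_correct + delayed_correct


if __name__ == "__main__":
    # A respondent recalling 7 nouns immediately and 5 after the delay.
    print(total_recall_score(7, 5))
```

Validating each component before summing makes a data-entry error visible instead of producing a total that merely looks plausible.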
Study
phs000428
WTCCC2 Visceral Leishmaniasis (VL) samples
A WTCCC2 project genome-wide association study for visceral leishmaniasis (VL) in individuals from India, Brazil and Sudan, genotyped on the custom Illumina 670k array. The WTCCC2 analysis of the Brazilian and Indian samples is described in Fakiola et al. [Nat Genet. 2013 Feb;45(2):208-13]. It should be noted that due to expected family structure in the data, normal analyses of these data should include an estimation of the relatedness between the samples. For more details about sample collection for the project please refer to the Methods section of the paper above. The samples from India were all collected from Bihar state in northeastern India. The Brazilian samples were collected from a wide area of northern Brazil, and managed at two laboratories in Natal and Belem. For more details on the geographic distribution of the Brazil samples please see Jamieson et al. [Genes Immun. 2007 Jan;8(1):84-90]. We have not generated qc_passed or info files for the Sudanese data because a preliminary scan of the data indicated that there may be DNA quality issues with these samples.
Study
EGAS00001000773
Cincinnati Children's Hospital Medical Center (CCHMC) - eMERGE Phase IIIA Data
This submission includes genotyping or sequencing data from separate cohorts, each of which is described in a separate paragraph below.
Extreme early onset obesity: Obesity is a serious epidemic condition and on the rise in the United States. Today, nearly one out of three children is overweight or obese in this country. According to the Centers for Disease Control, 35.7% of American adults and 17% of American children are obese. The medical costs associated with obesity are estimated to be in the billions. Without a doubt, the interplay of additive genetic effects and common environmental effects influences this complex disease. However, despite being exposed to a so-called "obesogenic environment", a large proportion of the population remains of normal weight. These observations suggest that innate, non-environmental factors make some individuals more susceptible to obesity, providing support for biological mechanisms, and thus genetic factors, underlying the individual's response to the obesogenic environment. In young children with severe obesity the relative role of genetics and in utero programming is likely to outweigh the short duration of environmental and lifestyle exposures. This group is therefore an ideal one to study as they are likely enriched for variants that influence the risk of developing obesity. The purpose of this project is to further study and understand obesity in childhood and to develop a repository of samples for future studies into obesity.
Eosinophilic Esophagitis (EoE): Eosinophilic Esophagitis (EoE) is one of the manifestations of eosinophilic gastrointestinal inflammation, which has profound effects on a patient's health and development. Results of epidemiologic studies performed through our center demonstrate that eosinophil-associated gastrointestinal disease is not an uncommon entity. While the epidemiology of eosinophilic esophagitis has not been thoroughly studied until recently, there appears to be a significant increase in the diagnosis of EoE in the last decade. Based on our research, this mainly reflects increased disease recognition, but there is also a bona fide increase in disease incidence which coincides with the increasing incidence of asthma and allergic diseases in the industrialized world. In addition, many patients with intractable symptoms thought in the past to represent atypical GERD or other disorders are now being recognized as having EoE. Diagnosis of EoE requires endoscopy and biopsies to document the characteristic histologic findings of esophageal eosinophilia. In general, this study proposed to elucidate the mechanisms underlying eosinophil growth, survival, migration, and function, and to investigate and further characterize the pathophysiology, clinical manifestations, and spectrum of disease severity of eosinophilic esophagitis in humans. The de-identified genotyping and genome wide association data generated as part of this research will be used for further genome research.
Familial Sample Repository (FSR) and Directed Sample Repository (DSR): De novo mutations could cause many diseases, which has been demonstrated in mental retardation, autism and many rare genetic disorders. Family-based studies have a variety of advantages over case/control studies, including the elimination of analysis artifacts related to population stratification, the detection of genes that act through a recessive mechanism of inheritance and validation that the trait is not transmitted from a parent, something not possible using a case/control design. Additionally, DNA from families can be used to identify de novo mutations suggesting strong candidate causal polymorphisms. For this project, samples will be collected from families on an on-going basis. Families may be recruited either because the patient has a disease which is thought to be of genetic origin, or from the general patient population to serve as controls or for diseases identified in the future. Some phenotypes under study include fibroblastic rheumatism, diaphragmatic hernia, polymicrogyria, severe congenital neutropenia, primary sclerosing cholangitis and staph infection.
CLRR-Cincinnati Lupus Registry and Repository: Systemic lupus erythematosus (SLE) is a complex, partially understood autoimmune disorder. Genetic origins for SLE are supported by high heritability (> 66%), familial aggregation, increased monozygotic twin concordance, genetic linkages, and candidate gene genetic association, including HLA genes, Fc receptors, and complement components. Relevant environmental factors likely include infections (Epstein-Barr virus), therapeutics, personal habits (smoking), and diet. We are developing this repository to continue a research resource facility for the collection of well-characterized pedigrees containing a proband with systemic lupus erythematosus.
Juvenile Idiopathic Arthritis (JIA): Juvenile Idiopathic Arthritis (JIA) is a debilitating complex genetic disorder characterized by inflammation of the joints and other tissues, and it shares histopathological features with other autoimmune diseases. It is considered a complex genetic trait. There are more than 50,000 children with JIA in the USA, approximately 1 per 1000 births, which is about the same incidence as juvenile diabetes. It is believed that genes in the major histocompatibility complex (MHC) play a role in defining genetic risk, and it can be hypothesized that loci in other chromosomal regions are involved in conferring risk in JIA. These candidate chromosomal regions can be identified using genome-wide association analyses. The long-term goal is a comprehensive understanding of the genetic basis of these disabling arthropathies for which the molecular basis is not presently understood. These data will contribute to a national resource for the study of autoimmunity in children.
Better Outcomes for Children-Cytogenetics: Since 2007, more than 4000 samples, enriched with various rare or common genetic diseases as well as specific chromosomal abnormalities such as deletions and duplications, have been genotyped for the purpose of subsequent GWAS and PheWAS analyses and uncovering main genetic effects.
Study
phs001011
mapped Bam files from whole transcriptome RNA-seq
In this study we characterized genomic alterations in two to five metachronous bladder tumors from 29 patients initially diagnosed with early stage disease. Fourteen patients (32 tumors) had non progressive disease (NPD) and 15 patients (34 tumors) had progressive disease (PD). Whole exome sequencing (WES, ~50x mean read depth) and whole transcriptome RNA-seq were performed (RNA was not available for 4 tumors).
Data provided here consist of 71 mapped Bam files from whole transcriptome RNA-seq.
Dataset
EGAD00001002718
unmapped Bam files from whole transcriptome RNA-seq
In this study we characterized genomic alterations in two to five metachronous bladder tumors from 29 patients initially diagnosed with early stage disease. Fourteen patients (32 tumors) had non progressive disease (NPD) and 15 patients (34 tumors) had progressive disease (PD). Whole exome sequencing (WES, ~50x mean read depth) and whole transcriptome RNA-seq were performed (RNA was not available for 4 tumors).
Data provided here consist of 71 unmapped Bam files from whole transcriptome RNA-seq.
Dataset
EGAD00001002717
H3Africa - Genomic and Environmental Risk Factors for Cardiometabolic Disease in Africans
The long-term vision of the AWI-Gen Collaborative Centre (CC) is to build sustainable capacity in Africa for research that leads to an understanding of the interplay between genetic, epigenetic and environmental risk factors for obesity and related cardiometabolic diseases (CMD) in sub-Saharan Africa. The CC will be consolidated under the auspices of the University of the Witwatersrand (Wits) and the International Network for the Demographic Evaluation of Populations and Their Health in Developing Countries (INDEPTH). It will capitalize on the unique strengths of existing longitudinal cohorts, including the urban Soweto and rural Agincourt studies in South Africa (Wits based), and the well established INDEPTH demographic health and surveillance centers in Kenya, Ghana, Burkina Faso and South Africa. The centers offer established infrastructure, trained fieldworkers, long-standing community engagement, and detailed longitudinal phenotypic data, focusing on obesity and cardiometabolic health. Key strengths are harmonized phenotyping across sites, building on strong existing cohorts, and representation of the geographic and social variability of African populations. We aim to:
1. Build sustainable infrastructure (biobanks and laboratories) and capabilities (well characterized population cohorts, genotyping and bioinformatics) for genomic research on the African continent;
2. Understand the genomic architecture of sub-Saharan populations from west, east and south Africa to guide genomic studies (genome sequencing and high throughput SNP and CNV arrays using unrelated individuals and family trios to improve the accuracy of haplotype analyses); and
3. Investigate the independent and synergistic genomic contributions to body fat distribution (BMI, hip/waist circumference, subcutaneous and visceral fat) in these populations considering the relevant environmental and social contexts (rural/urban communities, quickly transitioning obesity prevalence, differential HIV, TB, and malaria infection histories).
We will investigate the effect of obesity and fat distribution on the risk for CMD in the longitudinal cohorts.
Study
EGAS00001002482
Extreme phenotypes define epigenetic and metabolic signatures in cardiovascular diseases.
Improving the understanding of cardiometabolic syndrome pathophysiology and its relationship with thrombosis are ongoing healthcare challenges. Using plasma biomarkers analysis coupled with the transcriptional and epigenetic characterisation of cell types involved in thrombosis, obtained from two extreme phenotype groups (obese and lipodystrophy) and comparing these to lean individuals and blood donors, the present study identifies the molecular mechanisms at play, highlighting patterns of abnormal activation in innate immune phagocytic cells and shows that extreme phenotype groups could be distinguished from lean individuals, and from each other, across all data layers. The characterisation of the same obese group, six months after bariatric surgery shows the loss of the patterns of abnormal activation of innate immune cells previously observed. However, rather than reverting to the gene expression landscape of lean individuals, this occurs via the establishment of novel gene expression landscapes. Netosis and its control mechanisms emerge amongst the pathways that show an improvement after surgical intervention. Taken together, by integrating across data layers, the observed molecular and metabolic differences form a disease signature that is able to discriminate, amongst the blood donors, those individuals with a higher likelihood of having cardiometabolic syndrome, even when not presenting with the classic features.
Study
EGAS00001003780
SUM-seq data for Macrophage polarisation to M1 and M2 phenotypes experiment
We stimulated iPSC-derived M0 macrophages with LPS and IFN-γ to induce M1 polarization or IL-4 to induce M2 polarization. To discern early and sustained responses at chromatin accessibility and gene expression levels, we collected samples at five time points along the two polarization trajectories: prior to stimulation (M0) and at 1-hour, 6-hour, 10-hour, and 24-hour intervals. Each condition was sampled in duplicate, for a total of 18 samples, and all samples were subjected to SUM-seq library preparation. Sequenced files for both data modalities are provided as demultiplexed fastq files.
Dataset
EGAD50000001206
The Genetics of Food Cue Reactivity in Children
This study aimed to investigate genetic factors related to appetite, eating in the absence of hunger and brain reward response to food cues in a convenience sample of pre-adolescent children taken through two studies (A and B). The study involved genome-wide genotyping using the Illumina Global Screening Array. The study found an obesity polygenic risk score related to parent-reported appetitive traits in children. The study also found that FTO was related to eating in the absence of hunger and brain reward response to food cues. Available data include genotypes, measured height and weight, appetitive traits, consumption, brain reward response to food cues, and sociodemographic variables.
Study
phs003550
UK_ReplicationChip
This dataset includes data from UK Multiple Sclerosis (MS) cases that were recruited through the University of Cambridge and included in the IMSGC Replicationchip experiment. Data from UK controls and additional UK cases that were recruited through other UK centres are available by direct application to those respective centres, as described in the original paper.
Dataset
EGAD00010002118
GEDI: A Developmental Model of Gene-Environment Interplay in SUDs: Combined Genotype Dataset from the "Great Smoky Mountains Study" and the "Caring for Children in the Community Study".
1. Great Smoky Mountains Study (GSMS; Costello et al. 1996, 1997) Three cohorts of boys and girls, aged 9, 11, and 13 years at intake in 1993, were selected from a rural population of some 20,000 children using a household equal probability design. A two-phase procedure was used for White and African-American youth to increase power by oversampling children at risk for psychiatric and SUDs. Parents (usually mothers) of the first stage random population sample completed a questionnaire about their child's behavioral problems. Of 4,195 subjects selected, 95% (N=3,896) of parents completed the screen. All children scoring above a predetermined threshold (the top 25% of the total scores), plus a 10% random sample of the remaining 75%, were recruited for detailed interviews. Results can be back-weighted to population levels for analyses. Half of the sample consists of females, and 6% are African Americans, reflecting the population of the study area. The interviewed sample of white and African-American subjects was 1,070 (80% of those recruited). American Indian youth were oversampled (100%) because they are an understudied group known to be at high risk for stressful events, substance disorders, and mood disorders. Of 431 age-eligible children, 350 (81% boys, 49% girls) participated. Thus, the size of total GSMS sample is 1,070 + 350 = 1,420. Data collection is complete for ages 9-26, and age 30 interviews are in progress. By age 26 a total of 9,858 interviews had been completed; the average number of interviews per subject was seven, and by age 26, 97.3% completed two or more interviews. 2. The Caring for Children in the Community Study (CCC; Angold et al., 2002) This representative study of psychiatric illness and service use in African-American and White youth took place in four rural counties in the southeastern USA. The two-stage sampling design and methods are similar to those used in the GSMS. 
Of 4,500 youth randomly selected from the 17,117 9- to 17-year-olds in the public school's database, 3,613 (80.0%) were successfully contacted and agreed to complete the behavioral screen. Of the 1,302 selected to participate in the study, 920 (70.7%) interviews were completed. Because CCC was also the only study in GEDI to contain more than a very few African-American participants, these were omitted from the multi-site analyses. Reprinted with permission from Cambridge University Press from Costello et al., 2013: PMID: 23461817 References: Costello et al., 1996: PMID: 8956679 Costello et al., 1997: PMID: 9184514 Angold et al., 2002: PMID: 12365876
Study
phs000852
Emory University African American Vaginal, Oral, and Gut Microbiome in Pregnancy Cohort Study
The Emory University African American Microbiome in Pregnancy Cohort Study prospectively enrolls African American women between 18-40 years of age without chronic medical conditions into a longitudinal follow-up study that involves completing microbiome sampling and data collection at two time points in pregnancy (8-14 weeks gestation and 24-30 weeks gestation). The study is investigating the role of the oral, vaginal, and gut microbiome on preterm birth and other pregnancy outcomes as well as biobehavioral factors that shape the microbiome among this socio-demographically diverse group of women. Microbiome sampling involves the oral, vaginal, and gut sites. Biobehavioral factors that are being explored for their association with microbiome composition at the three body sites include measures of sociodemographic status as well as diet and nutrition, infectious and stress exposures.
Study
phs001865
Genome-wide chromatin accessibility profiling of primary human glomerular and kidney cortex tubular outgrowth cultures
We generated primary cultures from mechanically isolated kidney glomeruli (filtration unit of the nephron) which are composed of podocytes and mesangial cells. In parallel, we generated primary kidney cortex tubule cell cultures, which are composed primarily of proximal tubule cells. Early passage cultures of these two cell types were subjected to chromatin accessibility profiling (DNase-Seq) and gene expression profiling (RNA-Seq). We found thousands of dynamically regulated enhancers in both cell types that potentially regulate nearby and distal target genes that are differentially expressed. These data will be useful for understanding the epigenomic regulation of gene transcription in key kidney cell types.
Study
phs001720
Comprehensive characterization of cell-free tumor DNA in plasma and urine of patients with renal tumors
Cell-free tumour-derived DNA (ctDNA) allows non-invasive monitoring of various cancers but its utility in renal cell cancer (RCC) has not been well established. A combination of untargeted and targeted sequencing methods, applied to two independent cohorts of patients (n=90) with various renal tumour subtypes, was used to determine ctDNA content in plasma and urine. Our data revealed lower plasma ctDNA levels in RCC relative to other cancers, with untargeted detection of ~33% for both cohorts. A highly sensitive personalised approach, applied to plasma and urine from select patients (n=22), improved detection to ~50%, including in patients with early stage and even benign lesions. A machine-learning based model, applied to untargeted data, predicted this detection, potentially offering a means of triaging patient samples for personalised analysis. We observed that plasma, and for the first time, urine ctDNA may better represent tumour heterogeneity than a single tissue biopsy. Longitudinal sampling of >200 plasma samples revealed that ctDNA can track disease course. These data highlight low ctDNA levels in RCC but indicate potential clinical utility provided improvement in isolation and detection approaches.
Study
EGAS00001003530
Longitudinal genome-wide analysis of progressive chronic lymphocytic leukemia under uniform front-line therapy of pentostatin, cyclophosphamide, and rituximab
Samples from a two-center prospective phase 2 clinical trial conducted at Ohio State University (Columbus, OH) and Mayo Clinic (Rochester, MN) were analyzed in 12 cases. All patients had progressive CLL as defined by National Cancer Institute (NCI) Working Group criteria. Patients provided written informed consent for correlative studies according to the Declaration of Helsinki on an institutional review board approved protocol for the collection and use of samples for research purposes from both participating institutions. Eligible patients received a regimen consisting of pentostatin (2 mg/m2), cyclophosphamide (600 mg/m2), and rituximab (375 mg/m2) provided intravenously on day 1 of a 21-day cycle for a maximum of 6 cycles. Responses were assessed by NCI Working Group criteria and included a bone marrow evaluation and two-color flow cytometry 2 months after completion of therapy. Peripheral blood samples from these patients collected longitudinally before (Pre-Baseline), at (Baseline) and after therapy (Relapse) were analyzed for genomic heterogeneity and clonal complexity by high-throughput exome sequencing.
Study
phs000794
Bam files from Whole exome sequencing (WES, ~50x mean read depth) of metachronous bladder tumors
In this study we characterized genomic alterations in two to five metachronous bladder tumors from 29 patients initially diagnosed with early stage disease. Fourteen patients (32 tumors) had non progressive disease (NPD) and 15 patients (34 tumors) had progressive disease (PD). Whole exome sequencing (WES, ~50x mean read depth) and whole transcriptome RNA-seq were performed (RNA was not available for 4 tumors).
Data provided here consist of 122 BAM files for WES (83 tumors and 39 blood).
Dataset
EGAD00001002716
Whole exome sequencing data of germline and two independent primary leukemias of five patients
The contribution of genetic predisposing factors to the development of pediatric acute lymphoblastic leukemia (ALL), the most frequently diagnosed cancer in childhood, has not been fully elucidated. Children presenting with multiple de novo leukemias are more likely to suffer from genetic predisposition. Here, we selected five of these patients and analyzed the mutational spectrum of normal and malignant tissues. In two patients, we identified germline mutations in TYK2, a member of the JAK tyrosine kinase family. These mutations were located in two adjacent codons of the pseudokinase domain (p.Pro760Leu and p.Gly761Val). In silico modeling revealed that both mutations affect the conformation of this auto-regulatory domain. Consistent with this notion, both germline mutations promote TYK2 autophosphorylation and activate downstream STAT family members, which could be blocked with the JAK kinase inhibitor I. These data indicate that germline activating TYK2 mutations predispose to the development of ALL.
Study
EGAS00001001889
The National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) Genetics of Genetic Epidemiology of Metabolic Syndrome in an Island Population
The major objective of this study was to conduct a systematic genetic study of metabolic traits involved in metabolic syndrome through collection and analysis of epidemiological, demographic, environmental, and relevant biological and clinical data from a relatively isolated island population of the eastern Adriatic coast of Croatia. The population was chosen for the following reasons: 1) in spite of practicing a largely traditional life style and dietary habits, high rates of obesity, arterial hypertension, dyslipidemia and related metabolic abnormalities were found in previous studies; 2) the population was established by a relatively small number of founders, predominantly of Slavic descent from the mainland during 15th to 18th century AD, a genetically homogeneous population living in a homogeneous environment; 3) sharing a common European ancestry, a relevant population for study in the context of the general US population; 4) Croatian collaborators have been conducting anthropological and genetic studies in these communities for over three decades. There were two major aims of the study: 1) to recruit ~1200 adult participants and collect blood samples together with demographic, anthropometric, environmental and clinical data from the island of Hvar; to perform biochemical tests to measure glucose, insulin, uric acid and lipid levels; 2) conduct a genome-wide association analysis of metabolic traits and phenotypes using genome-wide SNP arrays (Affymetrix Genome-Wide Human SNP Array 5.0).
Study
phs000737
snRNA-seq schizophrenia control Prefrontal cortex
Human Brodmann Area 8/9 tissue from 43 schizophrenia subjects and 42 control individuals was obtained from three sources: the Netherlands Brain Bank (NBB), Craig A. Stockmeier (University of Mississippi Medical Center), and the Macedonian/New York State Psychiatric Institute Brain Collection (Andrew J. Dwork, Columbia University). All subjects included in the study were matched between the two groups for age, gender, and postmortem interval. Nuclei were isolated and enriched for 85% neuronal nuclei before single nucleus RNA sequencing was performed with 10x Genomics Chromium Single Cell protocol v3.
Dataset
EGAD50000002447
ALS Compute
Amyotrophic Lateral Sclerosis (ALS) is a fatal neurodegenerative disorder characterized by the progressive loss of brain and spinal cord motor neurons. Half of ALS patients display cognitive symptoms of frontotemporal dementia (FTD); reciprocally, about 40% of FTD patients show motor neuron deficits, and approximately 15% develop overt ALS. The clinical overlap between ALS and FTD means that the two conditions are thought to represent a disease spectrum (ALS/FTD). In recent years, the identification of several genetic causes of ALS/FTD has contributed significantly to our understanding of disease pathogenesis. Unfortunately, one-third of the underlying genetic causes of familial cases and ~90% of sporadic cases of ALS/FTD remain unexplained. As such, there is a dire need to identify additional genetic factors contributing to ALS/FTD. Such studies require huge cohorts of harmonized whole genome sequencing (WGS) data sets from cases and controls. Currently, there are several major ongoing sequencing efforts for ALS patients. Numerous centers lead to inefficiency, especially in terms of overall costs. At a minimum, this includes the cost of high-performance computing, the storage of large data files, and the duplication of effort between groups. This lack of data harmonization between groups precludes sharing of genetic information and weakens collaborative efforts. The cost and logistics are also a barrier to attracting talented investigators to the ALS/FTD field. To overcome this unmet need, we have founded the ALS Compute project. We are centralizing the storage of ALS/FTD WGS data from every significant sequencing effort in the United States and beyond within a single Cloud environment. This approach will facilitate data harmonization and improve accessibility to the data. 
To accomplish this, we have made the data and the computational infrastructure available via the Terra platform hosted by NHGRI's Genomic Data Science Analysis, Visualization, and Informatics Lab-Space (AnVIL). This will allow researchers worldwide to access this wealth of data, develop new theories of the disease, and yield breakthroughs in our understanding of ALS/FTD.
Study
phs003184
The Intestinal Bacterial Metagenome in Pediatric Non-Alcoholic Fatty Liver Disease (NAFLD)
Nonalcoholic fatty liver disease (NAFLD) is the most common cause of chronic liver disease in children in the United States. NAFLD is associated with obesity and metabolic syndrome; however, there is limited understanding of the development and progression of NAFLD. There is evidence of abnormalities of bacterial colonization, and intestinal bacterial product-induced inflammation associated with NAFLD and disease progression. The goal of this study was to characterize the intestinal microbiome in pediatric participants with NAFLD and in both obese and normal weight controls to determine the relationship between alterations in the intestinal microbiome, inflammation, and the development of NAFLD. We hypothesized that alterations in the intestinal microbiome are associated with increased systemic inflammation and the development and severity of NAFLD.
Study
phs001837
ARRA - NHLBI Lung Cohorts Sequencing Project: Genetic modifiers of
The major goal of this project is to apply second generation resequencing technology to identify disease causing variants influencing pediatric and adult lung diseases in a collection of two longitudinal population cohorts of cystic fibrosis patients that have been well characterized for a comprehensive set of clinical traits. In Phase I, exome sequencing was performed on 43 cystic fibrosis patients with early Pa infection and 48 cystic fibrosis patients with late Pa infection to identify variants influencing the time to onset of Pa infection. In Phase II, additional exomes were added to the study, to reach a total of 91 individuals with early Pa infection and 96 with late Pa infection. The majority of the 340 subjects of Phase II do not have a Pa infection phenotype, but instead have a pulmonary function phenotype (121 severe vs. 124 mild impairment) as determined by the survival corrected Kulich FEV percentile of Corey et al. A small minority have intermediate phenotypes and/or show severe decline in lung function during childhood.
Study
phs000254
A Comprehensive Catalogue of Somatic Mutations from a Human Cancer Genome
We have carried out complete sequencing of the genome of the human male malignant melanoma cell line COLO-829 using the Illumina Genome Analyzer II. We generated a sequencing library with a median insert size of ~200 bp following random fragmentation and gel fractionation of the genomic DNA. We sequenced 75 bases from both ends of these templates to cover the COLO-829 genome to an average depth of more than 40x. We have carried out purity-filtering (PF) to remove mixed reads, where two or more different template molecules are close enough on the surface of the flow-cell to form a mixed or overlapping cluster. No other filtering of the data has been carried out prior to submission. We have also submitted sequence data for a lymphoblastoid control cell line COLO-829BL from the same individual.
Study
EGAS00000000052
The Bangladesh Environmental Enteric Dysfunction (BEED) Study
Environmental enteric dysfunction (EED), a sub-acute inflammatory condition of the small intestinal mucosa of unclear etiology, has been associated with a variety of environmental exposures and host factors and is implicated in growth faltering. Linear growth faltering usually occurs within the first two years of life and in most cases is irreversible, demanding early diagnosis for treatment to be successful. The objectives of 'The Bangladesh Environmental Enteric Dysfunction Study (BEED)' are to investigate the role of EED in malnutrition, examine the biology of EED to identify common biological pathways for potential interventions, validate a system for histological scoring for EED, and test the effectiveness of nutritional interventions in improving the growth parameters in children with stunting and/or EED. In Bangladesh, participants are recruited from two age groups: a child cohort and a malnourished adult cohort. In addition, two control groups are recruited for comparison, consisting of a) children that are undergoing endoscopy as a part of their clinical care at the University of Virginia Health System (UVAHS) and b) an adult well-nourished control group recruited from the Gastroenterology Outpatient Department of Dhaka Medical College and Hospital in Bangladesh. The description of the study design and procedures of the study can be obtained from Mahfuz M, Das S, Mazumder RN, et al., 2017, PMID: 28801442.
Study
phs001891
CRISPR screening in human trophoblast stem cells reveals both shared and distinct aspects of human and mouse placental development
The placenta serves as the interface between the mother and fetus, facilitating the exchange of gases and nutrients between their separate blood circulation systems. Trophoblasts in the placenta play a central role in this process. Our current understanding of mammalian trophoblast development relies largely on mouse models. However, given the diversification of mammalian placentas, findings from the mouse placenta cannot be readily extrapolated to other mammalian species, including humans. To fill this knowledge gap, we performed CRISPR knockout (KO) screening in human trophoblast stem cells (hTSCs). We targeted genes essential for mouse placental development and identified more than 100 genes as critical regulators in both human hTSCs and mouse placentas. Among them, we further characterized in detail two transcription factors, DLX3 and GCM1, and revealed their essential roles in hTSC differentiation. Moreover, a gene function-based comparison between human and mouse trophoblast subtypes suggests that their relationship may differ significantly from previous assumptions based on tissue localization or cellular function. Notably, our data reveal that hTSCs may not be analogous to mouse TSCs or the extraembryonic ectoderm (ExE) in which in vivo TSCs reside. Instead, hTSCs may be analogous to progenitor cells in the mouse ectoplacental cone and chorion. This finding is consistent with the absence of ExE-like structures during human placental development. Our data not only deepen our understanding of human trophoblast development but also facilitate cross-species comparison of mammalian placentas.
Study
JGAS000659
503 genotypes from Inner Asia used in 'Close inbreeding and low genetic diversity in Inner Asian human populations despite geographical exogamy' publication
Inner Asia is particularly interesting to understand human history and evolution, as two groups presenting contrasting cultural traits (notably their language, their social organisation and matrimonial system) cohabit. We sampled 503 individuals from these two groups, belonging to 17 populations from 11 distinct ethnic groups (AltaiKizhi, Kazakh, Khakas, Kyrgyz, Mongolian, Shore, Tajik, Telengit, Tubalar, Turkmen and Uzbek). The samples were then genotyped with 5 different DNA-arrays and, after quality-control, the 253,532 autosomal SNPs present in all the arrays were merged together in the present dataset.
Study
EGAS00001002951
The Chinese University of Hong Kong Hereditary Spastic Paraplegia Data
The dataset contains the whole genome sequencing data of a family with two unaffected parents and two probands that showed symptoms of hereditary spastic paraplegia. Sequencing reads were aligned to the human genome (GRCh38) using BWA-MEM, followed by indel realignment and PCR duplicate marking. Alignment results are available for download in BAM format.
Dataset
EGAD00001002146
CIP: Obesity-Diabetes Familial Risk, Viva La Familia Study
The VIVA LA FAMILIA Study was designed to identify genetic variants influencing childhood obesity and its comorbidities in the Hispanic population. Family recruitment and phenotyping were conducted in 2000-2005 in Houston, TX. All enrolled children (n=1030) and parents gave written informed consent or assent. The protocol was approved by the Institutional Review Boards for Human Subject Research for Baylor College of Medicine and Affiliated Hospitals and for Texas Biomedical Research Institute. The VIVA LA FAMILIA study design and methodology have been described in detail (Butte NF, 2006). Each family was ascertained on an obese proband, defined as a BMI > 95th percentile, between the ages 4-19 y. The cross-sectional, longitudinal study design consisted of baseline measurements, with a one-year. GWAS was performed using the Illumina HumanOmni1 v1.0 BeadChips on 815 children from 263 Hispanic families and HumanOmni 2.5-8v1 on an additional 43 children. Exome sequencing is being performed on 822 children using NimbleGen capture, followed by Illumina DNA sequencing. Butte NF, Cai G, Cole SA, Comuzzie AG. Viva la Familia Study: genetic and environmental contributions to childhood obesity and its comorbidities in Hispanic population. Am J Clin Nutr 2006;84(3):646-54. PMID: 16960181
Study
phs000616
The epigenetic landscape controlled by p63 in epidermal development
Transcription factor p63 is a key regulator of epidermal keratinocyte proliferation and differentiation. Heterozygous mutations of TP63 encoding p63 cause a spectrum of developmental disorders. EEC syndrome is caused by point mutations in the p63 DNA-binding domain, and manifests ectodermal dysplasia with defects in the epidermis and epidermal related appendages, limb malformation and cleft lip/palate. Five hotspot mutations affecting amino acids, R204, R227, R279, R280 and R304, have been found in approximately 90% of the EEC population. Although the role of p63 in normal epidermal development and differentiation has been demonstrated, the molecular mechanism by which p63 mutations cause the epidermal phenotype in diseases is not yet understood. In the two related studies, we characterize p63 mutant keratinocytes (R204W, R279H and R304W) and p63 mutant iPSCs (R204W and R304W) and the molecular mechanisms underlying their differentiation defects.
Study
phs001737
Organ maturation in preparation for birth (Peds RFA) to develop a tissue resource and a single-cell atlas of organ development and maturation for dissemination among the scientific and clinical community: RNA (2025-10-14)
Knowledge about abnormal organ development is important to understand pathology and to develop novel treatment approaches for individuals with congenital and acquired disease. Most of our current understanding is based on examination of tissues from the embryo and early foetus, collected from women undergoing termination of pregnancy in the first trimester (third) of pregnancy. There is very little known about normal and abnormal organ development from a developmental perspective during the crucial last two-thirds of pregnancy when much remodelling of foetal tissues occurs. This study will generate a single-cell atlas of late-foetal lungs, blood, heart, bone and immune organs.
This dataset contains all the data available for this study on 2025-10-14.
Dataset
EGAD00001015737
The_genetics_of_thinness_compared_to_obesity
The variation in weight within a shared environment is largely attributable to genetic factors. Whilst many genes/loci confer susceptibility to obesity, little is known about the genetic architecture of thinness. In this study we performed a genome-wide association study of 1,622 persistently thin healthy individuals (STILTS), 1,985 severe childhood onset obesity cases (SCOOP) and 10,433 population based individuals (UKHLS) used as a common set of controls. All participants were genotyped on the Illumina Core Exome array, including 551,839 markers and imputed to the combined UK10K and 1000G (phase3) reference panel. We contrast the genetic architecture of thinness with that of severe early onset obesity and explore whether the genetic loci influencing thinness are the same as those influencing obesity or whether there are important genetic differences between them.
This data is part of a pre-publication release. For information on the proper use of pre-publication data shared by the Wellcome Trust Sanger Institute (including details of any publication moratoria), please see http://www.sanger.ac.uk/datasharing
Study
EGAS00001002624
The genomic VCF data of the Integrative proteogenomic characterization of early esophageal cancer project
The genomic VCF data of the Integrative proteogenomic characterization of early esophageal cancer project. This dataset contains 90 VCF files.
Dataset
EGAD00001008672
Utility of Capillary Blood in Gene Expression Studies
The FAA Functional Genomics Team examined the feasibility of analyzing gene expression profiles using capillary blood collected by fingerstick, with the goal of developing practical fingerstick blood collections to expand blood collection capabilities and allow large-scale or field-based collections. We compared the RNA sequencing results obtained from two different capillary blood collection and RNA extraction methods with the results obtained from a standard venous blood collection and RNA extraction protocol. We also compared capillary blood processing methods, as well as fingerstick locations, within one capillary collection/extraction method. Within each method we sought to distinguish male from female participants to assess the capacity of each collection/extraction method for gene expression profiling. The study population included 40 participants aged 18-60 who were free from any acute illness or any illness that would preclude blood collection. We determined that capillary blood possesses large-scale significant differences from venous blood and that the two capillary methods produced similar gene expression profiles. We further determined that all methods were useful for detecting gene expression differences between assessed groups (male and female participants in this case). We conclude that capillary blood is suitable for gene expression profiling in quantities as low as 100 μL whole blood, but recommend that researchers should rely on a single blood collection method if they desire to compare results between individual studies. Raw sequencing files (.fastq) from this study are available in dbGaP.
Study
phs003496
National Eye Institute Glaucoma Human Genetics Collaboration (NEIGHBOR) Consortium Glaucoma Genome-Wide Association Study
This is a case-control study of primary open angle glaucoma (POAG). POAG is an intraocular pressure (IOP) related progressive optic neuropathy that ultimately leads to blindness. For this study we have formed a collaborative consortium contributing 2170 POAG cases and 2347 controls with a unified definition of POAG (the NEIGHBOR consortium: NEI Glaucoma Human genetic collaBORation). The case definition has also been harmonized with an additional 976 cases and 1140 controls from the NHGRI supported GENEVA (gene-environment) study of glaucoma (GLAUGEN) (NIH/NHGRI U01HG004728, Pasquale PI). Cases and controls were recruited from ophthalmology clinics and were examined by ophthalmologists. For cases, the clinical exam included intraocular pressure measurements, optic nerve assessment and visual field evaluation. Controls had no family history of glaucoma, normal intraocular pressure and normal optic nerves. Cases and controls were also drawn from two clinical trial populations: Advanced Glaucoma Intervention Study (AGIS, NEI U10EY006827, D. Gaasterland PI) and Collaborative Initial Glaucoma Treatment Study (CIGTS, NEI U10 EY009149, P. Lichter PI). The NEIGHBOR consortium has two Co-Principal Investigators: J. Wiggs (Harvard, MEEI) and M. Hauser (Duke). The consortium includes eleven different centers where data collection and analysis take place. The eleven sites and investigators are: Harvard Medical School (Massachusetts Eye and Ear Infirmary) (J. Wiggs, L. Pasquale); Duke University Medical Center (M. Hauser, E. Hauser, R. Allingham, S. Schmidt); University of Michigan (J. Richards, S. Moroi, P. Lichter); University of Miami (M. Pericak-Vance, R. Lee, D. Budenz); Vanderbilt University (J. Haines); University of California San Diego (K. Zhang, R. Weinreb, T. Gaasterland); University of Pittsburgh (J. Schuman, G. Wollstein); University of West Virginia (A. Realini, J. Charlton, S. Zareparsi); Johns Hopkins University (D. Friedman, D. Zack); Stanford University (D. Vollrath, K. Singh); Eye Doctors of Washington (D. Gaasterland).
Hemin Chin serves as the NEI Staff Collaborator. This national collaborative study is supported by multiple NIH grants: NEI R01 EY015543 (Allingham); NEI U10 EY006827 (D. Gaasterland); NHLBI R01 HL073389 (E. Hauser); NEI R01 EY13315 (M. Hauser); NEI U10 EY009149 (Lichter); NEI R01 EY015473 (Pasquale); NEI U10 EY012118 (Pericak-Vance); NEI R03 EY015682 (Realini); NEI R01 EY011671 (Richards); NEI R01 EY09580 (Richards); NEI R01 EY013178 (Schuman); NEI R01 EY015872 (Wiggs); NEI R01 EY009847 (Wiggs); NEI R01 EY010886 (Wiggs); NEI R01 EY144428 (Zhang); NEI R01 EY144448 (Zhang); NEI R01 EY18660 (Zhang). Funding support for genotyping, which was performed at the Johns Hopkins University Center for Inherited Disease Research, was provided by the National Eye Institute (X01HG005259).
Study
phs000238
IMCISION DNAseq
Standard-of-care (SOC) surgery for locoregionally advanced head and neck squamous cell carcinoma (HNSCC) is intensive and results in 30‒50% five-year overall survival. Anti-PD1 immune checkpoint blockade (ICB) durably improves survival rates in recurrent/metastatic HNSCC. We report on the non-randomized phase Ib/IIa IMCISION trial (NCT03003637) of 32 HNSCC patients treated with two doses of neoadjuvant nivolumab (NIVO MONO, n=6, phase Ib arm A) or two doses of nivolumab plus a single dose of ipilimumab (COMBO, n=26, 6 in phase Ib arm B and 20 in phase IIa) prior to surgery and, if indicated, adjuvant radiotherapy. Neoadjuvant ICB was feasible in all phase Ib patients, meeting the phase Ib primary feasibility endpoint. One phase IIa patient had progressive disease precluding surgery. Primary tumor pathological response (phase IIa primary endpoint), defined as the % change in viable tumor cells from baseline biopsy to on-treatment resection, was evaluable in 29 patients. We observed a major pathological response (MPR, 90‒100% response) in 8/23 (35%) evaluable COMBO and 1/6 (17%) NIVO MONO patients. None of the patients with MPR after NIVO MONO or COMBO developed a tumor relapse after 24.0 months median follow-up. FDG-PET-based total lesion glycolysis identified MPR patients prior to surgery. A baseline AID/APOBEC-associated mutational profile and an on-treatment decrease in hypoxia signature expression were observed in MPR patients. Our data indicate that neoadjuvant NIVO MONO and COMBO are safe in HNSCC, and that COMBO ICB in particular shows encouraging efficacy. IMCISION provides rationales for future HNSCC trials aiming to pre-operatively select patients with a likely MPR after neoadjuvant ICB for de-escalation of SOC.
Study
EGAS00001005466
A Genome-Wide Association Study on Cataract and HDL in the Personalized Medicine Research Project Cohort
The primary goals of this project are to develop and validate electronic phenotyping algorithms, to accurately identify cases and controls while maintaining a positive predictive value (PPV) of >95%, and to conduct a genome wide association study that advances understanding of two specific yet interrelated disease states, while simultaneously engaging the community in these research efforts. Lipid abnormalities and cataracts are both diseases of public health significance, they share common risk factors, and they are both complex diseases which likely have many genes contributing to disease development. Whole genome association studies with these two outcomes and environmental risk factors could yield novel data about the etiology of the two separate outcomes as well as their interaction. PhenX is a project designed to prioritize Phenotypes and eXposure measures for Genome-wide Association Studies (GWAS). The PhenX Toolkit is a valuable resource for researchers who are planning or expanding a study and would like to incorporate well established measures that have been recommended by experts in the field. We are currently interested in gene-environment interactions for "age related cataract".
Study
phs000170
African American Multiple Myeloma GWAS
The case set is from a case-control study designed to identify common genetic risk factors for multiple myeloma in African Americans, the population with the highest risk for this cancer. We conducted two GWAS and combined each of these with convenience controls consisting of unaffected African American participants in cohorts with existing GWAS data. We then conducted a meta-analysis of the two sets. Cases were persons of African ancestry with smoldering or active multiple myeloma, diagnosed from Jan 1, 1988 through July 31, 2016, who were identified at participating oncology clinics or through SEER registries as part of the African American Multiple Myeloma Study (AAMMS). The majority of the samples were collected from incident and prevalent cases diagnosed since Jan 1, 2008 (80.9%), with a minority obtained from biobanks from cases diagnosed prior to 2008 (19.1%). Additional samples were obtained from the Multiethnic Cohort (USC and University of Hawaii) (n=40), the University of California at San Francisco Multiple Myeloma Study (n=27), and from the Multiple Myeloma Research Consortium for secondary analysis (samples originally provided to MMRC by 8 additional sites (n=84)). We have identified the phenotype (smoldering myeloma, plasma cell multiple myeloma, or myeloma not otherwise specified (myeloma NOS) when myeloma phenotype was not known), sex and age at diagnosis in this data set. The initial GWAS set consisted of 1,308 (1,305 passed QC) cases with DNA samples, with a GWAS performed on the Illumina Human Core. Controls consisted of 7,078 unaffected African American subjects who were participants in the African American Prostate and Breast Consortium, with existing GWAS data from the Illumina1M Duo BeadChip.
The second GWAS set consisted of 529 African American multiple myeloma patients with samples (406 from the University of Arkansas; these results are not contained in this dataset because NCI funds were not used for the collection and genotyping) with GWAS data resulting from the Illumina Mega-BeadChip v1.1. Controls were 2,390 unaffected African American participants in the Multiethnic Cohort with existing GWAS data from the same array. After QC and removal of duplicates within sets, sex mismatches, and plasmacytoma cases (ICD-O code 9731), this deposited data set contains the typed GWAS data for the Illumina Human Core (set 1) (n=1298), and for the Mega-BeadChip v1.1 (set 2) (n=123), with the University of Arkansas samples removed, for a total of 1,421 case genotypes. Note that 13 samples that overlap set 1 and set 2 were included.
Study
phs001632
McGill Epigenomics Mapping Centre
The McGill Epigenomics Mapping Centre (EMC) and Data Coordination Centre (EDCC) were established in 2012 at the McGill University and Genome Quebec Innovation Centre (Montreal, Canada) to support large-scale human epigenome mapping for a broad spectrum of cell types and diseases. Previous epigenetic studies have provided profiles of specific chromatin marks in relation to basic biological processes, and future studies of molecular mechanisms must build upon these proofs-of-concept by integrating sequence-based variation with multiple levels of epigenetic and transcriptional regulation across the genome of human tissues and animal disease models. This project leverages the high-throughput sequencing infrastructure and expertise in genomics and transcriptomics at the McGill Innovation Centre to carry out this research. Collaborators and the greater scientific community gain controlled access to the data via a portal, which takes advantage of Compute Canada high-performance computing cluster resources to manage the large volume of data associated with the generation of reference epigenome maps. McGill University is one of two Canadian mapping and data coordination centres, the other based at the Michael Smith Genome Sciences Centre in Vancouver, British Columbia. The generation of comprehensive epigenome maps at McGill University is part of a larger international effort that is coordinated by the International Human Epigenome Consortium (IHEC), whose overall long-term objective is to determine the extent to which the epigenome shapes human populations over generations and in response to the environment. This project is funded under the Canadian Epigenetics, Environment, and Health Research Consortium (CEEHRC) by the Canadian Institutes of Health Research and by Genome Quebec, with additional support from Genome Canada. The computing and networking infrastructure, and part of the software development, are provided by Compute Canada and CANARIE.
Study
EGAS00001000995
Intergenerational Impact of Genetic and Psychological Factors on Blood Pressure (InterGEN Study)
This research study focuses on the genomic and psychological environmental effects on blood pressure among mothers and their children. The study includes 250 mothers who self-identify as AA/black, and 250 children between the ages of 3 and 5 years old at the time of enrollment. The study is longitudinal and data collection interviews are scheduled every six months for two years after enrollment, for a total of four participant interviews. Assessment is as follows: genetic (time 1 only) and psychological factors (every six months for two years).
Study
phs001792
RNA-seq study of a Princess Margaret Cancer Centre human acute myeloid leukemia patient cohort
The PMCC AML RNAseq dataset consists of 81 AML patient samples (clinical data in Supplemental Table 11 of manuscript), processed in two batches. These patient samples are able to engraft in the NSG (NOD.Cg-Prkdcscid Il2rgtm1Wjl/SzJ) mouse model. Five patients (90543, 598, 90240, 110484, 100500) were included in both batches. Viably frozen material from the Leukemia Tissue Bank at Princess Margaret Cancer Centre/University Health Network was thawed by dropwise addition of X-VIVO + 50% fetal calf serum supplemented with DNase (100 μg/mL final concentration, Roche). RNA was extracted from bulk peripheral blood mononuclear cells (PBMC) using the RNeasy Micro Kit (Qiagen Inc.). For cohort 1, paired-end 76 base-pair sequencing on an Illumina HiSeq 2000 flow-cell lane at the Genome Sciences Centre, BC Cancer Agency, yielded an average of 240 million reads aligning to the genome per sample. Cohort 2 was subjected to 125 bp, paired-end RNA-sequencing on the Illumina HiSeq 2500 with an average of 50 million reads/sample at the Centre for Applied Genomics, Sick Kids Hospital.
Study
EGAS00001004792
The Atherosclerosis Risk in Communities (ARIC) Study
The Atherosclerosis Risk in Communities (ARIC) Study, sponsored by the National Heart, Lung and Blood Institute (NHLBI), is a prospective epidemiologic study conducted in four U.S. communities. The four communities are Forsyth County, NC; Jackson, MS; the northwest suburbs of Minneapolis, MN; and Washington County, MD. ARIC is designed to investigate the etiology and natural history of atherosclerosis, the etiology of clinical atherosclerotic diseases, and variation in cardiovascular risk factors, medical care and disease by race, gender, location, and date. ARIC includes two parts: the Cohort Component and the Community Surveillance Component. The Cohort Component began in 1987, and each ARIC field center randomly selected and recruited a cohort sample of approximately 4,000 individuals aged 45-64 from a defined population in their community. A total of 15,792 participants received an extensive examination, including medical, social, and demographic data. These participants were reexamined every three years with the first screen (baseline) occurring in 1987-89, the second in 1990-92, the third in 1993-95, and the fourth and last exam was in 1996-98. Follow-up occurs yearly by telephone to maintain contact with participants and to assess health status of the cohort. In the Community Surveillance Component, currently ongoing, these four communities are investigated to determine the community-wide occurrence of hospitalized myocardial infarction and coronary heart disease deaths in men and women aged 35-84 years. Hospitalized stroke is investigated in cohort participants only. Starting in 2006, the study conducts community surveillance of inpatient (ages 55 years and older) and outpatient heart failure (ages 65 years and older) for heart failure events beginning in 2005. ARIC is currently funded through January 31, 2012. 
This study is part of the Gene Environment Association Studies initiative (GENEVA, http://www.genevastudy.org) funded by the trans-NIH Genes, Environment, and Health Initiative (GEI). The overarching goal is to identify novel genetic factors that contribute to atherosclerosis and cardiovascular disease through large-scale genome-wide association studies of well-characterized cohorts of adults in four defined populations. Genotyping was performed at the Broad Institute of MIT and Harvard, a GENEVA genotyping center. Data cleaning and harmonization were done at the GEI-funded GENEVA Coordinating Center at the University of Washington.
Study
phs000090
Kibbutzim Family Study
The Kibbutzim Family Study (KFS) was established in 1992 to investigate the environmental and genetic determinants of cardiometabolic risk factors and their change over time. The participants belong to large families living in close-knit communities, called “Kibbutzim”, in Northern Israel. The Kibbutz has been a communal settlement, which has created a relatively homogeneous environment for its members. Kibbutz members are mostly of Ashkenazi Jewish ancestry, with the remaining members belonging to other Jewish subgroups. Participants were recruited in two phases from six Kibbutzim. In the first recruitment phase of the study (1992–1993) 500 individuals from 80 extended families (range 2 to 43) were examined. During the second phase (1999–2000), all participants from the first phase were invited for repeat examinations (80% response rate) and additional new participants were recruited, giving a total of 922 individuals from 150 extended families (range 2 to 55). Families were invited to participate if they consisted of at least four individuals who (i) lived in the Kibbutz, (ii) spanned at least two generations, and (iii) were at least 15 years old. Families were retained if at least two family members consented to participate in the study. Overall, 1033 participants were recruited; 111 were examined only in the first phase, 533 only in the second phase, and 389 were included in both. 901 individuals were successfully genotyped using Illumina HumanCoreExome BeadChip.
Study
EGAS00001002782
Luminal Androgen Receptor-Enriched Triple Negative Breast Cancer
This is a study to determine the efficacy of androgen receptor (AR) inhibitors in LAR (luminal androgen receptor)-enriched triple-negative breast cancer (TNBC) in the neoadjuvant setting. Twenty-four patients were treated with the neoadjuvant AR inhibitor enzalutamide and paclitaxel for 12 weeks. Whole exome sequencing and RNA-sequencing were performed prior to treatment. Data from only two patients are consented for release through dbGaP. The remaining data are available under a Materials Transfer Agreement with the University of Texas MD Anderson Cancer Center.
Study
phs003586
NHLBI TOPMed: MESA and MESA Family AA-CAC
The Multi-Ethnic Study of Atherosclerosis (MESA) is a study of the characteristics of subclinical cardiovascular disease (disease detected non-invasively before it has produced clinical signs and symptoms) and the risk factors that predict progression to clinically overt cardiovascular disease or progression of the subclinical disease. MESA researchers study a diverse, population-based sample of 6,814 asymptomatic men and women aged 45-84. Thirty-eight percent of the recruited participants are white, 28 percent African-American, 22 percent Hispanic, and 12 percent Asian, predominantly of Chinese descent. Comprehensive phenotypic and pedigree data for MESA study participants are available through dbGaP entry phs000209. MESA Participants were recruited from six field centers across the United States: Wake Forest University, Columbia University, Johns Hopkins University, University of Minnesota, Northwestern University and University of California - Los Angeles. Each participant received an extensive physical exam and determination of coronary calcification, ventricular mass and function, flow-mediated endothelial vasodilation, carotid intimal-medial wall thickness and presence of echogenic lucencies in the carotid artery, lower extremity vascular insufficiency, arterial wave forms, electrocardiographic (ECG) measures, standard coronary risk factors, sociodemographic factors, lifestyle factors, and psychosocial factors. Selected repetition of subclinical disease measures and risk factors at follow-up visits allows study of the progression of disease. Blood samples have been assayed for putative biochemical risk factors and stored for case-control studies. DNA has been extracted and lymphocytes cryopreserved (for possible immortalization) for study of candidate genes and possibly, genome-wide scanning, expression, and other genetic techniques. 
Participants are being followed for identification and characterization of cardiovascular disease events, including acute myocardial infarction and other forms of coronary heart disease (CHD), stroke, and congestive heart failure; for cardiovascular disease interventions; and for mortality. In addition to the six Field Centers, MESA involves a Coordinating Center, a Central Laboratory, and Central Reading Centers for Computed Tomography (CT), Magnetic Resonance Imaging (MRI), Ultrasound, and Electrocardiography (ECG). Protocol development, staff training, and pilot testing were performed in the first 18 months of the study. The first examination took place over two years, from July 2000 to July 2002. It was followed by five examination periods that were 17-20 months in length. Participants have been contacted every 9 to 12 months throughout the study to assess clinical morbidity and mortality. MESA Family: The general goal of the MESA Family Study, an ancillary study to MESA funded by a grant from NHLBI, is to apply modern genetic analysis and genotyping methodologies to delineate the genetic determinants of early atherosclerosis. This is being accomplished by utilizing all the current organizational structures of the Multi-Ethnic Study of Atherosclerosis (MESA) and Genetic Centers at Cedars-Sinai Medical Center and University of Virginia. In the MESA Family Study, the goal is to locate and identify genes contributing to the genetic risk for cardiovascular disease (CVD), by looking at the early changes of atherosclerosis within families (mainly siblings). A total of 2,128 individuals from 594 families, yielding 3,026 sibpairs divided between African Americans and Hispanic-Americans, were recruited by utilizing the existing framework of MESA.
MESA Family studied siblings of index subjects from the MESA study and from new sibpair families (with the same demographic characteristics) and is determining the extent of genetic contribution to the variation in coronary calcium (obtained via CT Scan) and carotid artery wall thickness (B-mode ultrasound) in the two largest non-majority U.S. populations. In a small proportion of subjects, parents of MESA index subjects participating in MESA Family were studied but only to have blood drawn for genotyping. The MESA Family cohort was recruited from the six MESA Field Centers. MESA Family participants underwent the same examination as MESA participants during May 2004 - May 2007. DNA was extracted and lymphocytes immortalized for study of candidate genes, genome-wide linkage scanning, and analyzed for linkage with these subclinical cardiovascular traits. While linkage analysis is the primary approach being used, an additional aspect of the MESA Family Study takes advantage of the existing MESA study population for testing a variety of candidate genes for association with the same subclinical traits. Genotyping and data analysis will occur throughout the study.
Study
phs001416
Women's Health Initiative Clinical Trial and Observational Study
The Women's Health Initiative (WHI) is a long-term national health study that has focused on strategies for preventing heart disease, breast and colorectal cancer, and osteoporotic fractures in postmenopausal women. The original WHI study included 161,808 postmenopausal women enrolled between 1993 and 1998. The Fred Hutchinson Cancer Research Center in Seattle, WA serves as the WHI Clinical Coordinating Center for data collection, management, and analysis of the WHI. The WHI has two major parts: a partial factorial randomized Clinical Trial (CT) and an Observational Study (OS); both were conducted at 40 Clinical Centers nationwide. The CT enrolled 68,132 postmenopausal women between the ages of 50 and 79 into trials testing three prevention strategies. If eligible, women could choose to enroll in one, two, or all three of the trial components. The components are: Hormone Therapy Trials (HT): This double-blind component examined the effects of combined hormones or estrogen alone on the prevention of coronary heart disease and osteoporotic fractures, and associated risk for breast cancer. Women participating in this component with an intact uterus were randomized to estrogen plus progestin (conjugated equine estrogens [CEE], 0.625 mg/d plus medroxyprogesterone acetate [MPA], 2.5 mg/d) or a matching placebo. Women with prior hysterectomy were randomized to CEE or placebo. Both trials were stopped early, in July 2002 and March 2004, respectively, based on adverse effects. All HT participants continued to be followed without intervention until close-out. Dietary Modification Trial (DM): The Dietary Modification component evaluated the effect of a low-fat and high fruit, vegetable and grain diet on the prevention of breast and colorectal cancers and coronary heart disease. Study participants were randomized to either their usual eating pattern or a low-fat dietary pattern.
Calcium/Vitamin D Trial (CaD): This double-blind component began 1 to 2 years after a woman joined one or both of the other clinical trial components. It evaluated the effect of calcium and vitamin D supplementation on the prevention of osteoporotic fractures and colorectal cancer. Women in this component were randomized to calcium (1000 mg/d) and vitamin D (400 IU/d) supplements or a matching placebo. The Observational Study (OS) examines the relationship between lifestyle, environmental, medical and molecular risk factors and specific measures of health or disease outcomes. This component involves tracking the medical history and health habits of 93,676 women not participating in the CT. Recruitment for the observational study was completed in 1998 and participants were followed annually for 8 to 12 years. Extension Studies: The original protocol allowed for follow-up until March 2005, after which participants were invited to enroll in the first WHI Extension Study for follow-up through 2010. Participants were invited again to participate in the second WHI Extension Study with continued follow up from 2010 to at least 2015. As of March 31, 2011 there were 93,122 women enrolled in the second extension. In Extension Study 2, the overall WHI study population was divided into two new subsamples, the Medical Records Cohort (MRC) and the Self-Report Cohort (SRC). The MRC consists of all former hormone trial participants and all African American and Hispanic participants from all study components. The SRC consists of the remaining participants. The extent of outcome information collected differs between the two cohorts, with more extensive outcomes information collection on the MRC. 
As part of Extension Study 2, selected older WHI participants were invited to participate in an In Person Visit (a.k.a., Long Life Study) at their homes during which additional blood samples were collected and various measurements were taken (such as blood pressure, height, weight, waist circumference, grip strength, etc.). In October 2015, Extension Study 2 was renewed with continued follow-up planned through October 2020, pending annual contract review and renewal. Additional Information: The WHI website, https://www.whi.org/about/SitePages/About%20WHI.aspx, has much more information about the study. For WHI data collection forms used over the years, please see https://www.whi.org/researchers/studydoc/SitePages/Forms.aspx. For additional dataset documentation, see https://www.whi.org/researchers/data/Pages/Available%20Data.aspx. For data preparation and use, please refer to 'WHI dbGaP Cohort Data Release Data Preparation Guide May 2018' for additional details about the WHI data. The WHI Cohort (top-level study phs000200) is utilized in the following dbGaP substudies, which hold the genotypes, analyses, expression data, other molecular data, and derived variables collected in them: phs000386 WHI SHARe; phs000281 GO-ESP WHISP; phs000315 WHI GARNET; phs000503 WHISE; phs000227 PAGE WHI; phs000675 WHIMS+; phs000746 WHI Harmonized and Imputed GWAS; phs001334 WHI Metabolomics of CHD; phs001335 WHI BA23; phs001614 WHI LLS Phase III GWAS.
Study
phs000200
UK10K NEURO FSZNK
In the UK10K project we propose a series of complementary genetic approaches to find new low frequency/rare variants contributing to disease phenotypes. These will be based on obtaining the genome wide sequence of 4000 samples from the TwinsUK and ALSPAC cohorts (at 6x sequence coverage), and the exome sequence (protein coding regions and related conserved sequence) of 6000 samples selected for extreme phenotypes. Our studies will focus primarily on cardiovascular-related quantitative traits, obesity and related metabolic traits, neurodevelopmental disorders and a limited number of extreme clinical phenotypes that will provide proof-of-concept for future familial trait sequencing. We will directly analyse quantitative traits in the cohorts and the selected traits in the extreme samples, and also use imputation down to 0.1% allele frequency to extend the analyses to further sample sets with genome wide genotype data. In each case we will investigate indels and larger structural variants as well as SNPs, and use statistical methods that combine rare variants in a locus or pathway as well as single-variant approaches. This Finnish schizophrenia sample set has been collected from a population cohort using national registers. The entire sample collection consists of 2756 individuals from 458 families, of whom 931 are diagnosed with schizophrenia spectrum disorder. Families outside Kuusamo (n=288) all had at least two affected siblings. All diagnoses are based on DSM-IV, and for a large fraction of cases there is cognitive data. For further details/descriptions with regard to this data set, please contact Tiina Paunio (tiina.paunio@thl.fi).
Study
EGAS00001000119
Proteogenomics reveals two distinct biological pilocytic astrocytoma subgroups
Pilocytic astrocytoma (PA) is the most common pediatric brain tumor and is driven by aberrant MAPK signaling, typically mediated by BRAF alterations. While five-year overall survival rates exceed 95%, tumor recurrence constitutes a major clinical challenge in incompletely resected tumors despite chemotherapeutic or radiation-based therapies. Therefore, we used proteogenomics to discern the biological heterogeneity of PA to improve classification of this tumor entity and identify novel therapeutic targets.
Our proteogenomics approach integrates RNA sequencing and LC/MS-based proteomic profiling data from a cohort of 58 confirmed, primary PA samples. An integrative genomics approach was conducted to discern the biological heterogeneity of PA and to identify aberrant pathway activation in these biological subgroups.
In summary, pilocytic astrocytomas segregate into two groups, with younger patients significantly associated with Group 1. Importantly, we validate the two distinct biological subgroups in two non-overlapping cohorts. The biological heterogeneity seen here may improve biological classification and reveal novel therapeutic targets specifically useful for non-resectable tumors with high risk of recurrent or progressive disease.
Study
EGAS00001006402
PROGRESS/ELEMENT DNA Methylation Study
An extension to the Early Life Exposures in Mexico to Environmental Toxicants (ELEMENT) birth cohort of Mexico City, the Programming Research in Obesity, GRowth, Environment and Social Stress (PROGRESS) Cohort is an ongoing longitudinal pre-birth cohort, established in 2006 in Mexico City, partnering Icahn School of Medicine at Mount Sinai with Harvard University and the National Institute of Public Health in Mexico, which was designed to study the effects of prenatal exposure to toxic metals, air pollution, phthalates, and stress on childhood development. Women were initially enrolled (n=1,054) if they were 18 years of age or older, were at less than 20 weeks of gestation, had no documented heart or kidney disease, did not use steroids or anti-epilepsy drugs, did not consume alcohol daily, had telephone access, planned to live in Mexico City for the following 3 years, and were receiving care through the Mexican Social Security System. In addition to the clinical, demographic and exposure data collected, cord blood was collected to interrogate DNA methylation across the genome for over 300 mother-child dyads. Clinical assessments and exposures were captured during several life stages, including prenatal, infant (0-1 year), youth (1-18 years), and adulthood (mother). The PROGRESS cohort added well-documented phenotyping of children for obesity, metabolic dysfunction, respiratory outcomes, and cardiovascular outcomes, as well as measures of air pollutant, personal care/consumer product, non-chemical stress, and metal mixture exposures. No clinical trials were conducted in this cohort. The data collected in this study should provide a unique resource to investigate DNA methylation as it relates to several environmental exposures and adverse cardiometabolic and neurocognitive health in mothers and children from a prospective birthing cohort. For access to demographic, clinical, and exposure data, please contact the study principal investigators directly.
Study
phs002754
Extreme phenotypes define epigenetic and metabolic signatures in cardiovascular diseases
Improving the understanding of cardiometabolic syndrome pathophysiology and its relationship with thrombosis is an ongoing healthcare challenge. Using plasma biomarker analysis coupled with transcriptional and epigenetic characterisation of cell types involved in thrombosis, obtained from two extreme phenotype groups (obese and lipodystrophy) and compared with lean individuals and blood donors, the present study identifies the molecular mechanisms at play, highlighting patterns of abnormal activation in innate immune phagocytic cells. It shows that the extreme phenotype groups could be distinguished from lean individuals, and from each other, across all data layers. Characterisation of the same obese group six months after bariatric surgery shows the loss of the patterns of abnormal activation of innate immune cells previously observed. However, rather than reverting to the gene expression landscape of lean individuals, this occurs via the establishment of novel gene expression landscapes. NETosis and its control mechanisms emerge amongst the pathways that show an improvement after surgical intervention. Taken together, by integrating across data layers, the observed molecular and metabolic differences form a disease signature that is able to discriminate, amongst the blood donors, those individuals with a higher likelihood of having cardiometabolic syndrome, even when not presenting with the classic features.
Dataset
EGAD00001005197
Transcriptomic profiling of granulosa cells from IVF patients at different ages
Two major factors contributing to reduced fertility are the use of exogenous hormones and old age. Here, we sequenced RNA from granulosa cell samples from human patients undergoing in vitro fertilization to test how the transcriptional profiles of hormonally stimulated cells change with age. All patients were normal responders referred for male infertility and were divided into two age groups. The gene expression differences in aging were compared to mouse data (previous experiments, see ArrayExpress E-MTAB-13479), and the results are published in the article "Granulosa cell transcription is similarly impacted by superovulation and aging and predicts early embryonic trajectories". Corresponding count tables are available in ArrayExpress (E-MTAB-13496).
Study
EGAS50000000824
Identification of potential blood biomarkers for early diagnosis of Alzheimer's disease through immune landscape analysis
Mild cognitive impairment (MCI) is a clinical precursor of Alzheimer's disease (AD). Recent genetic studies have reported on associations between AD risk genes and immunity. Here, we obtained samples and data from 317 AD, 432 MCI, and 107 cognitively normal (CN) subjects and investigated immune-cell type composition and immune clonal diversity of T-cell receptor (TRA, TRB, TRG, and TRD) and B-cell receptor (IGH, IGK, and IGL) repertoires through bulk RNA sequencing. Our prognosis prediction model using the potential blood-based biomarkers for early AD diagnosis, which combined two immune repertoires (IGK and TRA), WDR37, and clinical information, successfully classified MCI patients into two groups, low and high, in terms of risk of MCI-to-AD conversion.
Study
JGAS000532
ML modeling data & code
Rmarkdown code, PDF, and Rdata file to recapitulate the paper's primary figures and machine learning model development.
Dataset
EGAD00001009764
Whole-exome sequencing of matched blood, primary GBM tumours, and patient-derived organoids
Whole-exome sequencing (~250X coverage) of primary GBM tumours and matched patient-derived organoids and normal blood. Samples from two spatially distinct regions of seven tumours from five patients (five primary, two recurrent).
Dataset
EGAD00001007935
DNA Methylomic Profiling of Preeclampsia Across Pregnancy
Preeclampsia (PE) is a hypertensive, multi-system disorder of pregnancy that significantly impacts maternal and infant morbidity/mortality across the globe, as it increases risk of cardiovascular disease and remains a leading killer of women and babies. Despite PE's significant impact on morbidity/mortality, there are no clinically reliable biomarkers that predict PE. DNA methylation, a dynamic regulator of gene expression, represents a mechanism that is known to be impacted by the environment. Because PE stems from a dysfunctional placenta that releases debris into the maternal circulation, we hypothesized that the in vivo environment created by the dysfunctional placenta would impact DNA methylation in the maternal circulation, and that these blood-based methylation profiles would serve as a systemic biomarker of the maternal response to placental dysfunction. The overall objective of this pilot study was to longitudinally characterize DNA methylation profiles in maternal blood across the three trimesters of pregnancy, at time points before and after clinically overt PE, using a targeted (endoglin and endoglin-related genes) and a discovery-based approach. For this pilot study, 28 normotensive control participants and 28 PE case participants enrolled in the NICHD-funded pregnancy study entitled Prenatal Exposures and Preeclampsia Prevention: Mechanisms of Preeclampsia and the Impact of Obesity (PEPP3; P01HD30367) were 1:1 frequency matched on self-reported race, pre-pregnancy BMI, smoking history, and gestational age at sample collection. Methylation data were collected with the Infinium® MethylationEPIC BeadChip. Methylation assay data collection was carried out at the Johns Hopkins University Genetic Resources Core Facility, The SNP Center, Baltimore, MD, USA.
Study
phs001937
The acute effects of morning bright light on the human white adipose tissue transcriptome: exploratory post hoc analysis
The circadian rhythm of the central brain clock in the suprachiasmatic nucleus (SCN) is synchronized by light. White adipose tissue (WAT) is one of the metabolic endocrine organs containing a molecular clock, and it is synchronized by the SCN; excess WAT is a risk factor for health issues including type 2 diabetes mellitus (DM2). We hypothesized that bright-light exposure would affect the human WAT transcriptome. Therefore, we analyzed WAT biopsies from two previously performed randomized cross-over trials (trial 1: lean healthy men, and trial 2: men with obesity and DM2). RNA sequencing results showed major group differences between men with obesity and DM2 and lean healthy men, as well as a differential effect of bright light exposure.
Study
EGAS50000001206
UK10K NEURO FSZ
In the UK10K project we propose a series of complementary genetic approaches to find new low frequency/rare variants contributing to disease phenotypes. These will be based on obtaining the genome wide sequence of 4000 samples from the TwinsUK and ALSPAC cohorts (at 6x sequence coverage), and the exome sequence (protein coding regions and related conserved sequence) of 6000 samples selected for extreme phenotypes. Our studies will focus primarily on cardiovascular-related quantitative traits, obesity and related metabolic traits, neurodevelopmental disorders and a limited number of extreme clinical phenotypes that will provide proof-of-concept for future familial trait sequencing. We will analyse directly quantitative traits in the cohorts and the selected traits in the extreme samples, and also use imputation down to 0.1% allele frequency to extend the analyses to further sample sets with genome wide genotype data. In each case we will investigate indels and larger structural variants as well as SNPs, and use statistical methods that combine rare variants in a locus or pathway as well as single-variant approaches. These Finnish schizophrenia samples have been collected from a population cohort using national registers. The entire sample collection consists of 2756 individuals from 458 families of whom 931 are diagnosed with schizophrenia spectrum disorder, each family having at least two affected siblings. 170 families originate from an internal isolate (Kuusamo) with a three-fold lifetime risk for the trait. The genealogy of the internal isolate is well documented and the individuals form a “megapedigree” reaching to the 17th Century. All diagnoses are based on DSM-IV and for a large fraction of cases there is cognitive data. For further details/descriptions with regard to this data set please contact Tiina Paunio (tiina.paunio@thl.fi)
Study
EGAS00001000118
Long-read whole-genome sequencing-based concurrent haplotype phasing and aneuploidy profiling of single cells
This dataset contains long-read whole-genome sequencing (lrWGS) data from 14 samples. Five lrWGS datasets are from single-cell (sc, sc_2, sc_3), multi-cell (mc, 10 cells), and bulk samples of HG002. The remaining nine lrWGS datasets are from two preimplantation genetic testing (PGT) families, including four from blood bulk DNA of the parental pairs and five from trophectoderm biopsies of two embryos from one family and three embryos from the other family. The data are provided in raw FASTQ format and were generated using the PromethION device from Oxford Nanopore Technologies.
Dataset
EGAD50000000787
Genomics of Circulating Tumor Cells
Comprehensive analyses of cancer genomes in clinical settings promise to inform prognoses and guide the deployment of precise cancer treatments. A major barrier, however, is the inaccessibility of adequate metastatic tissue for accurate genomic analysis in prostate and other cancers. A potential solution is to characterize circulating tumor cells (CTCs), but this requires overcoming multiple technical hurdles. Here, we report an integrated process to isolate, qualify, and sequence whole exomes of CTCs with high fidelity, using a census-based sequencing strategy. Power calculations suggest that mapping of over 99.995% of the territory accessible in bulk exome sequencing is possible in CTCs. We validated our sequencing process in two prostate cancer patients including one for whom we compared CTC-derived mutations to mutations found in a lymph node metastasis and nine cores of the primary tumor. 51 of 73 CTC mutations (70%) were observed in matched tissue. Moreover, we identified 10 early trunk mutations and 56 metastatic trunk mutations in the non-CTC tumor samples and found 90% and 73% of these, respectively, in CTC exomes. This study establishes a foundation for CTC genomics in the clinic.
Study
phs000717
Whole-Genome sequencing of hepatocellular carcinomas
The French ICGC project on liver tumors is coordinated by Pr Jessica Zucman-Rossi and funded by Inca (French Institute for Cancer). The aim of the present project is to identify the catalog of somatic and germline mutations in liver tumors using whole genome and whole exome sequencing together with CGH-SNP, methylome and transcriptomic profiling. For this purpose, a series of 500 liver tumors is being collected through the French National Liver Collection, and these samples will be analyzed using the different omics technologies. The data will be deposited in the ICGC and EGA databases to be publicly available to the scientific community. Hepatocellular carcinoma (HCC) accounts for more than 90% of liver cancers and is a major health problem. It is the third leading cause of cancer-related mortality. Advances in genomic analyses have produced a comprehensive understanding of the different pathobiological layers underlying hepatocarcinogenesis. In particular, the development of next-generation sequencing technologies has made it possible to generate more comprehensive catalogues of somatic alteration events (single nucleotide substitutions, structural variations, and epigenetic changes) in the liver cancer genome than ever before. The dataset will include 50 whole genome sequencing tumor/germline pairs, of which 6 were deposited in February 2014.
Study
EGAS00001000706
Next-Generation Sequencing of AV Nodal Reentry Tachycardia patients
Atrioventricular nodal reentry tachycardia (AVNRT) is the most common form of regular paroxysmal supraventricular tachycardia. This arrhythmia affects women twice as frequently as men, and is often diagnosed in patients below 40 years of age. Familial clustering, early onset of symptoms, and lack of structural anomaly indicate involvement of genetic factors in AVNRT pathophysiology. We hypothesized that AVNRT patients have a high prevalence of variants in genes that are highly expressed in the atrioventricular conduction axis of the heart and potentially involved in arrhythmic diseases. Next-generation sequencing of 67 genes was applied to the DNA profile of 298 AVNRT patients and 10 AVNRT family members using the HaloPlex Target Enrichment System. In total, we identified 229 variants in 60 genes: 215 missense variants, four frameshifts, four codon deletions, three missense and splice-site variants, two stop-gain variants, and one start-lost variant. Sixty-five of these were not present in the Exome Aggregation Consortium (ExAC) database. Furthermore, we report two AVNRT families with co-segregating variants. Seventy-five of 284 AVNRT patients (26.4%) and three family members of different AVNRT probands had one or more variants in genes affecting sodium handling. Fifty-four of 284 AVNRT patients (19.0%) had variants in genes affecting the calcium handling of the heart. We furthermore found a large proportion of variants in the HCN1-4 genes. We did not detect a significant enrichment of rare variants in the tested genes. This could be an indication that AVNRT might be an electrical arrhythmic disease with abnormal sodium and calcium handling.
Study
EGAS00001002745
Whole exome sequencing data of germline and two independent primary leukemias of five patients
The contribution of genetic predisposing factors to the development of pediatric acute lymphoblastic leukemia (ALL), the most frequently diagnosed cancer in childhood, has not been fully elucidated. Children presenting with multiple de novo leukemias are more likely to suffer from genetic predisposition. Here, we selected five of these patients and analyzed the mutational spectrum of normal and malignant tissues.
Dataset
EGAD00001002266
Lung cancer Early Molecular Assessment
This study aimed to validate the potential utility of a clinically accessible, highly sensitive tumor-informed circulating tumor DNA (ctDNA) assay. The study considers a ‘landmark’ time period (between two weeks and four months after definitive treatment), as well as longitudinal sampling (>2 weeks) and compares two independent, retrospectively collected real-world cohorts.
Study
EGAS50000000896
Transgenerational Transmission of Post-Zygotic Mutations Suggests Symmetric Contribution of First Two Blastomeres to Human Germline
Skin and blood samples were collected from four phenotypically normal individuals (mother, father, and children) from a randomly selected family. Whole genomes of the samples from these individuals were sequenced to a depth of 30X-200X. Sequencing data for the mother was previously analyzed and is available at NDA collection #2961. Sequenced reads from the children and father were aligned to the human reference genome (hg19) generating BAM files for each individual. We used post-zygotic mutations in the mother to trace cell lineages across human generations. Analysis of this family demonstrated that different cell lineages were transmitted to offspring. Coupled with analysis of publicly available data, this result revealed a fundamental difference between soma and germline in lineage allocation and suggested a 50:50 contribution of the first two blastomeres to the germline.
Study
phs003781
De Novo Characterization of Cell-Free DNA Fragmentation Hotspots in Plasma Whole-Genome Sequencing
The raw sequencing data deposited here are low-coverage (~1X) whole-genome sequencing (WGS) of plasma cell-free DNA (cfDNA). The purpose is to validate the performance of early-stage cancer diagnosis reported in the Zhou et al. 2020 bioRxiv paper, mainly for two types of cancer: early-stage liver cancer and breast cancer. The cases and controls are matched on age, gender, smoking history, and alcohol usage. We developed a computational method to identify cfDNA fragmentation hotspots from pooled low-coverage cfDNA WGS data. We found that the cfDNA fragmentation hotspots are highly enriched in regulatory regions, such as open chromatin regions. The signals from these regions can help in the diagnosis of early-stage cancers and the identification of their tissues of origin.
Study
phs003062
Liquid biopsy for molecular characterization of diffuse large B-cell lymphoma and early assessment of minimal residual disease
Circulating tumor DNA (ctDNA) allows genotyping and minimal residual disease (MRD) detection in lymphomas. Using an NGS approach (Euroclonality-NDC), we evaluated the clinical and prognostic value of ctDNA in a series of R-CHOP-treated DLBCL patients at baseline (n=68) and after 2 cycles (n=59), monitored by metabolic imaging (PET/CT).
A molecular marker was identified in 61/68 (90%) ctDNA samples at diagnosis. High pre-treatment ctDNA levels correlated significantly with elevated LDH, advanced stage, and high-risk IPI, and showed a trend toward shorter 2-year PFS. Valuable NGS data after 2 cycles of treatment were obtained in 44 cases, and 38 achieved major molecular response (MMR; 2.5-log drop in ctDNA). PFS curves differed significantly between those achieving MMR and those not achieving MMR (2-year PFS of 76% vs. 0%, p<0.001). Similarly, a reduction of more than 66% in SUVmax by PET/CT identified two subgroups with different prognosis (2-year PFS of 83% vs. 38%; p<0.001). Combining both approaches, MMR and SUVmax reduction, gave a better stratification (2-year PFS of 84% vs. 17% vs. 0%, p<0.001).
The Euroclonality-NDC panel allows the detection of a molecular marker in the ctDNA in 90% of DLBCL cases. ctDNA reduction at 2 cycles and its combination with interim PET results improve patient prognosis stratification.
Study
EGAS50000000215
Chromothripsis in human breast cancer (HIPO K26K/H017/A017)
Chromothripsis is a form of genome instability, by which a presumably single catastrophic event generates extensive genomic rearrangements of one or a few chromosome(s). Widely assumed to be an early event in tumor development, this phenomenon plays a prominent role in tumor onset. We analyzed chromothripsis in 252 human breast cancers from two patient cohorts (149 metastatic breast cancers, 63 untreated primary tumors, 29 local relapses, 11 longitudinal pairs) using whole-genome and whole-exome sequencing. We showed that chromothripsis affects a substantial proportion of human breast cancers, with a prevalence over 60% in a cohort of metastatic cases and 25% in a cohort comprising predominantly luminal breast cancers (cohorts from HIPO K26K and H017 and A017). In the vast majority of cases, multiple chromosomes per tumor are affected, with most chromothriptic events on chromosomes 11 and 17 including, among other significantly altered drivers, CCND1, ERBB2, CDK12 and BRCA1. Importantly, chromothripsis generates recurrent fusions that drive tumor development. Chromothripsis-related rearrangements are linked with univocal mutational signatures, with clusters of point mutations due to kataegis in close proximity to the genomic breakpoints, and with the activation of specific signaling pathways. Analysis of the temporal order of events in tumors with and without chromothripsis as well as longitudinal analysis of chromothriptic patterns in tumor pairs revealed important insights on the role of chromothriptic chromosomes in tumor evolution.
Study
EGAS00001004662