European Genome-phenome Archive 15th Anniversary Celebration
2023 marks the 15th Anniversary of the EGA, jointly managed by the European Bioinformatics Institute (EMBL-EBI) and the Centre for Genomic Regulation (CRG). To mark the occasion, a simultaneous celebration was held in both institutions on the 13th of December 2023. The teams gathered online to play a quiz game and celebrate all the achievements and milestones with two wonderful anniversary cakes.
In 2008, the European Genome-phenome Archive was created at EMBL-EBI in Cambridge to guarantee that human genome and phenome data were available to the international scientific community while data privacy was preserved.
The six-person staff of the project's early days is now far behind us: the team has grown to 35 members. Since 2013, the European Bioinformatics Institute and the Centre for Genomic Regulation have shared responsibility for the European Genome-phenome Archive (EGA). At that time, the EGA held about 0.5 petabytes of data. Currently, the Archive contains more than 12 petabytes.
2023 comes to an end with several pieces of good news. The EGA has been renewed as an ELIXIR Core Data Resource. This was announced during the GA4GH 11th Plenary held in San Francisco last September, when ELIXIR-Beacon was also confirmed to be a GA4GH Driver Project. Thus, the Beacon API has maintained this title since 2018. What’s more, the first Federated EGA dataset is now live on our website.
This year we also launched new services for EGA users in September. By the numbers, in 2023 the EGA counts 2.5 PB archived, 371 studies published, 208 new submitters and 19 active projects in which the team is participating, among others.
We look forward to continuing to support your research in 2024!
Blog
15-anniversary
HuBMAP: A 3-D Tissue Map of the Human Lymphatic System
The major goal of the 3-D Tissue Map of the Human Lymphatic System is to use microscopic and biomolecular procedures to facilitate co-registration pipelines and common 3D reconstruction algorithms. Tissue collected from human spleen, thymus and lymph node will be spatially resolved at the single-cell level both within and across individuals. The approach employed involves sequencing of transcriptomes of dissociated cells and mapping to histological sections using CO-Detection by indEXing (CODEX) and/or imaging mass spectroscopy, two highly multiplexed methods employing antibody-tagged target epitopes. Additionally, light sheet fluorescent microscopy is used to provide a higher-level context for structural localization on a larger volume. The molecular data provided by this project is obtained through single-cell RNA-seq.
Study
phs002268
Molecular Genetics of Schizophrenia - nonGAIN Sample (MGS_nonGAIN)
This study is part of the Molecular Genetics of Schizophrenia (MGS) genome wide association study (GWAS) of 3,972 cases
(2,686 EA and 1,286 AA) and 3,629 controls (2,656 EA and 973 AA) (analyzed sample remaining after quality control exclusions),
comprised of European ancestry (EA) and African American (AA) samples. We genotyped about half of the EA sample and almost all
of the AA sample under the auspices of the Genetic Association Information Network (GAIN) with the Affymetrix 6.0 platform at the
Broad Institute. The remainder of the included sample was also genotyped with the Affymetrix 6.0 platform at the Broad Institute,
and we refer to this component as the nonGAIN sample. Cases met criteria for schizophrenia (SCZ) or schizoaffective disorder
(SA) per the Diagnostic and Statistical Manual of Mental Disorders version IV (DSM-IV) for all three collections (SGI, MGS1,
and MGS2) comprising these cases. However, for the older SGI collection, codes for the secondary diagnoses refer to the older
DSM-III-R version. Controls were screened briefly and excluded if they endorsed a history of these illnesses.
Study
phs000167
Next Generation Mendelian Genetics: Malignant Hyperthermia
Malignant hyperthermia (MH) is a genetic disorder that causes a profound metabolic derangement following exposure to certain anesthetics. While approximately half of all cases are associated with ryanodine receptor-1 gene (RYR1) mutations, many cases have an unknown genetic cause. We sought to identify rare variants in novel MH candidate genes by sequencing the protein-coding regions of the genomes of individuals whose disease was either ruled in or out by the gold-standard diagnostic test. We also carefully selected individuals from well-characterized families to use gene-sharing information and maximize efficiency in the study design. Exome sequencing has helped identify the causes of over a dozen Mendelian disorders, has high power at low sample sizes, and is cost-efficient compared to whole-genome sequencing.
Study
phs000405
Longitudinal Study of Urea Cycle Disorders
Individuals with Urea Cycle Disorders (UCD) cannot remove ammonia, a waste product, from the blood. The purpose of this study is to conduct a longitudinal investigation of the natural history, morbidity, and mortality in people with UCD. This study will look at how people with a UCD grow and develop over time and how often they get sick. The research questions are: What is the prevalence of specific morbid indicators of disease severity, including hyperammonemia, developmental disabilities, and various long-term kidney and liver effects? What is the fatality rate associated with the various forms of UCD? What are the correlations between various biomarkers and disease severity and progression? What is the safety and efficacy of currently used and new UCD therapies? This is a longitudinal study of individuals with urea cycle disorders. Those participating in this study will be evaluated every three to twelve months, depending on age and time of diagnosis. Participants two years of age and younger who were diagnosed with UCD within the first four weeks of life will be evaluated every three months. Those who are over two years of age or were diagnosed after four weeks of age will be evaluated every six months. Participants older than 18 years of age will be evaluated once every year.
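The evaluation schedule described above can be read as a small decision rule. The following Python sketch is only an illustration: the function name and parameters are hypothetical, and it assumes that the once-a-year rule for participants older than 18 takes precedence over the six-month rule, which the description does not state explicitly.

```python
def evaluation_interval_months(age_years, diagnosed_within_first_4_weeks):
    """Return the assumed number of months between study evaluations.

    Hypothetical helper; the precedence of the adult rule is an assumption.
    """
    if age_years > 18:
        # Participants older than 18 are evaluated once every year.
        return 12
    if age_years <= 2 and diagnosed_within_first_4_weeks:
        # Two years and younger, diagnosed in the first four weeks of life.
        return 3
    # Over two years of age, or diagnosed after four weeks of age.
    return 6


# Illustrative participants (not study data).
for age, early in [(1, True), (1, False), (10, True), (30, False)]:
    print(age, early, evaluation_interval_months(age, early))
```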
Study
phs000577
STAMPEED: Cardiovascular Health Study (CHS) GWAS to identify genetic variants associated with aging and CVD risk factors and events
The primary aim of the study is to conduct a genome-wide association study to identify genetic variants associated with the incidence of myocardial infarction (MI), stroke, and heart failure (HF) among participants enrolled in the Cardiovascular Health Study (CHS) who were free of clinical cardiovascular disease at baseline. The secondary aim is to conduct genome-wide association studies of other phenotypes in CHS. The study is an ancillary study to CHS. CHS is a population-based cohort study of risk factors for heart disease and stroke among older adults recruited at 4 US sites in 1989-1990. Subjects underwent an extensive baseline examination, and annual follow-up examinations through 1998-1999.
Study
phs000226
Genetics of Male Infertility Initiative (GEMINI)
The Genetics of Male Infertility Initiative (GEMINI) project is a multi-center study designed to discover and characterize genetic variants conferring risk for male infertility. Patients enrolled in GEMINI have been clinically diagnosed with male infertility. Patients are recruited from multiple medical centers and are primarily of European ancestry. DNA from each patient is extracted from peripheral blood or saliva and used for exome sequencing. Genetic data generated by the GEMINI project has contributed to the discovery of over a dozen genes involved in human infertility. A VCF file of genetic data from GEMINI patients that have been appropriately consented will be available through dbGaP.
Study
phs003115
NIMH (National Institute of Mental Health) De Novo Mutation Identification in Taiwanese Schizophrenia Trios
The substantial reproductive impact of schizophrenia, for which affected individuals have fewer than half as many offspring as unaffected individuals do, implies that mutations of largest effect will frequently be de novo mutations. Ascertaining exome sequence variation in father-mother-offspring trios allows such mutations to be identified and distinguished from the far-larger amount of rare variation that is inherited by each individual. The pursuit of this approach in a large, well-powered cohort of trios can also provide lessons that inform the development of such gene discovery strategies more generally in human genetics. Schizophrenia trios from the Taiwanese population are being collected by Dr. Ming Tsuang (PI, UC San Diego, California) and investigators in Taiwan (PI, Dr. Hai Gwo Hwu; both funded by NIMH grant 1R01MH085560; Expanding Rapid Ascertainment Networks of Schizophrenia Families in Taiwan). A total of 3800 trios are anticipated to be collected by May 2013. This represents a highly homogenous national sample from the same ancestral population. DNA samples will be obtained from the NIMH Repository, Rutgers University Cell and DNA Repository (described below) and stored at the Broad Institute. Genetic and data analyses will be performed at the Broad Institute. We propose to sequence the whole exome of trios by hybrid capture and Illumina next generation sequencing and perform targeted genotyping and validation of variants (SNPs, indels and CNVs) using several molecular methods, to include emulsion-based PCR and Sanger sequencing.
Study
phs001196
SUDC Registry and Research Collaborative
The Sudden Unexplained Death in Childhood Registry and Research Collaborative (SUDCRRC), created at NYU Langone Health in 2014, connects academic investigators, forensic pathologists, and medical examiner and coroner partner offices to study sudden unexpected pediatric deaths, aged 1 month to 18 years, whose death is unexplained or unclear at the end of the public mandated investigation, including autopsy. Each case includes collection and analysis of all records (medical history from prenatal records throughout life, death scene investigation, autopsy/toxicology/ancillary death investigation reports including photographs and histology slides), biospecimens, and in-depth family interviews. Whole exome sequencing is performed on each deceased child and both biological parents. Additional studies are pursued when indicated by pathology review. Each case receives a multidisciplinary masked review process and parental written consent is obtained. The NYU institutional review board approved this study.
Study
phs003383
Facial Skin Biophysical Multi-Parameter and Microbiome-Based Korean Skin Cutotype (KSC) Determination
In this study, we conducted an integrated analysis of skin measurements, clinical BSTI surveys, and the skin microbiome of 950 Korean subjects to examine the ideal skin microbiome-biophysical association. By utilizing four skin biophysical parameters, we identified four distinct Korean Skin Cutotypes (KSCs) and categorized the subjects into three aging groups based on their age distribution. We established strong connections between 15 core genera and the four KSC types within the three aging groups, revealing three prominent clusters of the facial skin microbiome. Together with skin microbiome variations, skin tone/elasticity distinguishes aging groups while oiliness/hydration distinguishes individual differences within aging groups. Our study provides prospective reality data for customized skin care based on the microbiome environment of each skin type.
Study
EGAS00001007334
Metastatic Adult Pancreatoblastoma: Multimodal Treatment and Molecular Characterization of a Very Rare Disease (NCT MASTER)
Pancreatoblastoma is a rare malignant tumor that occurs predominantly in children. We identified four adult patients with metastasizing pancreatoblastoma at a high-volume German university cancer center who were treated with multimodal therapies between 2013 and 2018. In three cases, we performed a comprehensive molecular analysis that included whole-genome sequencing (WGS) or whole-exome sequencing (WES); transcriptome sequencing was performed in two cases. Subsequent recommendations of molecularly stratified treatment options were discussed within a dedicated molecular tumor board (MTB) embedded in a precision oncology program (NCT MASTER).
Study
EGAS00001004157
NHLBI TOPMed: The Jackson Heart Study (JHS)
Since there is a greater prevalence of cardiovascular disease among African Americans, the purpose of the Jackson Heart Study (JHS) is to explore the reasons for this disparity and to uncover new approaches to reduce it. The JHS is a large, community-based, observational study whose 5306 participants were recruited from among the non-institutionalized African-American adults from urban and rural areas of the three counties (Hinds, Madison, and Rankin) that make up the Jackson, MS, metropolitan statistical area (MSA). Jackson is the capital of Mississippi, the state with the largest percentage (36.3%) of African Americans in the United States. The JHS design included participants from the Jackson ARIC study who had originally been recruited through random selection from a drivers' license registry. Approximately six months before the JHS was to begin, an amendment to the federal Driver's Privacy Protection Act was passed that changed the level of consent for public release of personal information from driver's license lists from an "opt out" to an "opt in" basis. The Mississippi Highway Patrol was no longer able to release a complete listing of all persons with driver's licenses or state identification cards, which prevented its use in the JHS. New JHS participants were chosen randomly from the Accudata America commercial listing, which provides householder name, address, zip code, phone number (if available), age group in decades, and family components. The Accudata list was deemed to provide the most complete count of households for individuals aged 55 years and older in the Jackson MSA. A structured volunteer sample was also included in which demographic cells for recruitment were designed to mirror the eligible population. Enrollment was opened to volunteers who met census-derived age, sex, and socioeconomic status (SES) eligibility criteria for the Jackson MSA. In addition, a family component was included in the JHS. 
The sampling frame for the family study was a participant in any one of the ARIC, random, or volunteer samples whose family size met eligibility requirements. Eligibility included having at least two full siblings and four first degree relatives (parents, siblings, children over the age of 21) who lived in the Jackson MSA and who were willing to participate in the study. No upper age limit was placed on the family sample. Known contact information was obtained during the baseline clinic examination from the index family member with a verbal pedigree format to identify name(s), age(s), address(es), and telephone number(s). Recruitment was limited to persons 35-84 years old except in the family cohort, where those 21 years old and above were eligible. Only persons who otherwise met study criteria but were deemed to be physically or mentally incompetent by trained recruiters were excluded from study eligibility [1].
[1] Wyatt SB, Diekelmann N, Henderson F, Andrew ME, Billingsley G, Felder SH et al. A community-driven model of research participation: the Jackson Heart Study Participant Recruitment and Retention Study. Ethn Dis 2003; 13(4):438-455 (PMID: 14632263).
Study
phs000964
Children's Hospital of Philadelphia (CHOP) Control Copy Number Variation (CNV) Study
We present a database of copy number variations (CNVs) detected in 2,026 disease-free individuals, using high-density, SNP-based
oligonucleotide microarrays. This large cohort, analyzed for CNVs in a single study using a uniform array platform and
computational tools, is composed mainly of Caucasians (65.2%) and African-Americans (34.2%). We have catalogued and
characterized 54,462 individual CNVs, 77.8% of which were identified in multiple unrelated individuals. These non-unique CNVs
mapped to 3,272 distinct regions of genomic variation spanning 5.9% of the genome; 51.5% of these were previously unreported,
and >85% are rare. Our annotation and analysis confirmed and extended previously reported correlations between CNVs and several
genomic features such as repetitive DNA elements, segmental duplications and genes. We demonstrate the utility of this data set
in distinguishing CNVs with pathologic significance from normal variants. Together, this analysis and annotation provides a
useful resource to assist with the assessment of CNVs in the contexts of human variation, disease susceptibility, and clinical
molecular diagnostics. The CNV resource is available at: http://cnv.chop.edu.
Reprinted from Shaikh T., et al., High-Resolution Mapping and Analysis of Copy Number Variations in the Human Genome: A Data
Resource for Clinical and Research Applications. Genome Research. 2009, with permission from Genome Research.
CHOP CNVs from 2,026 disease-free individuals are available through dbVar at
http://www.ncbi.nlm.nih.gov/dbvar/studies/nstd21.
Study
phs000199
GENETIC HISTORY OF ITALY
Recent scientific literature has highlighted the relevance of population genetic studies both for disease association-mapping in admixed populations and for understanding the history of human migrations. Deeper insight into the history of the Italian population is critical for understanding the peopling of Europe. Because of its crucial position at the centre of the Mediterranean basin, the Italian peninsula has experienced a complex history of colonization and migration, whose genetic signatures are still present in contemporary Italians. In this study, we investigated genomic variation in the Italian population using 2.5 million single nucleotide polymorphisms (SNPs) in a sample of more than 300 unrelated Italian subjects with well-defined geographical origins. We combined several analytical approaches to interpret genome-wide data on 1,272 individuals from European, Middle Eastern, and North African populations. We detected three major ancestral components contributing different proportions across the Italian peninsula, and signatures of continuous gene flow within Italy, which has produced remarkable genetic variability among contemporary Italians. In addition, we have extracted novel details about the Italian population’s ancestry: we identified genetic signatures of major historical events in Europe and the Mediterranean basin from the Neolithic (e.g., peopling of Sardinia) to recent times (e.g., ‘barbarian invasion’ of Northern and Central Italy). These results are valuable for further genetic, epidemiological and forensic studies in Italy and in Europe.
Study
EGAS00001001458
Full characterization of structural variation
Structural variants (SVs) are important contributors to human disease. Their characterization, however, remains difficult due to their size and association with repetitive regions. Long-read sequencing (LRS) and optical genome mapping (OGM) can help, as their molecules span multiple kilobases and capture SVs in full. In this study, we selected six individuals who presented with unresolved SVs. We applied LRS to all individuals and OGM to a subset of three complex cases. LRS detected and fully resolved the interrogated SV in all samples. This enabled a precise molecular diagnosis in two individuals. Overall, LRS identified 100% of the junctions at single-basepair level, providing valuable insights into their formation mechanisms without need for additional data sources. Application of OGM added straightforward variant phasing, aiding in the unravelling of complex rearrangements. These results highlight the potential of LRS and OGM as follow-up molecular tests for complete SV characterization. We show that they can assess clinically relevant structural variation at unprecedented resolution. Additionally, they detect (complex) cryptic rearrangements missed by conventional methods. This ultimately leads to an increased diagnostic yield, emphasizing their added benefit in a diagnostic setting. To aid their rapid adoption, we provide detailed laboratory and bioinformatics workflows in this manuscript.
Study
EGAS50000000520
This project aims to study human memory capacity, including short-term memory and long-term memory, systematically via genome-wide association studies
Development of high-throughput genotyping platforms provides an opportunity to identify new genetic elements related to complex cognitive functions. Taking advantage of multi-level genomic analysis, here we studied the genetic basis of human short-term (STM, n=1,620) and long-term (LTM, n=1,526) memory functions. Heritability estimation based on single nucleotide polymorphisms showed moderate heritability of short-term memory but very low heritability of long-term memory. In a two-step genome-wide association study, the markers rs13151012 and rs1558360 passed genome-wide significance (p < 5×10⁻⁸) in the digit-span STM task and for the first principal component shared by two STM tasks; however, neither survived replication. In turn, we selected the ten most significant single nucleotide polymorphisms (SNPs) for replication tests. Among them, a SNP near ZFAT was significantly associated with STM performance in another independent population of 2,789; a polymorphism within BCAT2 was significantly associated with LTM in another independent population of 1,865. Furthermore, we performed a pathway analysis based on the current genomic data and found six pathways significantly associated with STM capacity and one pathway associated with LTM capacity.
Study
EGAS00001002875
Integrated Genomic Analysis of Chronic Lymphocytic Leukaemia
Chronic lymphocytic leukaemia (CLL) is a frequent and heterogeneous disease whose genetic alterations determining the clinicobiological behaviour are not fully understood. Here, we describe a comprehensive evaluation of the genomic landscape of 452 CLLs and 54 monoclonal B-lymphocytosis (MBL), a precursor disorder. This study provides an integrated portrait of the genomic landscape of CLL, identifies new recurrent mutations acting as drivers of the disease, and suggests clinical interventions which may improve the management of patients with this neoplasia.
The data included here were described in the ICGC-CLL manuscript published by Puente et al (Nature 2015). Note that complementary RNA-seq data generated within the ICGC-CLL can be found in EGAS00001000374.
Study
EGAS00001001306
The Haemgen RBC study
Anaemia is a major determinant of global ill-health. To refine our understanding of the genetic factors influencing red blood cell formation and function, we carried out a meta-analysis of genome-wide association studies (GWAS) for six red blood cell traits: haemoglobin (HB), mean cell haemoglobin (MCH), mean cell haemoglobin concentration (MCHC), mean cell volume (MCV), packed cell volume (PCV) and red blood cell count (RBC). We provide genome-wide association results for 62,553 people of European ancestry using up to 2,644,161 autosomal SNPs. Participants with extreme measurements (>+/-3SD from mean) were excluded on a per-phenotype basis. Imputation was done using haplotypes from HapMap Phase 2. SNP associations with each phenotype were tested by linear regression using an additive genetic model. Associations were tested separately in each cohort, with principal components and other study-specific factors as covariates to account for population substructure. We then carried out meta-analysis of results from the individual cohorts using z-scores weighted by square root of sample size. SNPs with MAF<1% (weighted average across cohorts) were removed, as were SNPs with weight <50% of phenotype sample size.
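The meta-analysis step described above, combining per-cohort z-scores weighted by the square root of sample size, can be sketched in Python as follows. This is a minimal illustration of the standard weighted z-score combination, not the study's actual pipeline; the function names and the example numbers are hypothetical.

```python
import math


def weighted_z_meta(z_scores, sample_sizes):
    """Combine per-cohort z-scores, weighting each by sqrt(sample size)."""
    if not z_scores or len(z_scores) != len(sample_sizes):
        raise ValueError("need exactly one sample size per z-score")
    weights = [math.sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    # Dividing by sqrt(sum of squared weights) keeps the combined
    # statistic on the standard normal scale under the null.
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator


def two_sided_p(z):
    """Two-sided p-value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical per-cohort results for one SNP (illustrative values only).
cohort_z = [2.0, 2.0]
cohort_n = [100, 100]
combined = weighted_z_meta(cohort_z, cohort_n)
print(combined, two_sided_p(combined))
```

With two equally sized cohorts reporting the same z-score, the combined statistic is that z-score multiplied by the square root of two, which shows how agreement across cohorts strengthens the evidence.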
Study
EGAS00000000132
RNA-seq of Glioblastoma stem cells
Chromatin accessibility discriminates stem from mature cell populations, enabling the identification of primitive stem-like cells in primary tumors, such as Glioblastoma (GBM), where self-renewing cells driving cancer progression and recurrence are prime targets for therapeutic intervention. We show, using single-cell chromatin accessibility, that primary GBMs harbor a heterogeneous self-renewing population whose diversity is captured in patient-derived glioblastoma stem cells (GSCs). In-depth characterization of chromatin accessibility in GSCs identifies three GSC states: Reactive, Constructive, and Invasive, each governed by uniquely essential transcription factors and present within GBMs in varying proportions. Orthotopic xenografts reveal that GSC states associate with survival, and identify an invasive GSC signature predictive of low patient survival. Our chromatin-driven characterization of GSC states improves prognostic precision and identifies dependencies to guide combination therapies.
Study
EGAS00001003070
Dysregulation of Naive T Cell Quiescence during Aging
TEA-seq (novel trimodal single-cell analysis of mRNA transcripts, surface protein epitopes, and chromatin accessibility) was performed on CD3+ T cells isolated from peripheral blood of healthy pediatric (11–13 yrs, n = 8) and older adult (55–65 yrs, n = 8) female donors. Single-cell RNA sequencing (scRNA-seq) was additionally performed on peripheral blood mononuclear cells (PBMCs) from a healthy cohort of 16 pediatric, 16 young adult (25–35 yrs), and 16 older adult donors with equal sex distribution. Using these data, we dissected the compositional and molecular alterations within the T cell compartment across the spectrum of healthy age, showing broad transcriptional and epigenetic alterations within the T cell compartment of older adults compared to children, as well as a novel population of pediatric-specific CD8aa T cells.
Study
phs003400
Exome Recapture and Sequencing of Prospectively Characterized Clinical Specimens From Cancer Patients
The study cohort comprises samples from patients whose tumors have been prospectively characterized as part of their care at Memorial Sloan Kettering Cancer Center. This includes a variety of distinct tumor types and molecular subtypes.
Study
phs001783
Genome-Wide associations of Lung Health Study (LHS)
The 'Genome-Wide Associations Environmental Interactions in the Lung Health Study' at Johns Hopkins University aims to test for genetic associations with lung function decline, a primary outcome associated with chronic obstructive pulmonary disease (COPD), using banked DNA and phenotype data on 4,287 European Americans from the longitudinal, multicenter Lung Health Study (LHS). The broad goals of the LungGO/ESP-GO fall into two general categories: (i) discovery of all variants (i.e., common and rare) in all protein-coding regions of the human genome (i.e., the exome) conferring risk to complex pulmonary diseases including COPD, in a subset of the LHS cohort. The Johns Hopkins University LHS cohort offers a unique opportunity to elucidate genetic variants that cause COPD. The Lung Health Study I was a randomized multicenter clinical trial with 5,887 participants carried out from October 1986 to April 1994, designed to test the effectiveness of smoking cessation and bronchodilator administration in smokers aged 35 to 60 with mild lung function impairment. Participants were randomly assigned to one of three groups: usual care, who received no intervention; smoking intervention with the inhaled bronchodilator ipratropium bromide; or smoking intervention with an inhaled placebo. The effect of intervention was evaluated by the rate of decline of forced expiratory volume in one second (FEV1). The GWAS includes only the subset of European American LHS participants for whom lung function data from three or more time points are available. Thus, the GWAS represents 73% of the 5,887 volunteers who participated in the LHS study. Importantly, LHS subjects included had similar demographics (including age, gender and BMI) and rates of lung function decline (mean annual change in FEV1% predicted: -0.96 %/yr vs. -0.99 %/yr, p=0.57) compared with those not included in the GWAS, reflecting little selection bias for our primary outcome. 
They were, however, more likely to have quit smoking after 5 years. This study is part of the Gene Environment Association Studies initiative (GENEVA, http://www.genevastudy.org) funded by the trans-NIH Genes, Environment, and Health Initiative (GEI). The overarching goal is to identify novel genetic factors that contribute to lung function through large-scale genome-wide association studies of smokers enrolled in a multicenter clinical trial. Genotyping was performed at the Johns Hopkins University Center for Inherited Disease Research (CIDR). Data cleaning and harmonization were done at the GEI-funded GENEVA Coordinating Center at the University of Washington.
Study
phs000335
Genome-wide DNA methylation sequencing identifies epigenetic perturbations in the upper airways under long-term exposure to moderate levels of ambient air pollution
While the link between exposure to high levels of ambient particulate matter (PM) and increased incidences of respiratory and cardiovascular diseases is widely recognized, recent epidemiological studies have shown that low PM concentrations are equally associated with adverse health effects. As DNA methylation is one of the main mechanisms by which cells regulate and stabilize gene expression, changes in the methylome could constitute early indicators of dysregulated signaling pathways. So far, little is known about PM-associated DNA methylation changes in the upper airways, the first point of contact between airborne pollutants and the human body. Here, we focused on cells of the upper respiratory tract and assessed their genome-wide DNA methylation pattern to explore exposure-associated early regulatory changes. Using a mobile epidemiological laboratory, nasal lavage samples were collected from a cohort of 60 adults who lived in districts with records of low (Simmerath) or moderate (Stuttgart) PM10 levels in Germany. PM10 concentrations were verified by particle measurements on the days of the sample collection, and genome-wide DNA methylation was determined by enzymatic methyl sequencing at single-base resolution. We identified 231 differentially methylated regions (DMRs) between individuals with moderate and low PM10 exposure. A high proportion of DMRs overlapped with regulatory elements, and DMR target genes were involved in pathways regulating cellular redox homeostasis and immune response. In addition, we found distinct changes in DNA methylation of the HOXA gene cluster, whose methylation levels have previously been linked to air pollution exposure but also to carcinogenesis in several instances. The findings of this study suggest that regulatory changes in upper airway cells occur at PM10 levels below current European thresholds, some of which may be involved in the development of air pollution-related diseases.
Study
EGAS00001007374
NHLBI TOPMed: Best ADd-on Therapy Giving Effective Response (BADGER)
BADGER is a 56-week randomized, double-blind, three-treatment, three-period cross-over trial that will evaluate the differential improvement in control that is achieved following three separate treatment interventions in children whose asthma is not acceptably controlled on a low dose of ICS (per NAEPP guidelines). All participants will enter an 8-week run-in period during which time they will receive a dose of 1x ICS (fluticasone 200 μg/day). During this 8-week time period, running 2-week averages to establish the lack of acceptable asthma control will be calculated. Thus, a child could qualify for randomization at any time during this 8-week run-in period. This approach should maximize both patient safety and successful enrollment. Children will continue to receive 1x ICS during the entire treatment phase. During each period of the treatment phase, they also will receive one add-on therapy in the form of LABA, LTRA or additional 1x ICS. The order of the add-on therapy assignment will be determined by randomization into one of six treatment sequences (order determined randomly). Each treatment period will be 16 weeks in length; the initial 4 weeks of each period will be considered to be the washout period for the previous treatment. The primary outcome measures will be frequency of asthma exacerbations, asthma control days, and FEV1.
Study
phs001728
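The BADGER crossover schedule described above can be checked with a short sketch. It enumerates the possible orderings of the three add-on therapies and totals the trial length, assuming a participant completes the full 8-week run-in (the description notes that a child may qualify for randomization earlier). The names `ADD_ONS`, `treatment_sequences` and `total_weeks` are illustrative and are not part of the study materials.

```python
from itertools import permutations

# Add-on therapies named in the study description; each participant
# receives all three, one per treatment period.
ADD_ONS = ("LABA", "LTRA", "additional 1x ICS")

RUN_IN_WEEKS = 8    # assumption: the full run-in period is used
PERIOD_WEEKS = 16   # each treatment period, including its 4-week washout


def treatment_sequences(add_ons=ADD_ONS):
    """Return every possible ordering of the add-on therapies."""
    return list(permutations(add_ons))


def total_weeks(n_periods=len(ADD_ONS)):
    """Run-in plus one 16-week period per add-on therapy."""
    return RUN_IN_WEEKS + n_periods * PERIOD_WEEKS


if __name__ == "__main__":
    for sequence in treatment_sequences():
        print(" -> ".join(sequence))
    print("Sequences:", len(treatment_sequences()))
    print("Total weeks:", total_weeks())
```

Three therapies give 3! = 6 orderings, matching the six treatment sequences in the description, and 8 + 3 × 16 = 56 weeks matches the stated trial length.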
Total exRNA Profiles from Plasma, Saliva, and Urine of Healthy Subjects
We sequenced small RNAs from 183 plasma samples, 204 urine samples and 46 saliva samples from 55 college athletes ages 18-25 years. Many of the participants provided more than one sample, weeks or months apart, allowing us to assess variability in an individual's exRNA expression levels over time. Several individuals provided all three biofluid types at one time, producing data on individual expression levels across several biofluid types. Here we provide a systematic analysis of small exRNAs present in each biofluid, as well as an analysis of exogenous RNAs. We find that a large number of RNA fragments in plasma (63%) and urine (54%) have sequences that are assigned to YRNA and tRNA fragments respectively. Surprisingly, while many miRNAs can be detected, there are few miRNAs that are consistently detected in all samples from a single biofluid. Additionally, we performed whole transcriptome sequencing on 134 plasma and 115 urine samples and identified circRNA.
Study
phs001258
HuBMAP: Single-Cell Data from Human Tissues
The small bowel and colon are organs critical for maintaining homeostasis of the human body by mediating nutritional absorption upon the ingestion of food. Though both organs are extensive in length, there are known differences in function and cellular heterogeneity within different portions of each. Also, a cross section anywhere in the bowel reveals a complex layering of components involved in absorption and secretion, motility of gut contents, circulation, and immunity. We are mapping the complexity of the small bowel and colon with cell-to-cell resolution in histologic sections, both along their lengths and across multiple individuals. Our molecular data consist of single cell ATAC-seq and RNA-seq. These profiles are spatially mapped to histologic sections using CO-Detection by indEXing (CODEX), a highly multiplexed system of antibody-tagged target epitopes.
Study
phs002272
Northwestern NUgene Project: Type 2 Diabetes
The ability to correlate genetic variation with disease susceptibility and response to drug therapy depends on genotype or sequence analysis of large numbers of richly characterized DNA samples. We are a part of NHGRI's electronic Medical Records and Genomics (eMERGE) Network, whose goal is to conduct genome-wide association studies in thousands of individuals using EMR-derived phenotypes and DNA from linked biorepositories. For eMERGE, Northwestern University (NU) is studying type 2 diabetes as a phenotype. In addition, in order to explore race differences in the prevalence of type 2 diabetes, NU collaborated with Vanderbilt University to study a mix of both Caucasian and African-American participants.
Northwestern University: In 2002, Northwestern committed to the development of a DNA repository to serve as a platform for the identification and validation of genotype-phenotype associations that will impact healthcare. The NUgene Project is a repository with longitudinal medical information from participating patients at affiliated hospitals and outpatient clinics from the Northwestern University Medical Center. Participants' DNA samples are coupled with data from a questionnaire (2 versions were used, 1 before and 1 after February 2006; both are included) and continuously updated data from our Electronic Medical Record (EMR) representing actual clinical care events. Northwestern has a state-of-the-art, comprehensive inpatient and outpatient EMR system of over 2 million patients. NUgene has broad access to participant data for all outpatient visits as well as inpatient data via a consolidated data warehouse. NUgene participants consent to distribution and use of their coded DNA samples and data for a broad range of genetic research by third-party investigators.
Vanderbilt University: BioVU, Vanderbilt's DNA databank, is an enabling resource for exploration of the relationships among genetic variation, disease susceptibility, and variable drug responses, and represents a key first step in moving the emerging sciences of genomics and pharmacogenomics from research tools to clinical practice. BioVU acquires DNA from discarded blood samples collected from routine patient care. The biobank is linked to de-identified clinical data extracted from Vanderbilt's EMR, which forms the basis for phenotype definitions used in genotype-phenotype correlations.
Study
phs000237
HIV-phyloTSI: Subtype-independent estimation of time since HIV-1 infection for cross-sectional measures of population incidence using deep sequence data
Estimating the time since HIV infection (TSI) at population level is essential for tracking changes in the global HIV epidemic. Most methods for determining TSI give a binary classification of infections as recent or non-recent within a window of several months, and cannot assess the cumulative impact of an intervention. We developed a Random Forest Regression model, HIV-phyloTSI, which combines measures of within-host diversity and divergence to generate continuous TSI estimates directly from viral deep-sequencing data, with no need for additional variables. HIV-phyloTSI provides a continuous measure of TSI up to 9 years, with a mean absolute error of less than 12 months overall and less than 5 months for infections with a TSI of up to a year. It performs equally well for all major HIV subtypes based on data from African and European cohorts. We demonstrate how HIV-phyloTSI can be used for incidence estimates on a population level.
Study
EGAS50000000895
NIDDK⁄CIDR Inflammatory Bowel Disease Genetics Consortium (IBDGC) Genome Wide Association Study in Familial Crohn's Disease
The National Institute of Diabetes and Digestive and Kidney Diseases Inflammatory Bowel Disease Genetics Consortium (NIDDK - IBDGC) conducted a genome wide association study (GWAS) using the Human Omni2.5-Quad beadchip from Illumina to identify disease variants associated with familial Crohn's disease. This study includes a subset of individuals from an original cohort of affected subjects and affected/unaffected relatives from a previous linkage scan which showed increased linkage evidence at three novel risk loci. A total of 708 samples were selected based on various criteria for inclusion from the 6 Genetic Research Centers (GRC) in North America that participate in the IBDGC. This project was supported by an ancillary R01 grant from the NIDDK with resources made available through the IBDGC.
Study
phs000367
Single cell analyses of transcriptome and epigenome in neuroblastoma infiltrated bone marrow
We report transcriptional and genome-wide ATAC-seq analyses at single cell resolution of 16 bone marrow samples with and without neuroblastoma infiltration. Our samples span across several neuroblastoma models, including MYCN amplified, ATRX mutated, and sporadic (lacking either alteration).
Study
EGAS00001006106
Sequencing and analysis of a South Asian-Indian personal genome
India with over 1.3 billion people is estimated to carry three times more genetic diversity compared to Europe. Next-generation sequencing technologies have facilitated the understanding of diversity by enabling whole genome sequencing at greater speed and lower cost. While multiple genomes from people of European and a few of Asian descent have been sequenced, only recently a single male genome from the Indian subcontinent at sufficient depth and coverage was published. We have in this study sequenced and analyzed the genome of a South Asian Indian female (SAIF) from the Indian state of Kerala. We identified over 3.4 million SNPs in this genome including over 89,873 private variations. Comparison of the SAIF genome with several published personal genomes revealed that this individual shared ~50% of the SNPs with each of these genomes. Analysis of the SAIF mitochondrial genome showed that it is closely related to the U1a3 haplogroup which has been previously observed in Kerala. We assessed the SAIF genome for SNPs with health and disease consequences and found that the individual was at a higher risk for multiple sclerosis and a few other diseases. In assessing SNPs that modulate drug response we found a variation that predicts a favorable response to metformin used for treating diabetes. SNPs predictive of adverse reaction to warfarin indicated that the SAIF individual is not at risk for bleeding if treated with typical doses of warfarin. In addition to these, we report the presence of several other SNPs of medical relevance.
Study
EGAS00001000328
Proteome characterization in primary colorectal cancer and corresponding liver metastasis
Colorectal adenocarcinomas (CRC) are among the most commonly diagnosed tumors worldwide. Colorectal adenocarcinomas primarily metastasize into the liver and (less often) into the peritoneum. Patients suffering from CRC-liver metastasis (CRC-LM) typically present with a dismal overall survival compared to non-metastasized CRC patients. The metastasis process and metastasis-promoting factors in patients with CRC are under intensive debate. However, CRC studies investigating the proteome biology are lacking. Formalin-fixed paraffin-embedded (FFPE) tissue specimens provide a valuable resource for comprehensive proteomic studies of a broad variety of clinical malignancies. The presented pilot study compares the proteome of primary CRC and patient-matched CRC-LM. The applied protocol allows a reproducible and straightforward identification and quantification of over 2,600 proteins within the dissected tumorous tissue. Subsequent unsupervised clustering reveals distinct proteome biologies of the primary CRC and the corresponding CRC-LM. Statistical analysis yields multiple differentially abundant proteins in either primary CRC or their corresponding liver metastases. A more detailed analysis of dysregulated biological processes suggests an active immune response in the liver metastases, including several proteins of the complement system. Proteins with structural roles, e.g. in cytoskeleton organization or cell junction assembly, appear to be less prominent in liver metastases as compared to primary CRC. Immunohistochemistry corroborates the high expression levels of metabolic proteins in CRC-LM observed by proteomics. We further assessed how the in vitro inhibition of two metabolic proteins enriched in CRC-LM affected cell proliferation and chemosensitivity. The presented proteomic investigation in a small clinical cohort promotes a more comprehensive understanding of the distinct proteome biology of primary CRC and their corresponding liver metastases.
Study
EGAS00001005641
International Multi-Center ADHD Gene Project (IMAGE) II Case Sample
This sample represents a collection of cases across a range of sites. All of these samples were ascertained for ADHD with most meeting criteria for combined type ADHD. The collection sites span Europe and America. Further details on the source and inclusion and exclusion information can be found in Neale et al. "Case-Control Genome-Wide Association of Attention-Deficit / Hyperactivity Disorder" J Am Acad Child Adolesc Psychiatry. 2010 September; 49(9): 906-920 (PMID: 20732627). For online access to this manuscript see: PMC2928577.
Study
phs000407
Treatment of Preserved Cardiac Function Heart Failure with an Aldosterone Antagonist (TOPCAT-BioLINCC)
Data Access Note: Please refer to the "Authorized Access" section below for information about how to access the data from this accession. Access differs from many other dbGaP accessions.
Biospecimens: Access to biospecimens is through the NHLBI Biologic Specimen and Data Repository Information Coordinating Center (BioLINCC). Biospecimens from TOPCAT include Buffy Coat, DNA, Plasma, Serum, Urine, and Whole Blood. Please note that use of biospecimens in genetic research is subject to a tiered consent.
Objectives: The TOPCAT trial evaluated the effectiveness of aldosterone antagonist therapy in reducing cardiovascular mortality, aborted cardiac arrest, and heart failure hospitalization in patients who have heart failure with preserved systolic function.
Background: Many patients with heart failure have a normal, or near-normal, left ventricular ejection fraction (LVEF). Such patients share similar signs and symptoms with patients who have heart failure and a reduced LVEF, as well as an impaired quality of life and a poor prognosis. However, at the time of TOPCAT, the benefit of medical therapies for heart failure was limited to those with a reduced LVEF. Due to a lack of favorable evidence from clinical trials, clinical guidelines offered no specific recommendations for the management of heart failure in patients with preserved LVEF, except for attention to coexisting conditions. Among patients with heart failure and a reduced LVEF and those with myocardial infarction complicated by heart failure and left ventricular dysfunction, mineralocorticoid receptor antagonists have been shown to be effective in reducing overall mortality and hospitalizations for heart failure. In small mechanistic studies involving patients with heart failure and preserved left ventricular function, mineralocorticoid receptor antagonists improved measures of diastolic function. However, rigorous testing was needed regarding their effect on clinical outcomes in patients with preserved LVEF. Therefore, the TOPCAT trial was initiated to determine whether treatment with spironolactone, an aldosterone antagonist, would improve clinical outcomes in patients with symptomatic heart failure and a relatively preserved LVEF.
Participants: A total of 3445 participants were enrolled, with 1722 assigned to the spironolactone group and 1723 assigned to the placebo group. Among these, 2464 participants were enrolled via the hospitalization stratum and 981 were enrolled via the BNP stratum.
Design: TOPCAT was a phase 3, multicenter, international, randomized, double-blind, placebo-controlled trial. Eligible participants were randomly assigned to receive either spironolactone or placebo in a 1:1 ratio. Randomization was stratified according to whether the patient met the criterion for previous hospitalization or BNP elevation. The baseline visit included assessment of socio-demographics, physical characteristics, medical history, lifestyle factors, laboratory measures, electrocardiography variables and health-related quality of life and functional status. Study drugs were initially administered at a dose of 15 mg once daily, which was increased as tolerated to a maximum of 45 mg daily during the first four months after randomization. Subsequent dose adjustments were made as required and subjects continued to receive other treatments for heart failure and co-existing illnesses. Measurement of potassium and creatinine levels was required within 1 week after a change in the study-drug dose and at each scheduled study visit. Follow-up visits to monitor symptoms, medications, and events and to dispense study drug were scheduled every four months during the subject's first year on the study, and every six months thereafter. The mean follow-up interval was 3.3 years in each study group. Repository blood and urine samples were collected at the baseline and 1 year visits from consenting subjects. The primary endpoint was a composite of cardiovascular mortality, aborted cardiac arrest or hospitalization for the management of heart failure. Secondary endpoints included all-cause mortality, hospitalization for heart failure management, new onset of diabetes mellitus or atrial fibrillation, and quality of life. A subset of subjects also participated in the Echocardiography or Echocardiography and Vascular Stiffness ancillary studies. Echocardiography, and additionally tonometry in the Echocardiography and Vascular Stiffness study, were performed at baseline and at either 12 or 18 months following randomization. If the subject was already enrolled in the TOPCAT trial at the time the ancillary study was initiated, but had not yet reached the 18 month visit, baseline was determined via a retrospective analysis performed on any echocardiographic images completed within 60 days prior to TOPCAT enrollment and no tonometry was performed if applicable.
Conclusions: In patients with heart failure and a preserved ejection fraction, treatment with spironolactone did not significantly reduce the incidence of the primary composite outcome of death from cardiovascular causes, aborted cardiac arrest, or hospitalization for the management of heart failure. However, the drug reduced the secondary endpoint of heart failure hospitalization incidence.
Study
phs003665
Genotyping microarray data of molecular tumorboard patients in the context of the HRD-manuscript published in IJC
Intensity files (.idat) derived from the Infinium CytoSNP-850K BeadChip microarray (Illumina) were used to analyze a possible homologous recombination repair deficiency (the molecular rationale for PARP inhibitor therapy) in n = 39 tumor patients (39 .idat files each for the green and red channels) presented at the Molecular Tumorboard at the University Hospital Erlangen. Using these files, a Genomic Instability Score was derived bioinformatically (R-based scripts, e.g. ASCAT, scarHRD) based on three parameters: "Loss of Heterozygosity, LOH", "Large-scale State Transitions, LST" and "Telomeric Allelic Imbalances, TAIs". The associated manuscript is published in the International Journal of Cancer (IJC).
Dataset
EGAD00010002736
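As an illustration of how the three scar parameters named in the dataset description above could be combined, the following is a minimal sketch. It assumes the Genomic Instability Score is the unweighted sum of the LOH, LST and TAI counts, which is a common convention for HRD scores; the description does not state the exact formula, and no classification threshold is implied. The class `ScarCounts`, the function `genomic_instability_score` and the example counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScarCounts:
    """Per-sample counts of the three genomic scar types (hypothetical container)."""
    loh: int  # loss-of-heterozygosity regions
    lst: int  # large-scale state transitions
    tai: int  # telomeric allelic imbalances


def genomic_instability_score(counts: ScarCounts) -> int:
    """Assumed scoring rule: unweighted sum of the three scar counts."""
    for name in ("loh", "lst", "tai"):
        if getattr(counts, name) < 0:
            raise ValueError(f"{name} count must be non-negative")
    return counts.loh + counts.lst + counts.tai


if __name__ == "__main__":
    # Made-up counts for a single sample, for illustration only.
    example = ScarCounts(loh=10, lst=15, tai=17)
    print(genomic_instability_score(example))
```

In practice the counts would come from allele-specific copy number segments (for example, as produced by ASCAT), which this sketch does not attempt to reproduce.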
Centers for Common Disease Genomics (CCDG) - Whole Genome Sequencing in Type 1 Diabetes (T1DGC)
The Type 1 Diabetes Genetics Consortium (T1DGC) was established to collect resources (biological samples and data) and conduct research to better understand the genetic basis of type 1 diabetes (T1D). Collection was initiated by ascertaining affected sib-pair families (both parents, two affected siblings and, when available, an unaffected sibling), collected from five geographic regions through four recruitment networks (Asia-Pacific, Europe, North America, United Kingdom). In addition, the T1DGC collected trio families (both parents and affected child) and cases and controls from low-prevalence populations (African-American, with four grandparents self-reporting as African ancestry; Mexican-American, with four grandparents self-reporting as ancestry from Mexico). The T1DGC also served as a repository for contributed collections from other studies, all meeting the broad data-sharing policy of the T1DGC, for inclusion in the genetic studies. These collections include T1D case samples ascertained from the UK Genetic Resource Investigating Diabetes (UK GRID) cohort, SEARCH for Diabetes in Youth (SEARCH), The Genetics of Kidneys in Diabetes (GoKinD), and control samples obtained from the British 1958 Birth Cohort, the UK National Blood Services collection, CLEAR (Consortium for the Longitudinal Evaluation of African-Americans with Early Rheumatoid Arthritis), the New York Cancer Project (NYCP), and other cohorts. For the NHGRI-funded Centers for Common Disease Genomics (CCDG) project, participants with T1D and ancestry-matched controls were identified through the T1DGC, either through direct ascertainment or by contribution from other sources to the T1DGC. 
As the CCDG has focused initially on non-Caucasian populations for whole genome sequencing, T1DGC participants of African, Mexican and Asian ancestry (targeting ~1200 cases and ~1200 controls in each ancestral group) and a small group of participants of Northern European ancestry (~100 cases, ~100 controls) were to be contributed to the study. Whole genome sequencing of T1DGC samples would be conducted at Washington University McDonnell Genome Institute and based upon matching case-control status within an ancestry group and prioritization by the CCDG.
Study
phs001222
GWAS for IgA Nephropathy
IgA nephropathy (IgAN) is a common form of immune-mediated glomerulonephritis characterized by glomerular deposition of IgA-containing immune complexes and manifesting with hematuria, proteinuria, and often kidney failure. This dataset is from a genome-wide association study (GWAS) designed to identify novel genetic risk loci for IgAN. In this dataset, seven cohorts were genotyped by Illumina Multi-Ethnic Global Array, consisting of 3,337 IgAN cases and 2,177 healthy controls of European and East Asian ancestries. Three of the provided cohorts were genotyped by Illumina HumanImmuno Chip, which are composed of 2007 cases of IgAN of European ancestry. These cohorts were used in the GWAS meta-analysis, as described in the manuscript entitled "GWAS defines pathogenic signaling pathways and prioritizes drug targets for IgA nephropathy" (Nature Genetics 2022, in press).
Study
phs000431
Experimental and Clinical Studies of Presbycusis
This study aimed to investigate genes and variants associated with adult-onset progressive hearing loss, a common and complex disease with a strong genetic component. Participants enrolled in an ongoing longitudinal study of age-related hearing loss at the Medical University of South Carolina (MUSC), dating from 1987. Pure tone thresholds at 0.25, 0.5, 1.0, 2.0, 3.0, 4.0, 6.0 and 8.0 kHz were obtained for each ear of each person, along with questionnaire responses concerning noise exposure history. Based on the audiometry, participants were classified into 5 groups as "Older-Normal", "Metabolic", "Sensory", "Unclassified" and "Unselected" by their hearing loss. Exome sequencing was carried out using the Agilent SureSelect X2 Target Enrichment System (version 5) and the Agilent SureSelect Human All Exon V5 kit, which included 5' and 3' UTRs. DNA was sheared using the Covaris S220 focused ultrasonicator. Libraries were sequenced on the Illumina HiSeq 2500. Variant loads per gene were calculated and compared between groups. Individual variants affecting hearing thresholds were also identified. Analyses were followed up in a second cohort. Several genes and variants were identified as novel candidates associated with both better and worse hearing. Data available through dbGaP include whole exome sequencing (WES) files and audiometric phenotyping.
Study
phs003327
Reasons for Geographic and Racial Differences in Stroke Cardiorenal GWAS
REGARDS is a national, population-based, longitudinal study of incident stroke and associated risk factors including over 30,000 Black and White adults aged 45 years or older from all 48 contiguous U.S. states and the District of Columbia. The study was designed to investigate reasons underlying the higher rate of stroke mortality among Black participants compared to White participants and among residents of the Southeastern U.S. compared to other U.S. regions. By design, Black adults and residents of the deep south were oversampled. Between 2003 and 2007 (baseline visit) participants completed a computer-assisted telephone interview to collect demographic information and medication adherence, and an in-home visit for blood pressure measurements and collection of blood and urine samples. Following the baseline visit, participants have been contacted by phone at six-month intervals to obtain information on incident stroke or secondary outcomes. Additionally, samples and data were collected on about 50% of the original cohort during a second study visit an average of 10 years after the baseline visit. Genotyping was performed as part of an ancillary study on 10,788 (84% Black) participants using Illumina MEGA arrays.
Study
phs002719
Study on the Genetics of Alcoholism (COGA): African American Family GWAS
COGA is a family study of alcoholism, in which the subjects have been drawn from the Collaborative Study on the Genetics of Alcoholism (COGA), a large, ongoing family-based study that includes subjects from seven sites around the US. COGA has gathered detailed, standardized data on study participants, including diagnostic and neurophysiological assessments. This project has already proved successful in identifying several genes that influence the risk for alcoholism and neurophysiological endophenotypes, which have been independently replicated. COGA data were included as part of two Genetic Analysis Workshops, and the phenotypes are familiar to the genetics community. Alcoholic probands were recruited from treatment facilities, assessed by personal interview, and after securing permission, other family members were also assessed. A set of comparison families was drawn from the same communities as the families recruited through an alcoholic proband. Assessment involved a detailed personal interview developed for this project, the Semi-Structured Assessment for the Genetics of Alcoholism (SSAGA), which gathers detailed information on alcoholism related symptoms along with other drugs and psychiatric symptoms. Many participants also came to the laboratories for electroencephalographic studies. Neurophysiological features that have been shown to be useful endophenotypes for which we have linkage and in some cases association results are included on a subset of the case-control sample: the beta power of the resting electroencephalogram (EEG), the P3(00) amplitude of the visual event-related potential (ERP), and the theta and delta event-related oscillations (EROs) underlying the P3. As part of COGA, a set of informative African American families were selected to have Genome-Wide Association data obtained within families. Genotyping was performed using the Illumina Omni2.5_080814_1 chip to genotype 3,438 subjects selected from densely affected families. 
Genotyping was performed at CIDR. This sample complements a set of densely affected European American families previously made available under dbGaP study accession phs000763. In addition, exome sequencing data on a subset of individuals with GWAS were added in version 2. For version 3, a subset had 30X Whole Genome Sequencing (WGS) as part of the NIDA Sequencing Initiative. The subset contained two distinct sets: sibling pairs where one sibling had at least two dependence diagnoses in the set (alcohol, cannabis, cocaine, and opioid) and the other had none, and non-related case-control pairs matched for age and ethnicity where the cases had alcohol and at least 2 other dependence diagnoses and controls had none. After sequencing, some sibling pairs were re-classified as half siblings. Three VCF files (small variants, structural variants, and copy number variations) are provided. Additional substance abuse variables are made available in version 3. We note that the full sample data are deposited in four dbGaP submissions and the sequenced samples are split across all four: CIDR: Collaborative Study on the Genetics of Alcoholism Case Control Study (phs000125). GWAS data on cases (primarily probands) and controls drawn from the families. Families with highest density of alcohol dependence and/or extreme event-related oscillation data (phs000763). GWAS data on 119 extended families of European descent are available here, along with extensive documentation. Study on the Genetics of Alcoholism (COGA): African American Family GWAS (phs000976). GWAS data on all available COGA families of African descent are available. COGA: Smokescreen GWAS (phs001208). GWAS data on all remaining COGA DNA samples, primarily of other racial background, were genotyped on the Smoke Screen array. A listing of all sequenced pairs is provided in the documentation to facilitate the merging of these samples.
Study
phs000976
Performance comparison of three DNA extraction kits on human whole-exome data from formalin-fixed paraffin-embedded normal and tumor samples
We generated 42 human whole-exome sequencing data sets from fresh-frozen (FF) and FFPE samples. These samples include normal and tumor tissues from two different organs (liver and colon), which we extracted with three different FFPE extraction kits (QIAamp DNA FFPE Tissue kit and GeneRead DNA FFPE kit from Qiagen, Maxwell™ RSC DNA FFPE Kit from Promega). Variant calling analysis shows a very high rate of concordance between matched FF / FFPE pairs and equivalent performance for the three kits we analyzed. We find a significant variation in the difference of total number of variants called between FF and FFPE samples for the three different FFPE DNA extraction kits. Coverage analysis shows that FFPE samples have poorer indicators than FF samples, yet the coverage quality remains above accepted thresholds. We detect limited but significant variations in coverage indicator values between the three FFPE extraction kits. Globally, the GeneRead and QIAamp kits have better variant calling and coverage indicators than the Maxwell kit on the samples used in this study, although the Maxwell kit performs better on some indicators and has advantages in terms of practical usage. Taken together, our results confirm the potential of FFPE sample analysis for clinical genomic studies, but also indicate that an FFPE DNA extraction kit should be chosen with careful testing and analysis beforehand in order to maximize the accuracy of the results.
Study
EGAS00001002631
The Atherosclerosis Risk in Communities (ARIC) Study
The Atherosclerosis Risk in Communities (ARIC) Study, sponsored by the National Heart, Lung and Blood Institute (NHLBI), is a prospective epidemiologic study conducted in four U.S. communities. The four communities are Forsyth County, NC; Jackson, MS; the northwest suburbs of Minneapolis, MN; and Washington County, MD. ARIC is designed to investigate the etiology and natural history of atherosclerosis, the etiology of clinical atherosclerotic diseases, and variation in cardiovascular risk factors, medical care and disease by race, gender, location, and date. ARIC includes two parts: the Cohort Component and the Community Surveillance Component. The Cohort Component began in 1987, and each ARIC field center randomly selected and recruited a cohort sample of approximately 4,000 individuals aged 45-64 from a defined population in their community. A total of 15,792 participants received an extensive examination, including medical, social, and demographic data. These participants were reexamined every three years with the first screen (baseline) occurring in 1987-89, the second in 1990-92, the third in 1993-95, and the fourth and last exam was in 1996-98. Follow-up occurs yearly by telephone to maintain contact with participants and to assess health status of the cohort. In the Community Surveillance Component, currently ongoing, these four communities are investigated to determine the community-wide occurrence of hospitalized myocardial infarction and coronary heart disease deaths in men and women aged 35-84 years. Hospitalized stroke is investigated in cohort participants only. Starting in 2006, the study conducts community surveillance of inpatient (ages 55 years and older) and outpatient heart failure (ages 65 years and older) for heart failure events beginning in 2005. ARIC is currently funded through January 31, 2012. 
This study is part of the Gene Environment Association Studies initiative (GENEVA, http://www.genevastudy.org) funded by the trans-NIH Genes, Environment, and Health Initiative (GEI). The overarching goal is to identify novel genetic factors that contribute to atherosclerosis and cardiovascular disease through large-scale genome-wide association studies of well-characterized cohorts of adults in four defined populations. Genotyping was performed at the Broad Institute of MIT and Harvard, a GENEVA genotyping center. Data cleaning and harmonization were done at the GEI-funded GENEVA Coordinating Center at the University of Washington.
Study
phs000090
Detection of cancers three years prior to diagnosis using plasma cell-free DNA
A major goal of early cancer detection is to identify subclinical disease when the tumor burden is low, so that treatments are more effective. But how early can cancers be detected prior to clinical signs or symptoms? This question can be answered only through the evaluation of participants whose clinical course has not been altered by the study itself. We here describe such an evaluation, performed on prospectively collected plasma samples from the Atherosclerosis Risk in Communities (ARIC) study, including 26 participants diagnosed with cancer and 26 matched controls. At the index time point, eight of these 52 participants scored positively with a multicancer early detection (MCED) blood test. All eight of these participants were diagnosed with cancer within 4 months after blood collection. In six of these 8 participants, we were able to assess an earlier plasma sample collected 3.1 to 3.5 years prior to clinical diagnosis. In four of these six participants, the same mutations detected by the MCED test could be identified, but at 8.6 to 79-fold lower levels. These results demonstrate that it is possible to detect circulating tumor DNA (ctDNA) more than three years prior to clinical diagnosis, and provide benchmark sensitivities required for the success of ctDNA-based tests for this purpose.
Study
EGAS00001008068
Osteoporotic Fractures in Men (MrOS)
The Osteoporotic Fractures in Men Study (MrOS) is a multi-center prospective, longitudinal, observational study of risk factors for vertebral and all non-vertebral fractures in older men, and of the sequelae of fractures in men. The original specific aims of the study include: (1) to define the skeletal determinants of fracture risk in older men, (2) to define lifestyle and medical factors related to fracture risk, (3) to establish the contribution of fall frequency to fracture risk in older men, (4) to determine to what extent androgen and estrogen concentrations influence fracture risk, (5) to examine the effects of fractures on quality of life, (6) to identify sex differences in the predictors and outcomes of fracture, (7) to collect and store serum, urine and DNA for future analyses as directed by emerging evidence in the fields of aging and skeletal health, and (8) define the extent to which bone mass/fracture risk and prostate diseases are linked. The MrOS study population consists of 5,994 community dwelling, ambulatory men aged 65 years or older from six communities in the United States (Birmingham, AL; Minneapolis, MN; Palo Alto, CA; Monongahela Valley near Pittsburgh, PA; Portland, OR; and San Diego, CA). Inclusion criteria were designed to provide a study cohort that is representative of the broad population of older men. The inclusion criteria were: (1) ability to walk without the assistance of another, (2) absence of bilateral hip replacements, (3) ability to provide self-reported data, (4) residence near a clinical site for the duration of the study, (5) absence of a medical condition that (in the judgment of the investigator) would result in imminent death, and (6) ability to understand and sign an informed consent. 
To qualify as an enrollee, the participant had to provide written informed consent, complete the self-administered questionnaire (SAQ), attend the clinic visit, and complete at least the anthropometric, DEXA, and vertebral X-ray procedures. There were no other exclusion criteria. The baseline examination was completed over a 25-month period from March 2000 to April 2002. Participants completed questionnaires regarding medical history, medications, physical activity, diet, alcohol intake, and cigarette smoking. Objective measures of anthropometric, neuromuscular, vision, strength, and cognitive variables were obtained at a clinic visit. Skeletal assessments included DEXA, calcaneal ultrasound, and vertebral radiographs. Vertebral and proximal femoral QCT was performed on a subset (65%) of participants. Serum, urine, and whole blood specimens were collected. During the study period, participants complete a tri-annual questionnaire every four months that obtains information concerning the occurrence of incident falls and fractures. Follow-up is over 99% complete. All reported fractures are confirmed using physician review of radiology reports or study radiologist review of x-rays. All deaths are also centrally adjudicated using death certificates and recent hospitalization records. In addition to the baseline study visit and tri-annual questionnaires, all surviving active participants were invited to follow-up visits and questionnaires. A second comprehensive clinic visit was completed between March 2005 and May 2006 or about 4.5 years after baseline. Most baseline measures, including vertebral radiographs, were repeated at the second visit. A third comprehensive clinic visit was completed between March 2007 and March 2009 or about 7.0 years after baseline. Many baseline measures were repeated at the third visit. A measure of energy expenditure was also obtained utilizing an accelerometer-based armband device.
Surviving participants have also completed two extensive mailed questionnaires designed to update information obtained at clinic visits. The first questionnaire was mailed between baseline and visit 2 between July 2002 and March 2004. The second questionnaire was mailed after the third questionnaire between March 2009 and February 2011. The MrOS Sleep Study, an ancillary study of the parent MrOS cohort, was conducted between December 2003 and March 2005 and recruited 3,135 MrOS participants for a comprehensive sleep assessment. Men were screened for nightly use of mechanical devices during sleep including pressure mask for sleep apnea (CPAP or BiPAP), mouthpiece for snoring or sleep apnea, or oxygen therapy and were generally excluded. The sleep visit included objective measures of sleep using actigraphy and polysomnography. Other measures at the sleep visit include neuropsychiatric measures, performance measures, health related questionnaires, routine exam measures previously completed at MrOS clinic visits and a urine and serum specimen collection. During the sleep study period, participants complete a tri-annual questionnaire every four months that contains information about incident cardiovascular events. All reported events are centrally adjudicated utilizing hospital records, study ECGs and other supporting documentation. The Hip Osteoarthritis study, another ancillary study of the parent MrOS cohort, was completed at the second clinic visit between March 2005 and May 2006. Participants were asked for consent for a hip x-ray and osteoarthritis measurements were obtained from these x-rays. DNA was extracted from baseline whole blood samples for use in genetic research. Consent for use of DNA was obtained through written consent. Raw GWAS data is housed at the study's coordinating center and not released as part of the dbGaP release. The raw data can be requested from the study's coordinating center. Only QC cleaned Plink files are available on dbGaP.
Study
phs000373
GoT2D: Genetics of Type 2 Diabetes, a study of the genetic architecture of type 2 diabetes
The GoT2D study includes ~2800 samples, half T2D cases and half T2D controls, of Northern European ancestry, analyzed using three technologies: deep whole exome sequencing, low-pass (4x) whole genome sequencing, and OMNI 2.5M genotyping. Samples were ascertained to be phenotypically "extreme" (e.g. leaner, younger cases and older, more obese controls). Genotypes (SNVs, INDELs, and SVs) were called separately for each technology and then integrated via genotype refinement into a single phased reference panel; samples and variants were then excluded based on QC procedures described in Fuchsberger et al.
Please note that 2 of the samples in the GoT2D vcf do not have phenotype data.
Dataset
EGAD00001002247
Consortium on Asthma among African-ancestry Populations in the Americas (CAAPA)
Asthma is a complex disease where the interplay between genetic factors and environmental exposures influences susceptibility and disease prognosis. Asthmatics of African descent tend to have more severe asthma and more severe clinical symptoms than individuals of European ancestry. Advances in genetic and genomic technologies have revolutionized gene discovery for several complex diseases, but going to the next step in gene discovery for asthma among populations of African descent requires considering unique characteristics of this ethnic group, including adequate sample sizes and population stratification due to (European and African) admixture. Thus far, coverage of common genetic markers both in public databases and commercially available SNP chips has been inadequate to detect and measure genetic associations among African admixed populations. The aim of this study was therefore to catalog genetic diversity in populations of African descent, especially those whose ancestry reflects the African Diaspora in the Americas.
Study
phs001123
Mitochondrial Abnormalities in Schizophrenia and Bipolar Disorder
Mitochondrial DNA structural variation is one proposed indicator of mitochondrial dysfunction. The large common deletion of mitochondrial DNA (mtDNA) increases with aging across multiple brain regions. We found other large deletions of mtDNA in brain at levels comparable to the common deletion (PMID: 30869147). Since mtDNA molecules replicate independently of nuclear division, the mtDNA copy number is a measure of mitochondrial health; levels are negatively correlated with age and all-cause morbidity. Complex I activity is another functional measure of mitochondria. We previously published a subset of this data from DLPFC (PMID: 29594135) for three indicators (common deletion, mtDNA copy number, Complex I activity). For this study, four indicators (common deletion, large deletions, mtDNA copy number, Complex I activity) were measured in four brain regions from subjects with schizophrenia (SZ) and bipolar disorder (BD) and compared to controls. The four brain regions were: the dorsolateral prefrontal cortex, superior temporal gyrus, primary visual cortex (V1), and nucleus accumbens.
Study
phs002395
International Consortium on the Genetics of Systemic Lupus Erythematosus (SLEGEN)
The genetic makeup of an individual strongly influences the risk of developing systemic lupus erythematosus (SLE). The identification of genes that predispose an individual to SLE will lead to earlier and better diagnosis, better treatments, and possibly prevention. To this end, the International Consortium on the Genetics of Systemic Lupus Erythematosus (SLEGEN) was formed in 2005 and is composed of lupus researchers who agreed to pool their knowledge and resources to search for genes that predispose to lupus. Eight laboratories contributed DNA samples for genotyping at the Broad Institute and association with SLE was performed by the Data Coordinating Center (Wake Forest University), as part of a four stage study design. Stages one and two of this design were graciously funded by the Alliance for Lupus Research (www.lupusresearch.org). In this stage of the study, approximately 767 SLE patients (cases) were compared to approximately 383 non-SLE patients (controls) for differences among the Illumina HumanHap300. The affected individuals are all females of European descent. 82% of the cases are the index case from multiplex pedigrees for SLE and the remaining 18% have self-reported first degree relatives with SLE. A detailed summary of the methods and results can be found in the manuscript in Nature Genetics February 2008 by SLEGEN "Genome-wide association scan in women with systemic lupus erythematosus identifies susceptibility variants ITGAM, PXK, KIAA1542 and other loci". (Please see also Study Accession: phs000202.v1.p1)
Study
phs000216
Charles R. Bronfman Institute for Personalized Medicine (IPM) BioBank Genome Wide Association Study of Cardiovascular, Renal and Metabolic Phenotypes
The Institute for Personalized Medicine (IPM) Biobank Project is a consented, EMR-linked medical care setting biorepository of the Mount Sinai Medical Center (MSMC) drawing from a population of over 70,000 inpatients and 800,000 outpatient visits annually. MSMC serves diverse local communities of upper Manhattan, including Central Harlem (86% African American), East Harlem (88% Hispanic Latino), and Upper East Side (88% Caucasian/white) with broad health disparities. IPM Biobank populations include 28% African American (AA), 38% Hispanic Latino (HL) predominantly of Caribbean origin, 23% Caucasian/White (CW). IPM Biobank disease burden is reflective of health disparities with broad public health impact: average body mass index of 28.9 and frequencies of hypertension (55%), hypercholesterolemia (32%), diabetes (30%), coronary artery disease (25%), chronic kidney disease (23%), among others. Biobank operations are fully integrated in clinical care processes, including direct recruitment from clinical sites, waiting areas and phlebotomy stations by dedicated Biobank recruiters independent of clinical care providers, prior to or following a clinician standard of care visit. Recruitment currently occurs at a broad spectrum of over 30 clinical care sites. Minorities are strikingly underrepresented in GWAS, including Coronary Artery Disease (CAD) and Chronic Kidney Disease; multigenic genetic risk scores for CAD have been recently validated in European ancestry populations, but not in AA or HL populations. Several important opportunities exist for extending additional GWAS to minority populations with a shared risk spectrum of CAD and CKD. For example, progressive CKD is a major and independent risk factor for CVD with an inverse relationship between estimated GFR (eGFR), and risk for mortality and cardiovascular events. This increased risk is only partially explained by the prevalence of cardiovascular risk factors among these patients. 
We conducted a GWAS of CAD and CKD related phenotypes in IPM Biobank with the primary objective to explore the genetics of overlapping CAD and CKD predominantly in minority populations characterized by increased risk.
Study
phs000388
Interethnic comparability in blood pressure GWAS
We carried out a genome-wide association and replication study for blood pressure in a two-stage approach (max N = 289,038) with a discovery stage sample of 130,777 East Asian individuals, identifying 19 new genetic loci. We found a significant genetic heterogeneity between East Asian and European-descent populations at several blood pressure loci, conforming to “a common ancestry-specific variant association model”. At 6 unique loci, distinct non-rare (or common) ancestry-specific variants co-localized within the same linkage disequilibrium block despite the significantly discordant direction of effects for the proxy shared variants between the ethnic groups. The genome-wide transethnic correlation of causal-variant effect sizes is 0.898 and 0.851 for systolic and diastolic blood pressure, respectively. Some of the ancestry-specific association signals were also influenced by a selective sweep. Our results provide new evidence for the role of common ancestry-specific variants and natural selection in the occurrence of ethnic differences in complex traits such as blood pressure.
Study
EGAS00001002991
The EGA at the International Congress of Human Genetics
Today is the start of the International Congress of Human Genetics (ICHG).
This event, hosted by the African Society of Human Genetics (AfSHG) and the
Southern African Society for Human Genetics (SASHG), will bring together
international experts to highlight how genomic technologies are being used
to address current challenges in human health and genomics.
The European Genome-phenome Archive (EGA) will be at the Congress,
participating in workshops, meetings and talks covering topics such as data
access and discovery and the Federated EGA.
Data access and discovery
In the context of the “Federated data analysis workshop” to be held on the
26ᵗʰ as part of the CINECA project, a hands-on session on Beacon v2 Discovery
will be led by Mauricio Moldes. Another session on authorisation and data
access will be presented by Mallory Freeberg and Coline Thomas, with an
introduction by Thomas Keane. During the workshop, an end-to-end federated
data analysis use case will be presented.
Over the last four years, CINECA has been identifying gaps in data sharing
between Africa, Canada and Europe in order to build a federated solution for
data access. Thomas Keane will present the project in the sessions taking
place on the 24ᵗʰ.
The Federated EGA at the ICHG
A Federated EGA poster will be shown during Poster Session 1, which takes
place on the 23ʳᵈ. The objective is to raise awareness of the European
Genome-phenome Archive as a resource for permanent archiving and secure data
sharing. In this context, the network generated by the Federated EGA is key
for cases where research data cannot be shared outside local jurisdictions.
The Federated EGA was officially launched in 2022 with research institutes
from five countries becoming the first nodes of this network. Now, with
more than 20 nodes preparing to join, it aims to become the largest human
omics data sharing initiative towards understanding human health and disease.
The H3Africa Initiative and the African research represented in the EGA
1.1% of the EGA studies include samples from African populations. Countries
such as South Africa, Kenya, Uganda and Egypt, among others, are represented
in the 72,577 samples from African individuals archived at the EGA.
23 EGA datasets are managed by the Human Heredity and Health in Africa
Initiative (H3Africa). This Initiative will hold its Consortium Meeting on
the 27ᵗʰ and 28ᵗʰ, back-to-back with the ICHG. H3ABioNet, the Pan African
Bioinformatics Network for the H3Africa consortium, will also take the
opportunity to hold its 10-Year Symposium during the Congress. Members of the
EGA team will attend both events, as H3ABioNet works in conjunction with
H3Africa projects to submit data to the European Genome-phenome Archive.
Blog
the-ega-at-the-international-congress-of-human-genetics
Identifying autosomal recessive mutations causing neurological disorders
The objective of this study is to resequence targeted intervals containing autosomal recessive variants causing neurological disorders in consanguineous pedigrees. Using homozygosity mapping, three intervals of very different sizes have previously been unambiguously mapped for three different neurological diseases: 2.4Mb, 8Mb and 14.3Mb in size, for Microlissencephaly, Severe Mental Retardation and Complicated hereditary spastic paraplegia respectively. This study is a pilot to assess how well custom targeted resequencing performs across a broad size range of intervals. The study design is to use a different custom capture probe set for each interval, pulldown from a single patient from each family, and sequence 1 lane using Illumina paired-reads for each sample. Candidate variants will be followed up in the families themselves, and in patients with similar phenotypes from outbred populations.
Study
EGAS00001000023
Ballett
TSO500 study: The project, called "BALLETT" (Belgian Approach of Local Laboratory Extensive Tumor Testing), has a double goal: (1) show the relevance of broad molecular profiling to improve oncological patients care, (2) demonstrate that broad molecular testing can be performed in a decentralized setting by local diagnostics laboratories in a fully standardized and uniform way while complying with the highest quality standards.
NCT05058937
Publication manuscript ID: bcr-2023-256124. Title: CUP and BRAF V600E meeting the BEACON combination.
Study
EGAS50000000478
Amplified EPOR/JAK2 genes define a unique subtype of acute erythroid leukemia
Acute erythroid leukemia (AEL) is a unique subtype of acute myeloid leukemia characterized by prominent erythroid proliferation, whose molecular basis is poorly understood. To elucidate the underlying mechanism of erythroid proliferation, we performed a comprehensive genomic study. The aim of this study is to unveil genotype-phenotype correlations and feasible molecular targets for therapy of AEL.
Study
EGAS00001005810
Longitudinal Study of Genetic Causes of Intrahepatic Cholestasis
This longitudinal observational study will investigate the natural history and progression of four genetic causes of intrahepatic cholestasis of childhood, including alpha-1 antitrypsin deficiency (α1-AT), Alagille syndrome (AGS), progressive familial intrahepatic cholestasis (PFIC), and bile acid synthesis defects (BAD). This study will be conducted as part of the Cholestatic Liver Disease Consortium (CLiC), an NIH-funded multi-centered Rare Disease Clinical Research Consortium. In this study, we will collect defined data elements in a uniform fashion at fixed intervals for five years over a relatively large number of patients with these rare disorders. In addition, a biobank of patient specimens and DNA samples will be established for use in ancillary studies to be performed in addition to this study. By comparing outcome measures between the four liver diseases (i.e., using each disorder as a disease-control for the other disorders), the full impact of each disorder can best be determined in comparison to the other liver diseases. Using the longitudinal database in this fashion, this study will provide an improved understanding of the effects of the cholestatic liver during childhood irrespective of the underlying etiology as well as to the pathophysiology, outcome, and complications of each of the disorders. This initial characterization will allow calculation of sample sizes for future therapeutic intervention clinical trials and provide the baseline to which interventions should be compared.
Study
phs001288
Genetic Neuroscience: How Human Genes and Alleles Shape Neuronal Phenotypes
The goal of this collaborative, interdisciplinary project is to develop powerful, generalizable approaches for discovering how risk variants for psychiatric disorders shape neurobiological processes at multiple levels of analysis, and to identify the processes whose dysregulation underlies disease. Induced pluripotent stem cells (iPSCs) were used towards the development of these new experimental and inferential systems bridging gaps between human genetics and experimental biology. The largest publicly available collection of iPSCs (2607 lines) has been generated from 2184 donors by the California Institute for Regenerative Medicine (CIRM). To expand our donor collection, an additional 44 iPSC donors from the McLean_Levy cohort were identified. We wish to share the available SNP data for 2167 CIRM lines and whole genome sequence data generated at the Broad Institute for 473 of the CIRM and 44 McLean_Levy iPSC donors. These data can be used to identify (for experiments) lines with specific genotypes of interest and lines from donors with high or low polygenic risk scores for phenotypes of interest. The data can also be used to identify acquired mutations in the iPSC lines. The CIRM iPSC lines are available through Fujifilm Cellular Dynamics iPSC Repository (https://www.fujifilmcdi.com/cirm-ipsc-products/). The McLean_Levy lines are available through the NIMH Repository & Genomics Resource (https://www.nimhgenetics.org/).
Additional project data registered with the study includes data from an iPSC line derived from an SMA patient (n=1), as well as single-cell RNA sequence data and supplemental processed genomic datasets in support of project publications.
Molecular Datasets
Single-cell RNA-seq: 10X Genomics, Illumina NovaSeq
Supplemental "cell village" pooled genomic sequence data: Illumina NextSeq
Whole Genome Genotyping: Infinium Global Screening Array-24 Kit, Illumina HumanCore chip
Whole Genome Sequencing: Illumina NovaSeq
Study
phs002032
The distinct DNA methylome of acute lymphoblastic leukemia
Here, using whole genome bisulfite sequencing (WGBS) data of over one hundred patients from multiple subtypes of acute lymphoblastic leukemia (ALL), control samples and ALL cell lines, we show that in contrast to the prevailing paradigm, ALL samples exhibit CpG island hypermethylation but little to no global loss of methylation. The subtype-specific CpG island hypermethylation levels in T cell ALL span a broad range of methylation levels rather than previous binary classifications and are influenced by multiple factors, including TET2 expression.
Study
EGAS00001005203
Genes and Blood Clotting Study (GABC)
Objectives: Use genome-wide approaches to identify genetic variants that influence common thrombosis and hemostasis factors, as well as selected common human traits. Design/Methods: The GABC study was a prospective sibling cohort design. Siblings were recruited by targeted email to the undergraduate and graduate student email lists at the University of Michigan. Healthy persons between 14 and 35 years old who had healthy siblings within the same age restriction were able to participate. Study participants agreed to an online informed consent and subsequently completed a 52-question online survey describing their specific bleeding traits as well as many common human traits. Fifty milliliters of blood was collected into a citrate-dextrose solution (ACD) from each participant. An aliquot of whole blood was used for an automated complete blood count analysis and the remainder was processed into platelet poor plasma and buffy coat portions. Plasma and buffy coat aliquots were snap frozen and stored in liquid nitrogen for future studies. 1189 individuals representing 507 sibships were collected between 06/26/2006 and 01/30/2009. Phenotyping Survey Details: To characterize individual bruising and bleeding history, the online survey recorded answers to questions based on a modified von Willebrand Disease (VWD) screening questionnaire. 
To characterize a collection of participant's common human traits, the survey recorded answers to questions about height, weight, presence of skin tags, history of acne, eye color, hair color, hair line characteristics, skin sunburn sensitivity, skin tanning ability, natural skin color, freckling, cheek dimpling, earlobe shape, shoe size, foot arch characteristics, hand fifth digit morphology, history of dyslexia, history of migraine headaches, history of seasonal allergies, history of apthous ulcers, tendency to sneeze while walking into a bright sunny place, history of dental caries, need for corrective eye lenses, handedness and like or dislike of strongly flavored foods. Biochemical phenotyping: Assays for plasma Von Willebrand Factor (VWF) antigen were performed using ELISA and "Alphalisa" techniques. Automated complete blood count analysis was performed on a Bayer Advia 120 on all participants (including WBC differential, RBC indices, and platelet count.) For the dbGaP v2 update, new biochemical phenotypes have been submitted and include von Willebrand Factor, von Willebrand Factor propeptide, plasminogen, gamma prime fibrinogen, ADAMTS 13, antithrombin III, protein C, and protein S. All new phenotypes were obtained using "Alphalisa" techniques. Genotyping Details: SNP genotyping was performed using genomic DNA extracted from peripheral blood at the Broad Institute, (MIT/Harvard). Genotyping was performed on the Illumina Omni-1 quad chip at the Broad Institute. For the dbGaP v2 update, genotyping data from the Illumina Human Exome was deposited. This study is part of the Gene Environment Association Studies initiative (GENEVA, http://www.genevastudy.org) funded by the trans-NIH Genes, Environment, and Health Initiative (GEI). The overarching goal is to identify novel genetic factors that contribute to blood clotting through large-scale genome-wide association studies of siblings. 
Genotyping was performed at the Broad Institute of MIT and Harvard, a GENEVA genotyping center. Data cleaning and harmonization was performed by the primary investigators at the University of Michigan, Ann Arbor, and at the GEI-funded GENEVA Coordinating Center at the University of Washington. This study serves as a resource for investigators who are interested in the genetic determinants of specific plasma proteins in a healthy population. The sibling cohort design allows for linkage analysis in addition to association studies. Analysis of thrombosis and hemostasis related traits should help elucidate specific biochemical and genetic networks that maintain hemostasis. We hope to identify specific genetic determinants of VWF levels in order to better understand the factors that influence the development of VWD.
Study
phs000304
Clonally selected lines after CRISPR/Cas editing are not isogenic
The CRISPR-Cas9 system has enabled researchers to precisely modify/edit the sequence of a genome. A typical editing experiment consists of two steps: (i) editing cultured cells and (ii) selection of clones, which are presumed to be isogenic, with and without the intended edit. The application of the CRISPR-Cas9 system may result in off-target edits, while cloning would reveal culture-acquired mutations. We analyzed the extent of the former and of the latter by whole genome sequencing (WGS) involving separate genomic loci in three experiments conducted by three independent laboratories. In all experiments we hardly found any off-target edits, while we detected hundreds to thousands of single nucleotide mutations unique to each clone after relatively short culture of 10-20 passages. Notably, clones also differed in copy number alterations that were several kb to several mb in size, representing the largest source of genomic divergence among clones. This study in dbGaP includes data from experiments carried out by investigators at the Oklahoma Medical Research Foundation and Baylor College of Medicine (OMRF/BCM) involving an iPSC line derived from fibroblasts of a female patient (line c7) carrying a constitutional heterozygous variant (chr1:1,464,679 C>T; GRCh37) in exon 15 of the ATPase family, AAA domain containing 3A (ATAD3A) gene. The parental line was edited by introducing a double stranded break at the variant allele to correct the variant by homology directed repair. Two unedited control clones (clone7 and clone8) and one edited clone (SC20) were selected for WGS.
Study
phs003110
Aging and genome-wide patterns of DNA methylation in an African rainforest hunter-gathering population
Aging is associated with widespread changes in genome-wide patterns of DNA methylation. Thousands of CpG sites whose tissue-specific methylation levels are strongly correlated with chronological age have been previously identified. However, the majority of these studies have focused primarily on cosmopolitan populations living in the developed world; it is not known if age-related patterns of DNA methylation at these loci are similar across a broad range of human genetic and ecological diversity. We investigated genome-wide methylation patterns using saliva-derived DNA from two traditionally hunting and gathering African populations: the Baka of the western Central African rainforest and the ≠Khomani San of the South African Kalahari Desert. We identify hundreds of CpG sites whose methylation levels are significantly associated with age, thousands that are significant in a meta-analysis, and replicate trends previously reported in populations of non-African descent. We confirm that an age-associated site in the gene ELOVL2 shows a remarkably congruent relationship with aging in humans, despite extensive genetic and environmental variation across populations. We also demonstrate that genotype state at methylation quantitative trait loci (meQTLs) can affect methylation trends at some known age-associated CpG sites. Our study explores the relationship between CpG methylation and chronological age in populations of African hunter-gatherers, who rely on different diets across diverse ecologies. While many age-related CpG sites replicate across populations, we show that considering common genetic variation at meQTLs further improves our ability to detect previously identified age associations.
Study
EGAS00001002226
About
What's the EGA?
The European Genome-phenome Archive (EGA) is a global network for permanent archiving and sharing of personally identifiable genetic, phenotypic, and clinical data generated for the purposes of biomedical research projects or in the context of research-focused healthcare systems.
Jointly managed by the European Bioinformatics Institute (EMBL-EBI) in Cambridge (UK) and the Centre for Genomic Regulation (CRG) in Barcelona, we aim to advance biomedical research and promote personalised medicine worldwide by enabling discovery of and access to human genomic and health research data.
The EGA contains data collected from individuals whose consent agreements authorise data release for specific research use to bona fide researchers. We ensure strict security measures to control access to the data and maintain patient confidentiality. With expertise in data management and technical infrastructure, we promote FAIR data reuse and enable researchers to share their data securely. By leveraging public funding and our strategic partnerships, the EGA provides a free service for permanent data storage, data discovery, and secure data access. In addition, we foster a federated network to provide transnational access to human research data in compliance with legal frameworks.
For additional information about the EGA, please contact:
Helen Parkinson and Mallory Freeberg, EMBL European Bioinformatics Institute
Arcadi Navarro, Roderic Guigó, and Jordi Rambla, Centre for Genomic Regulation
History
The European Genome-phenome Archive was launched in 2008 at the European Bioinformatics Institute (EMBL-EBI), an outstation of the European Molecular Biology Laboratory (EMBL), to address an identified need for archiving and sharing the results of genome-wide association studies from the Wellcome Trust Case Control Consortium. With the signing of a memorandum of understanding in 2013 and a formal agreement in 2016, the EGA became a joint project of EMBL-EBI and the Centre for Genomic Regulation (CRG). The two institutes work together to support the EGA services, including supporting submissions, website, strategic leadership, and data infrastructure developments.
In 2022, the Federated EGA was officially launched with the signature of the first five countries: Finland, Germany, Norway, Spain, and Sweden. With more than 20 additional nodes worldwide preparing to join, the Federated EGA aims to become the largest human omics data sharing initiative towards understanding human health and disease.
EGA overview
If you're a researcher, you may need to deposit, manage, or access genomic data in a secure and regulated way. The European Genome-phenome Archive (EGA) is a platform that facilitates these processes, ensuring that sensitive data is stored and shared in accordance with legal and ethical regulations.
Submission process
To start a submission, you need to become an EGA submitter. To do so, you'll need to sign a Data Processing Agreement (DPA) with us, which defines the terms and conditions under which your data will be processed and shared within the EGA system. Access to each study is controlled by its Data Access Committee (DAC). The DAC is responsible for managing data access requests and ensuring that the release of data is in accordance with the General Data Protection Regulation (GDPR).
Please note that once your data is released, all public metadata related to your study and dataset(s) will be searchable on the EGA website. However, the files are only accessible under controlled access, which means that the DAC must approve a data access request before anyone can download them.
Request process
It is possible to submit an access request for data stored at the EGA. The DAC assigned to the study will assess the request and, if approved, grant access to the data. Requesters must provide sufficient justification for their request and comply with the intended data usage in order to gain access.
Each dataset is covered by a Data Access Agreement (DAA) that defines the terms and conditions of use for the specified dataset(s). The DAA is created and provided by the DAC, and must be signed by the individual requesting access to the given dataset(s).
Download process
Once the request for access is approved, the data and metadata can be downloaded. The EGA offers various download options to fit different needs: it is possible to preview files without downloading them, download specific files of interest, or even download terabytes of data.
Overall, the EGA provides a secure and regulated environment for depositing, managing, and accessing human data. Regardless of the role - submitter, DAC member, or requester - the EGA provides assistance for each while ensuring that sensitive data is managed in an ethical and responsible manner.
More information on how EGA handles data is available in the EGA dataflow.
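The request and download steps described above can be pictured as a small state model. The sketch below is only an illustration and is not EGA software: the names `AccessRequest`, `RequestState`, `dac_review`, and `can_download` are hypothetical, and the rule that the DAA must already be signed when the DAC approves is an assumption, since the text does not state the order of those two steps.

```python
from dataclasses import dataclass
from enum import Enum, auto


class RequestState(Enum):
    SUBMITTED = auto()
    APPROVED = auto()
    DENIED = auto()


@dataclass
class AccessRequest:
    # All field names are illustrative, not part of any EGA API.
    requester: str
    dataset_id: str
    justification: str
    daa_signed: bool = False
    state: RequestState = RequestState.SUBMITTED


def dac_review(request: AccessRequest, approve: bool) -> AccessRequest:
    """Model the DAC decision on a submitted request.

    Assumption: approval also needs a non-empty justification
    and a signed Data Access Agreement (DAA).
    """
    if request.state is not RequestState.SUBMITTED:
        raise ValueError("only submitted requests can be reviewed")
    ok = approve and bool(request.justification.strip()) and request.daa_signed
    request.state = RequestState.APPROVED if ok else RequestState.DENIED
    return request


def can_download(request: AccessRequest) -> bool:
    """Files are under controlled access: only approved requests may download."""
    return request.state is RequestState.APPROVED
```

In this model a new request cannot download anything until `dac_review` has moved it to `APPROVED`, which mirrors the point that public metadata is searchable while the files themselves stay under controlled access.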
Documentation
about/ega
NeuroCHARGE Consortium GWAS of White Matter Hyperintensities on MRI
Meta-analysis results of white matter hyperintensities (WMH) volume genome wide association study (GWAS) in older population-based studies from the CHARGE (Cohorts for Heart and Aging Research in Genomic Epidemiology) consortium and the UK biobank.
Study
phs002227
PARADIGM: Combined ctDNA and serum PSA for dynamic monitoring of metastatic prostate cancer starting first-line treatment
The prognosis of newly-diagnosed metastatic prostate cancer is highly variable. The PARADIGM prospective cohort study evaluated predictors of survival in blood collected at the start of each of the first six treatment cycles from 114 high-volume metastatic patients starting androgen deprivation therapy combined with docetaxel or an androgen receptor pathway inhibitor.
Study
EGAS50000001357
Minority Health Genomics and Translational Research Bio-Repository Database
The Minority Health Genomics and Translational Research Bio-repository Database (MHGRID) Network infrastructure facilitated the collection of biospecimens and related multidimensional data elements within a consortium of minority-serving clinics. This initiative expands the diversity of ancestral groups in national genomic medicine datasets and promises to accelerate the translation of personalized medicine into minority communities. MHGRID has an observational case-control design with severe hypertension as primary outcome.
Study
phs001815
High Density SNP Association Analysis of Melanoma: Case-Control and Outcomes Investigation
This research builds upon an extensive resource of melanoma cases and hospital based controls collected over several years at the U.T. M.D. Anderson Cancer Center. The goal of this research is to identify novel susceptibility and outcome-related genes for melanoma using a systematic genome-wide association-based approach. Our goal is to conduct high-density SNP association and outcome studies. This dbGaP study contains samples from 2000 European ancestry cases and 1000 European ancestry controls using the Illumina OMNI1-Quad SNP chip. As a part of an ongoing R01 project, we have epidemiological data together with candidate gene results for 1000 of the melanoma cases and the controls. With regard to the outcome aspect of our design, as part of our melanoma Specialized Program of Research Excellence (SPORE) grant, our MelCore database contains comprehensive, prospectively maintained clinical information from all melanoma patients included in the study cohort, including primary tumor histopathology and staging information, standard and investigational blood tumor markers, details of surgical and systemic therapies, and extensive follow-up information, including time to relapse or recurrence, pattern of recurrence and survival duration. Finally, we intend to collaborate with the GenoMEL collaboration so we can jointly evaluate each other's findings. The goal of our analysis will be to identify novel genetic factors predisposing the development of melanoma, as well as genetic factors controlling melanoma stage at presentation, recurrence and progression. This study is part of the Gene Environment Association Studies initiative (GENEVA, http://www.genevastudy.org) funded by the trans-NIH Genes, Environment, and Health Initiative (GEI). The overarching goal is to identify novel genetic factors that contribute to melanoma through large-scale genome-wide association studies of 2000 European ancestry cases and 1000 European ancestry controls. 
Genotyping was performed at the Johns Hopkins University Center for Inherited Disease Research (CIDR). Data cleaning and harmonization were done at the GEI-funded GENEVA Coordinating Center at the University of Washington.
Study
phs000187
Genome-wide association study in multiple human prion diseases suggests genetic risk factors additional to PRNP
We conducted GWAS of sporadic CJD, variant CJD, iatrogenic CJD, inherited prion disease, kuru and resistance to kuru despite attendance at mortuary feasts. After quality control we analysed 2000 samples and 6015 control individuals (provided by the Wellcome Trust Case Control Consortium and KORA-gen), for 491032-511862 SNPs in the European study. Association studies were done in each geographical and aetiological group followed by several combined analyses.
Study
EGAS00000000097
Samples linked to the study "Performance comparison of three DNA extraction kits on human whole-exome data from formalin-fixed paraffin-embedded normal and tumor samples"
We generated 42 human whole-exome sequencing data sets from fresh-frozen (FF) and FFPE samples. These samples include normal and tumor tissues from two different organs (liver and colon), which we extracted with three different FFPE extraction kits (QIAamp DNA FFPE Tissue kit and GeneRead DNA FFPE kit from Qiagen, Maxwell™ RSC DNA FFPE Kit from Promega). Variant calling analysis shows a very high rate of concordance between matched FF / FFPE pairs and equivalent performance for the three kits we analyzed. We find a significant variation in the difference of total number of variants called between FF and FFPE samples for the three different FFPE DNA extraction kits. Coverage analysis shows that FFPE samples have poorer coverage indicators than FF samples, yet the coverage quality remains above accepted thresholds. We detect limited but significant variations in coverage indicator values between the three FFPE extraction kits. Globally, the GeneRead and QIAamp kits have better variant calling and coverage indicators than the Maxwell kit on the samples used in this study, although the Maxwell kit performs better on some indicators and has advantages in terms of practical usage. Taken together, our results confirm the potential of FFPE sample analysis for clinical genomic studies, but also indicate that the choice of an FFPE DNA extraction kit should be made with careful testing and analysis beforehand in order to maximize the accuracy of the results.
Dataset
EGAD00001004066
NIDDK IBD Genetics Consortium Crohn's Disease Genome-Wide Association Study
This dataset contains data from a genome-wide association study performed with 968 Inflammatory Bowel Disease (IBD) affected cases and 995 unrelated controls using the Illumina HumanHap300 Genotyping BeadChip. Cases were selected to have Crohn's disease with ileal involvement, and controls were matched to cases based on sex and year of birth. Subjects were drawn from two cohorts: (1) persons with non-Jewish, European ancestry (561 cases and 563 controls), and (2) persons with Jewish ancestry (407 cases and 432 controls). Genotyping was performed at the Feinstein Institute for Medical Research. Seven-hundred fifty-four of the samples (468 cases and 286 controls) were taken from the NIDDK IBD Genetics Consortium cell line repository. These samples are identified in the IBD_Sample file. The subject IDs for these individuals may be used to request corresponding samples for follow-up research through the repository. In addition, complete phenotype data for these individuals are available, together with the Consortium's phenotyping manual and the forms used to collect the data. The remaining 1,209 samples were obtained from pre-existing collections ascertained through Cedars-Sinai Medical Center, Johns Hopkins University, University of Chicago, University of Montreal, University of Pittsburgh, University of Toronto, and the New York Health project (controls only). For these samples, only sex, cohort (Jewish vs. non-Jewish), and age at diagnosis (cases only) are available. Two-hundred three individuals from among the pre-existing samples did not provide consent to release their genotype data (designated as consent group 2 in the file IBD_Subject). Thus, individual genotype data are only provided for 1,760 samples. To compensate for this, we have provided summary results for each SNP. These are based on a stratified analysis testing case/control association. 
Fifty-one samples had a call rate less than 93% and were therefore excluded from this analysis, leaving an overall sample size of 1,963 - 51 = 1,912. X Chromosome Heterozygosity: Nine samples have X chromosome heterozygosity that is neither consistent nor inconsistent with their phenotypic sex. One of these samples was found to have Turner Syndrome. The remaining 8 samples have heterozygosity ranging from 35% to 76%.
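The sample counts quoted in this description can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text; the function name and dictionary keys are illustrative.

```python
def ibd_sample_accounting() -> dict:
    """Recompute the sample totals quoted in the study description."""
    cases, controls = 968, 995
    total = cases + controls                      # all genotyped samples

    non_jewish = 561 + 563                        # cohort 1: cases + controls
    jewish = 407 + 432                            # cohort 2: cases + controls

    repository = 468 + 286                        # cell line repository samples
    preexisting = total - repository              # from pre-existing collections

    no_consent = 203                              # genotype release not consented
    low_call_rate = 51                            # call rate below 93%

    return {
        "total": total,
        "cohorts_sum": non_jewish + jewish,
        "repository": repository,
        "preexisting": preexisting,
        "genotypes_released": total - no_consent,
        "analysed": total - low_call_rate,
    }


if __name__ == "__main__":
    for key, value in ibd_sample_accounting().items():
        print(f"{key}: {value:,}")
```

Each derived figure matches the description: 1,963 samples overall, 754 from the repository, 1,209 from pre-existing collections, 1,760 with released genotypes, and 1,912 in the summary analysis.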
Study
phs000130
DNA Methylation Studies in CREW Cohorts (URECA and COAST)
Children's Respiratory Environment Workgroup (CREW) is an NIH-funded project consisting of 12 individual U.S. cohorts and three scientific centers working together to identify phenotypes and causes of childhood asthma. CREW includes data from a large number of children and their families, with broad diversity in terms of ethnicity, family characteristics, neighborhoods and geographic locations. In this study, we identified differentially methylated regions from whole-genome bisulfite sequencing (WGBS) in nasal epithelial cells from 39 children of European or African ancestries, and with or without allergic asthma. Study participants were from two of the CREW cohorts: the Urban Environment and Childhood Asthma (URECA) cohort and the Childhood Origins of Asthma (COAST) cohort. The URECA children are also participants of a larger genetic study, in which DNA samples are derived from blood. That data is presented in the "Whole Genome Sequencing in the Inner City Asthma Consortium (ICAC) Cohorts - phs002921" study.
Study
phs003321
Genotyping NIGMS CEPH Samples from the United States, Venezuela, and France
The Human Genetic Cell Repository (HGCR) is sponsored by the National Institute of General Medical Sciences (NIGMS) with the mission of supplying scientists with the materials for accelerating disease gene discovery and functional studies. The resources available include highly-characterized, contaminant-free fibroblast cell lines, transformed lymphoblastoid cell lines (LCLs), and DNA samples derived from these cultures. While the Repository has a major emphasis on inherited diseases, it also contains large collections of cell lines gathered from populations around the world that are intended for use in facilitating the understanding of human variation. Prominent among these resources is the CEPH Reference Family collection contributed by the Centre d'Etude du Polymorphisme Humain (CEPH), Foundation Jean Dausset, Paris, France. The CEPH collection includes families collected by R. White (Utah), J. Dausset (French), J. Gusella (Venezuelan), and J. Egeland (Amish). There are a total of 809 individuals accounting for 832 pedigree positions in the reference families. Family relationships for the 61 reference families were verified at Coriell and approved by CEPH. In an effort to enhance the value of this cell culture resource available from the Repository, the Coriell Genotyping and Microarray Center used the Affymetrix Genome-Wide Human SNP Array 6.0 platform to genotype 181 samples from the NIGMS HGCR CEPH collection. Included are thirteen families from the United States (Amish Pedigree 884 and Utah Pedigrees 1331, 1356, 1400, 1416, 1424, 1427, 1477, and 1582), France (Pedigrees 35 and 66), and Venezuela (Pedigrees 102 and 104). Twelve of the families consist of 3 generations and one family consists of 2 generations. Pedigree charts and sample descriptions are available on the Repository catalog website (see below).
Study
phs000268
Acute Respiratory Distress Network (ARDSNet) Study 04 Assessment of Low Tidal Volume and Elevated End-Expiratory Volume to Obviate Lung Injury (ALVEOLI-BioLINCC)
Data Access NOTE: Please refer to the "Authorized Access" section below for information about how access to the data from this accession differs from many other dbGaP accessions. Biospecimens: Access to Biospecimens is through the NHLBI Biologic Specimen and Data Repository Information Coordinating Center (BioLINCC). Biospecimens from ARDSNet-ALVEOLI include Plasma and DNA. Please note that use of biospecimens in genetic research is subject to a tiered consent. Available Data: Outcome data regarding organ failure free days are not available. Objectives: The ARDS Network is a consortium of clinical centers and a coordinating center to design and test novel therapies for the treatment of Acute Lung Injury (ALI) and Acute Respiratory Distress Syndrome (ARDS). The ARDS Network 01/03 trials included an investigation of the efficacy and safety of Ketoconazole and Respiratory Management in the treatment of ALI and ARDS (KARMA). The Ketoconazole arm of the KARMA study was later stopped due to an inability to show efficacy. Participants continued to be randomized to the respiratory management arms of the study (ARMA), which compared two ventilator strategies: a tidal volume of 6 mL/kg versus 12 mL/kg. The LARMA phase of the study investigated the efficacy of Lisofylline and Respiratory Management. The objective of the ALVEOLI study was to compare clinical outcomes of participants with ALI and ARDS treated with a higher end-expiratory lung volume/lower FiO2 versus a lower end-expiratory lung volume/higher FiO2 ventilation strategy. The ALVEOLI study tested the hypothesis that mortality from ALI and ARDS would be reduced with a mechanical ventilation strategy designed to prevent lung injury from repeated collapse of bronchioles and alveoli at end-expiration. Background: Most participants requiring mechanical ventilation for ALI and ARDS receive positive end-expiratory pressure (PEEP) of 5 to 12 cm of water.
Higher PEEP levels may improve oxygenation and reduce ventilator-induced lung injury but may also cause circulatory depression and lung injury from overdistention. PEEP levels higher than traditional levels may reduce ventilator-induced lung injury by decreasing the proportion of nonaerated lung, and higher PEEP levels may allow arterial-oxygenation goals to be met at a lower level of inspired oxygen (FiO2). Participants: A total of 550 participants were randomized to receive mechanical ventilation with either lower or higher PEEP levels, which were set according to different tables of predetermined combinations of PEEP and fraction of inspired oxygen. Conclusions: In participants with acute lung injury and ARDS who received mechanical ventilation with a tidal-volume goal of 6 ml per kilogram of predicted body weight and an end-inspiratory plateau-pressure limit of 30 cm of water, clinical outcomes were statistically similar whether lower or higher PEEP levels were used (Brower et al., 2004; PMID: 15269312).
Study
phs003714
Binding of Epstein Barr Virus EBNA2 Unifies Multiple Sclerosis Genetic Mechanisms
This study is focused on multiple sclerosis (MS). MS is likely caused by a combination of genetic and environmental factors; however, the mechanisms contributing to these factors remain poorly understood. Epstein-Barr virus (EBV) in particular is a well-established environmental risk factor for MS. We have created a computational algorithm that systematically searches for common molecular mechanisms that might be impacted at multiple MS-associated loci. Using this algorithm, we have discovered that over 40% of MS-associated loci contain MS genetic variants that fall within regions of the human genome occupied by the EBV-encoded EBNA2 protein. The same MS-associated variants also impact gene expression levels of MS-associated genes in EBV-infected B cell lines. Our hypothesis is that allele-dependent binding of EBNA2 and its co-factors explains the allele-dependent risk at many MS genetic loci. Importantly, this hypothesis links the genetic associations of MS to the known molecular roles played by EBV and Notch signaling in disease processes. Subjects with and without MS were recruited into an Institutional Review Board (IRB)-approved protocol through the Cincinnati Autoimmune Registry and Repository. From each subject, DNA from peripheral blood was isolated for whole genome sequencing (WGS). These data will be used to identify genotype-dependent transcriptional regulation in cells assessed from patients and controls. This WGS data was deposited before the full integration of sequence data with other functional genomic data types. From each subject, 90 mL of blood is drawn for isolation of peripheral blood mononuclear cells from which primary B and T cells will be isolated. For each B cell isolation, an EBV-transformed B cell line will be derived.
Study
phs003240
The Multiethnic Cohort (MEC) Study
The Multiethnic Cohort (MEC) study of over 215,000 men and women in Hawaii and California is unique in that it is population-based and includes large representations of older adults (45-75 years at baseline) for five US racial/ethnic groups (Japanese Americans, African Americans, European Americans, Latinos and Native Hawaiians) at varying risks of chronic diseases. The MEC has established a large biorepository of mainly blood and urine collected from over 70,000 participants, linked to extensive, prospectively collected risk factors (e.g., diet, smoking, physical activity), biomarkers and clinical data for five racial/ethnic groups.
Study
phs002183
Sequencing Lymphoma
This repository contains sequencing data to describe the genomic profiles of several B-cell lymphomas including follicular lymphoma and Hodgkin lymphoma. We have used several different high throughput sequencing technologies to interrogate genomic DNA and RNA from tumor cells obtained from lymph node biopsies and normal cells obtained from skin biopsies and peripheral blood mononuclear cells.
A number of sequencing platforms for DNA, RNA, and protein were used to generate the data. These platforms include exome sequencing and a custom capture panel (NimbleGen) that targets 7.05 Mb corresponding to the exons and splice sites of 1716 genes related to lymphoma biology (WUSM-LP). Other platforms include single-cell RNA-seq and Cellular Indexing of Transcriptomes and Epitopes by Sequencing (CITE-seq), which was used to describe transcriptional and proteomic changes in immune cell subsets within lymphoma patients pre- and post-therapy.
Study
phs001229
Vitamin-D-Kids Asthma
The Vitamin D Kids Asthma Study (VDKA) was a 48-week multicenter randomized, double-blinded, placebo-controlled trial of vitamin D3 for severe asthma exacerbations in children 6-16 years old, recruited from February 28, 2016 to September 20, 2019. An ancillary study for nasal sampling was conducted at the Pittsburgh site. Prior to randomization, participants completed questionnaires and underwent collection of nasal samples for gene expression studies. VDKA was approved by the IRBs of all participating institutions, and the ancillary study was approved by the IRB of the University of Pittsburgh. Written parental consent was obtained for participating children, from whom written assent was also obtained.
Study
phs004051
The Dynamic Immune Behavior of Primary and Metastatic Ovarian Carcinoma
Patients with high-grade serous ovarian carcinoma are usually diagnosed at an advanced stage and often develop resistance to conventional therapy. Epithelial ovarian tumors have been shown to be rich in CD8+ lymphocytes, which infiltrate both primary and omental sites. In recent years, several prominent studies have highlighted a positive association between the presence of these immune cells and a good prognosis; however, clinical trials of immune checkpoint inhibitors have so far shown unsatisfactory results in ovarian cancer. In this study, we used an innovative combination of single-cell sequencing and spatial transcriptomics to identify the molecular mechanisms that lead to immunosuppression in primary and metastatic HGSC. Primary tumors consistently showed a more active immune microenvironment than did omental tumors. In addition, while several immune cell subtypes were present in all samples, their relative abundance differed between primary and omental tumors. Finally, we found that primary tumors were mostly populated by dysfunctional CD4 and CD8 T cells in later stages of differentiation, while omental tumors were mostly populated by naïve immune cells; this, in turn, was correlated with expression changes in the interferon α and γ pathways in epithelial cells, showing that cross-communication between the epithelial and immune compartments is important for immune suppression in ovarian cancer. These findings can impact the timing and choice of immune-modulating drugs and should be considered when planning treatment for patients with high-grade serous ovarian carcinoma.
Study
EGAS50000000038
NHLBI TOPMed - NHGRI CCDG: The BioMe Biobank at Mount Sinai
The IPM BioMe Biobank, founded in September 2007, is an ongoing, broadly-consented electronic health record (EHR)-linked clinical care biobank that enrolls participants non-selectively from the Mount Sinai Medical Center patient population. BioMe currently comprises >42,000 participants from diverse ancestries, characterized by a broad spectrum of longitudinal biomedical traits. Participants are enrolled through an opt-in process and consent to be followed throughout their clinical care (past, present, and future) in real-time, allowing us to integrate their genomic information with their EHRs for discovery research and clinical care implementation. BioMe participants consent for recall, based on their genotype and/or phenotype, permitting in-depth follow-up and functional studies for selected participants at any time. Phenotypic and genomic data are stored in a secure database and made available to investigators, contingent on approval by the BioMe Governing Board. BioMe uses a "data-broker" system to protect confidentiality. Ancestral diversity - BioMe participants represent a broad racial, ethnic and socioeconomic diversity with a distinct and population-specific disease burden. Specifically, BioMe participants are of African (AA), Hispanic/Latino (HL), European (EA) and other/mixed ancestry. BioMe participants are predominantly of African (AA, 24%), Hispanic/Latino (HL, 35%), European (EA, 32%), and other ancestry (OA, 10%). Participants who self-identify as Hispanic/Latino further report to be of Puerto Rican (39%), Dominican (23%), Central/South American (17%), Mexican (5%) or other Hispanic (16%) ancestry. More than 40% of European ancestry participants are genetically determined to be of Ashkenazi Jewish ancestry. With this broad ancestral diversity, BioMe is uniquely positioned to examine the impact of demographic and evolutionary forces that have shaped common disease risk. 
Phenotypes available in BioMe - BioMe has a high-quality and validated set of fully implemented clinical phenotype data that has been culled by a multi-disciplinary team of experienced investigators, clinicians, information technologists, data-managers, and programmers who apply advanced medical informatics and data mining tools to extract and harmonize EHRs. BioMe, as a cohort, offers a great versatility for designing nested case-control sample-sets, particularly for studying longitudinal traits and co-morbidity in disease burden. Biomedical and clinical outcomes: The BioMe Biobank is linked to Mount Sinai's system-wide Epic EHR, which captures a full spectrum of biomedical phenotypes, including clinical outcomes, covariate and exposure data from past, present and future health care encounters. As such, the BioMe Biobank has a longitudinal design as participants consent to make all of their EHR data from past (dating back as far as 2003), present and future inpatient or outpatient encounters available for research, without restriction. The median number of outpatient encounters is 21 per participant, reflecting predominant enrollment of participants with common chronic conditions from primary care facilities. Environmental data: The clinical and EHR information is complemented by detailed demographic and lifestyle information, including ancestry, residence history, country of origin, personal and familial medical history, education, socio-economic status, physical activity, smoking, dietary habits, alcohol intake, and body weight history, which is collected in a systematic manner by interview-based questionnaire at time of enrollment. The IPM BioMe Biobank contributed ~10,600 DNA samples for whole genome sequencing to the TOPMed program. Samples were selected for the Coronary Artery Disease (CAD) and the Chronic Obstructive Pulmonary Disease (COPD) working groups. 
Using a Case-Definition-Algorithm (CDA), we identified ~4,100 individuals with CAD (~50% women) and ~3,000 individuals as controls (65% women). In addition, we identified ~800 individuals with COPD (62% women) and 1,800 individuals as controls (72% women). Another 600 BioMe participants with Atrial Fibrillation, all of African ancestry, were included.
Study
phs001644
Genome-Wide Association Study of HIV-1 Host Genetics Among Injection Drug Users
The overarching goal of this project is to identify and characterize genetic determinants of HIV-1 susceptibility and resistance in samples of African American (AA) and European American (EA) injection drug users (IDUs) by conducting (1) a case/control genome-wide association (GWA) study of HIV-1 infection (positive/negative) and (2) a case-only GWA study of viral load among HIV+ IDUs. The study uses existing samples and data from the Urban Health Study (UHS) (PI: Alex Kral), which was the longest-running study of street-recruited IDUs in North America, from 1986-2005. UHS was a serial, cross-sectional sero-epidemiological study. Data were collected every 6 months in communities with a high prevalence of injection drug use in the San Francisco Bay Area. It used targeted sampling in neighborhoods at easily accessible community field sites, such as churches, single room occupancy hotels, and community centers. Eligibility criteria for initial entry to the study were (1) injection drug use in past 30 days; (2) ability to provide informed consent; and (3) age 18 or older. The UHS cohort includes over 9,000 African American and European American IDUs whose serum samples have been stored and for whom data are available on HIV antibody status, HIV risk behaviors, drug abuse and demographics. The current study includes 985 HIV+ cases and 5,222 HIV- controls. Approximately six HIV- controls per case were frequency matched on: (1) self-reported ancestry; (2) sex; (3) age; (4) year of ascertainment; and (5) HIV risk class. This GWAS (DA026141) was funded by the National Institute on Drug Abuse (NIDA; PI: Eric O. Johnson). Funding support for genotyping, which was performed at the Johns Hopkins University Center for Inherited Disease Research (CIDR), was provided by NIDA and the NIH contract "High throughput genotyping for studying the genetic contributions to human disease" (HHSN268200782096C).
Study
phs000454
T-cell reconstitution after reduced dose ATLG induction in kidney transplant patients
We analyzed the T-cell receptor (TCR) repertoires from ten kidney transplant recipients. Five of the ten kidney transplant recipients received ATLG while the other five recipients received basiliximab as induction therapy. TCR repertoires of CD4+ and CD8+ T-cells were assessed prior to transplantation and within the first month after transplantation as well as at three and 12 months post-transplant. In addition, the pre-formed alloreactive TCR repertoire for each kidney transplant recipient was identified using mixed lymphocyte reaction, and donor-reactive T-cells were subjected to TCR beta sequencing. This dataset comprises a total of 106 samples. NGS TCR beta libraries of all samples were sequenced on an Illumina NextSeq 500, and raw sequencing data (in the form of fastq files) as well as assembled clonotypes and their counts (in the form of clonotype tables) are provided.
Dataset
EGAD00001008478
human CMV-specific CD8+ T cells
In this study we established a comprehensive workflow to collect multi-omics single-cell data using a commercially available micro-well based platform. This included whole transcriptome, cell surface markers (targeted sequencing-based cell surface proteomics), T cell specificities, adaptive immune receptor repertoire (AIRR) profiles and sample multiplexing. With this technique we identified novel paired T cell receptor sequences for three prominent human CMV epitopes. In addition, we review the ability of dCODE dextramers to detect antigen-specific T cells at low frequencies by estimating sensitivities and specificities when used as reagents for single-cell multi-omics.
Study
EGAS50000000633
Asian Immune Diversity Atlas (AIDA) sQTL
Current efforts within the Human Cell Atlas (HCA) are largely focused on defining reference human cell types using a relatively small number of samples of predominantly European origin. However, the next phase of the HCA is likely to address the topic of human diversity by comprehensively characterizing variation in cell states across age, sex, population groups, diseases and environments. To spur this next phase, and also expand the geographical scope of the HCA, we propose an HCA-Asia seed network to generate an Asian Immune Diversity Atlas (AIDA). AIDA will generate transcriptome variation datasets from 5 major Asian population groups (Chinese, Japanese, Korean, Indian, Malay), define an atlas of Asian cell types and states, and characterize their variation associated with ethnicity, age and sex. We will also formalize a study design based on common controls that can in principle reduce technical artifacts in a broad range of comparative single cell studies. This study design will be complemented with novel algorithms designed to normalize datasets of interest against the common controls, to increase the robustness of biological inferences. In addition to providing new tools and resources to the HCA, this seed network will serve as a template for future comparative studies in the single cell field. The AIDA dataset consists of scRNA-seq (10x 5′ v2 scRNA-seq) and genotype (Illumina Infinium Global Screening Array v.3) data. In the AIDA splicing quantitative trait loci (sQTL) project, we identified 11,577 independent cis-sQTLs, 607 trans-sQTLs, and 107 dynamic sQTLs. The post-imputation genotypes for the Japanese and Korean cohorts will be available through dbGaP. The Singaporean (ethnically Chinese, Indian and Malay) genotypes require a data access application to the HELIOS Data Access Committee (helios_science@ntu.edu.sg). European samples are used as controls.
Study
phs003848
Genome Wide Association Study: GR@ACE Stage I
The Genome Research program at Fundació ACE (GR@ACE, Barcelona, Spain) represents a new effort to study Alzheimer's disease susceptibility. The GR@ACE study comprises 7,409 unrelated individuals: 4,120 AD cases diagnosed at the Fundació ACE Memory Clinic (Barcelona, Spain) and 3,289 controls obtained from three collections (Fundació ACE, Barcelona; DNA National Biobank, Salamanca; and HU Valme, Seville). Genotyping was carried out using the 815K Axiom Spain Biobank array (Affymetrix) at the Spanish national center for genotyping (CEGEN, Santiago de Compostela, Spain). After genotyping, extensive quality control was conducted, including re-sampling, gender match, stratification, relatedness, heterozygosity, differential missingness and HWD analyses.
Study
EGAS00001003424
The Genetic Basis of Progression in Multiple Sclerosis
This study examined the genetic determinants of multiple sclerosis (MS) severity in an international cohort of people with MS assembled by the International Multiple Sclerosis Genetics Consortium (IMSGC) and the MultipleMS Consortium. A total of 12,584 people with MS of European ancestry were included in the main genome-wide association study (GWAS). Participants were enriched for older individuals with longer disease duration (mean age 51.7 years and mean disease duration 18.2 years). All individuals were whole-genome genotyped on the Illumina Global Screen Array (GSAMD-24v2-0). Imputation was performed using the Haplotype Reference Consortium reference panel. The primary outcome was the age-related MS severity score. All individuals had at least 1 expanded disability status scale (EDSS) measure. When multiple EDSS scores were available, the last one was selected to calculate the age-related MS severity score. Of those 12,584 people with MS, this data release includes 4,623 participants recruited from non-European sites in the US, Canada and Australia. This release includes individual-level genotype and phenotype files necessary to replicate our GWAS of MS severity, including covariates.
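The rule for handling multiple EDSS measures (keep the last one per participant) can be sketched as follows. This is a minimal illustration only: the record layout, the participant IDs and the reading of "last" as "most recent visit date" are assumptions, and the calculation of the age-related MS severity score itself is not shown because the description does not specify it.

```python
from datetime import date

def last_edss(records):
    """Return {participant_id: EDSS score from the most recent visit}.

    `records` is an iterable of (participant_id, visit_date, edss) tuples;
    this layout is hypothetical, not the format of the released files.
    """
    latest = {}
    for pid, visit_date, edss in records:
        if pid not in latest or visit_date > latest[pid][0]:
            latest[pid] = (visit_date, edss)
    return {pid: edss for pid, (_, edss) in latest.items()}

# Toy data: P1 has two visits, P2 has one.
records = [
    ("P1", date(2015, 3, 1), 2.0),
    ("P1", date(2019, 6, 12), 3.5),
    ("P2", date(2018, 1, 20), 6.0),
]
selected = last_edss(records)
```

Participants with a single EDSS measure pass through unchanged, which matches the statement that every individual had at least one measure.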
Study
phs002929
Ischemic Stroke Genetics Study (ISGS)
The third leading cause of death in the United States, stroke is an acute neurological event leading to death of neural tissues. Although the majority of strokes are ischemic strokes, meaning there is oxygen deprivation to the brain, almost 20% of strokes are hemorrhagic, resulting from bleeding into the brain. Stroke is a complex disorder and likely multigenic in nature, resulting from a combination of genetic and environmental factors. Well-characterized risk factors that contribute to the incidence of stroke include hypertension, cardiac disease, sickle cell disease, hyperhomocysteinemia, family history of stroke and smoking.
The aim of ISGS is to perform a prospective genetic association study of ischemic stroke focusing on the hemostatic system. ISGS is a 5-center case-control study of first-ever ischemic stroke cases and concurrent controls individually matched for age, sex and recruitment site.
These data include subjects whose samples are banked in the NINDS Repository with biologicals publicly available, as well as subjects whose samples are not banked or not available.
This study utilized the NINDS Repository Cerebrovascular/Stroke Study, and neurologically normal controls from the sample population which are banked in the National Institute of Neurological Disorders and Stroke (NINDS Repository) collection for a first stage whole genome analysis.
Study
phs000102
Polycystic Ovary Syndrome (PCOS) Genetics
PCOS is a complex genetic disease reflecting the interplay of susceptibility genes and environmental factors. The cardinal reproductive feature of the syndrome, hyperandrogenemia, appears to play a direct role in the pathogenesis of the associated metabolic abnormalities. Male as well as female first-degree relatives have reproductive and metabolic phenotypes including increased prevalence rates of type 2 diabetes (T2D), metabolic syndrome (MBS) and other risk factors for cardiovascular disease (CVD). Northwestern University (NU) investigators lead a team that has extensive experience in phenotyping PCOS and in the genetic analysis of complex diseases, including genome-wide association studies (GWAS). Together with an expert group of collaborators from the Hershey Medical Center and The University of Chicago, we have conducted a GWAS to identify PCOS susceptibility alleles using a large cohort of extensively and consistently phenotyped PCOS cases. Population controls for this study come from the NUgene project described below. NUgene project: In 2002, Northwestern committed to the development of a DNA repository to serve as a platform for the identification and validation of genotype-phenotype associations that will impact healthcare. The NUgene Project is a repository with longitudinal medical information from participating patients at affiliated hospitals and outpatient clinics from the Northwestern University Medical Center. Participants' DNA samples are coupled with data from a questionnaire (2 versions were used, 1 before and 1 after February 2006; both are included) and continuously updated data from our Electronic Medical Record (EMR) representing actual clinical care events. Northwestern has a state-of-the-art, comprehensive inpatient and outpatient EMR system of over 2 million patients. NUgene has broad access to participant data for all outpatient visits as well as inpatient data via a consolidated data warehouse. NUgene participants consent to distribution and use of their coded DNA samples and data for a broad range of genetic research by third-party investigators.
Study
phs000368
A Genomics-Based Classification of Human Lung Tumors
We characterized genome alterations in 1255 clinically annotated lung tumors of all histological subgroups to identify genetically defined and clinically relevant subtypes. More than 55% of all cases had at least one oncogenic genome alteration potentially amenable to specific therapeutic intervention, including several personalized treatment approaches that are already in clinical evaluation. Marked differences in the pattern of genomic alterations existed between and within histological subtypes, thus challenging the original histomorphological diagnosis. Immunohistochemical studies confirmed many of these reassigned subtypes. The reassignment eliminated almost all cases of large cell carcinomas, some of which had therapeutically relevant alterations. Prospective testing of our genomics-based diagnostic algorithm in 5145 lung cancer patients enabled a genome-based diagnosis in 3863 (75%) patients, confirmed the feasibility of rational reassignments of large cell lung cancer, and led to improvement in overall survival in patients with EGFR-mutant or ALK-rearranged cancers. Thus, our findings provide support for broad implementation of genome-based diagnosis of lung cancer.
Study
EGAS00001000647
Development and Validation of a Disability Severity Index for Charcot-Marie-Tooth Disease (CMT)
In Charcot-Marie-Tooth Disease (CMT) or inherited neuropathy research studies, it is the researcher who has selected what they believe to be important markers of impairment in function in patients. For example, it has been inferred that the wearing of ankle-foot orthoses (AFOs), the use of walking aids such as canes, or the use of wheelchairs are appropriate markers for “mild”, “moderate” or “severe” disability, respectively. Whether patients agree with this classification is unknown. By understanding what patients classify as mild, moderate and severe disability in CMT, we will know what our treatments need to target to have a meaningful impact on the patients' functional status. Primary objective: The purpose of this study is to compare patient and healthcare provider impressions of what constitutes mild, moderate and severe disability in CMT. Target population: 200 patients who have self-registered at the Inherited Neuropathy Consortium Contact Registry, a web-based contact registry developed and supported by the Data Management and Coordinating Center (DMCC) for the Rare Diseases Clinical Research Network (RDCRN), located at the University of South Florida, and 200 health care professionals attending conferences about CMT, such as the 4th International CMT Consortium to be held in Potomac, Maryland, June 29-July 1, 2011 and the MDA Clinic Directors meeting in Las Vegas, NV March 4-7, 2012. Methods: A brief, anonymous, 20-item survey, in which we measure what the physician and the patient think are important indicators of disability in CMT, will be distributed by paper to 200 health care professionals, and via an online link to 200 patients self-registered with the RDCRN contact registry. Out of the 200 patients, approximately 25 patients will be requested to take the survey twice in a 2 to 4 week period. Analysis: We will measure the agreement between items in the physician and patient groups, and use items with high agreement in a Disability Severity Index.
Study
phs001295
Identifying autosomal recessive mutations causing neurological disorders
The objective of this study is to resequence targeted intervals containing autosomal recessive variants causing neurological disorders in consanguineous pedigrees. Using homozygosity mapping, three intervals of very different sizes have previously been unambiguously mapped for three different neurological diseases: 2.4Mb, 8Mb and 14.3Mb in size, for Microlissencephaly, Severe Mental Retardation and Complicated hereditary spastic paraplegia respectively. This study is a pilot to assess how well custom targeted resequencing performs across a broad size range of intervals. The study design is to use a different custom capture probe set for each interval, pull down from a single patient from each family, and sequence 1 lane using Illumina paired reads for each sample. Candidate variants will be followed up in the families themselves, and in patients with similar phenotypes from outbred populations.
Dataset
EGAD00001000340
The Collaborative Study on the Genetics of Alcoholism (COGA)
COGA is a family study of alcoholism, in which the subjects have been drawn from the Collaborative Study on the Genetics of Alcoholism (COGA), a large, ongoing family-based study that includes subjects from seven sites around the US. COGA has gathered detailed, standardized data on study participants, including diagnostic and neurophysiological assessments. This project has already proved successful in identifying several genes that influence the risk for alcoholism and neurophysiological endophenotypes, which have been independently replicated. COGA data were included as part of two Genetic Analysis Workshops, and the phenotypes are familiar to the genetics community. Alcoholic probands were recruited from treatment facilities, assessed by personal interview, and after securing permission, other family members were also assessed. A set of comparison families was drawn from the same communities as the families recruited through an alcoholic proband. Assessment involved a detailed personal interview developed for this project, the Semi-Structured Assessment for the Genetics of Alcoholism (SSAGA), which gathers detailed information on alcoholism related symptoms along with other drugs and psychiatric symptoms. Many participants also came to the laboratories for electroencephalographic studies. Neurophysiological features that have been shown to be useful endophenotypes for which we have linkage and in some cases association results, are included for a subset of the case-control sample: the beta power of the resting electroencephalogram (EEG), the P3(00) amplitude of the visual event-related potential (ERP), and the theta and delta event-related oscillations (EROs) underlying the P3. As part of COGA, a set of informative families was selected to have Genome-Wide Association data obtained within families. Genotyping was performed using the Illumina Human OmniExpress array 12.VI to genotype 2,282 subjects selected from 118 densely affected families. 
Genotyping was performed at the Genome Technology Access Center at Washington University School of Medicine in St. Louis. In addition, we also included genotypes for subjects (n=275) from these 118 families who were genotyped in a previous case-control GWAS using the Illumina 1M array. For quality control purposes, 51 of the 275 subjects were genotyped again on the Illumina Human OmniExpress array at the Washington University School of Medicine core facility. In addition, exome sequencing data on a subset of individuals with GWAS were added in version 2 (v2). For v2, a subset had 30X Whole Genome Sequencing (WGS) as part of the NIDA Sequencing Initiative. The subset contained two distinct sets: Sibling pairs where one sibling had at least two dependence diagnoses in the set (alcohol, cannabis, cocaine, and opioid), and the other had none, and non-related Case-Control pairs matched for age and ethnicity where the cases had alcohol and at least 2 other dependence diagnoses and controls had none. After sequencing, some sibling pairs are re-classified as half siblings. Three VCF files (small variants, structural variants, and copy number variations) are provided. Additional substance use variables are made available in v2. We note that the full sample data are deposited in four dbGaP submissions and the sequenced samples are split across all four: CIDR: Collaborative Study on the Genetics of Alcoholism Case Control Study [phs000125]. GWAS data on cases (primarily probands) and controls drawn from the families. Families with highest density of alcohol dependence and/or extreme event-related oscillation data [phs000763]. GWAS data on 119 extended families of European descent are available here, along with extensive documentation. Study on the Genetics of Alcoholism (COGA): African American Family GWAS [phs000976]. GWAS data on all available COGA families of African descent are available. COGA: Smokescreen GWAS [phs001208]. 
GWAS data on all remaining COGA DNA samples, primarily of other racial background, were genotyped on the Smoke Screen array. A listing of all sequenced pairs is provided in the documentation to facilitate the merging of these samples.
Study
phs000763
eMERGE Network's Multi-Center Pilot of Pharmacogenetic Sequencing in Clinical Practice
eMERGE-PGx is a multi-site test of the concept that sequence information can be coupled to electronic medical records (EMRs) for use in healthcare. The promise of personalized medicine - health care guided by each individual's biological characteristics - is being fostered by increasingly powerful and economical methods to acquire clinically relevant biomarkers from large numbers of people. One therapeutic area that seems especially ripe for an early test of the personalized medicine concept is pharmacogenomics (PGx) - the idea that individual variation in drug response includes a genomic component. Drug response variation is an accepted feature of virtually all drug treatments, and contemporary molecular biologic tools continue to identify key genes mediating drug metabolism, transport, and targets. Importantly, common variation in these genes is an increasingly well-recognized contributor, sometimes with large effects, to variation in drug responses. As a result, recommendations for genotype-guided therapy are increasing. These evidence-based recommendations, if implemented in health care practice, could reduce adverse drug events and improve time to therapeutic response. Through eMERGE-PGx, we are developing strategies for the optimal implementation of genetic sequence data into the clinical environment with the ultimate goal of improving patient care. Sites and participants include: Children's Hospital of Philadelphia (CHOP): The Center for Applied Genomics (CAG) at the Children's Hospital of Philadelphia (CHOP) is a high-throughput, highly automated genotyping and sequencing facility equipped with state-of-the-art genotyping and sequencing platforms. Children who are treated at the Children's Hospital Healthcare Network and their parents may be eligible to take part in a major initiative to collect more than 100,000 blood samples, covering a wide range of pediatric diseases. 
The PGx population selected for sequencing with the PGRNseq panel at CHOP is 1,650 children from CAG's biorepository with well-documented drug-related severe adverse events (SAEs) or EHR-based drug response profiles. SAEs were extracted from EPIC records and from CHOP's Adverse Event (AE) database, which documents every AE at CHOP. These AEs are classified by a medical review panel according to the causal relationship with the suspected drug into 'doubtful', 'possible', and 'probable'. Individuals with events classified as probable, severe and objective, were selected for sequencing. The drugs more frequently associated with adverse events are antibiotics, antineoplastics, immunosuppressants and psychotropic drugs. This cohort constitutes 50% of the target population. The remaining subjects were selected using EHR-based algorithms that we have developed and validated at CAG for identifying patients not responding to ADHD medication (primarily atomoxetine) and patients refractory to antiepileptic treatment from responders. Cincinnati Children's Hospital Medical Center/Boston's Children's Hospital (CCHMC/BCH): 811 CCHMC samples were obtained from children, adolescents or young adults exposed to medication or at risk for needing medication of study interest. 55% of participants were exposed to one or more opioids and their DNA source was a CCHMC study-specific biobank; while 27% of participants were at risk for needing an opioid for surgical pain management and were newly recruited. The remainder of the cohort was exposed to methylphenidate and their DNA samples were obtained from a CCHMC study-specific biobank. The focus of Boston Children's Hospital eMERGE PGx project is on individuals with epilepsy. Samples were taken from a current pharmacogenomics study already in place through which DMET analysis was run and used as confirmation for PGRN-Seq results. A total of 109 samples were sent for PGRN-Seq analysis at University of Washington. 
The remaining 141 epilepsy samples were from Children's Hospital of Philadelphia and underwent testing with PGRN-Seq at CHOP. Geisinger Health System: A research cohort of adult Geisinger Clinic patients was enrolled from community-based primary care clinics of the Geisinger Health System. Patients were eligible for enrollment if they were a primary care patient of a Geisinger Clinic physician and were scheduled for a non-emergent clinic visit. All data are from Geisinger patients who consent to participate in the MyCode project. MyCode participants agree to provide biological samples for broad research use, including genomic analysis, and for linking of sample data to information in the participant's Geisinger health record. The consent also permits sharing of de-identified data for research purposes. Group Health (GH)/University of Washington (UW): Potential GH participants for the PGx project were enrolled in the eMERGE Network through the Northwest Institute of Genetic Medicine (NWIGM) biorepository, and provided the appropriate consent to receive clinically relevant genetic results (N~6300). Participants were eligible if aged 50-65 years at the time of their enrollment into the NWIGM repository, living, enrolled in GH's integrated group practice, and had completed an online Health Risk Appraisal. The selection algorithm was based on several data sources from the EHR at Group Health: 1. Demographics - participants with self-reported race as Asian or African ancestry were prioritized and selected to enrich for non-European ancestry; 2. Diagnosis and procedure codes - participants were selected if found to have a history of hypertension, atrial fibrillation (AF), or congestive heart failure (CHF). Participants with a history of arrhythmia were added if the entire selection algorithm did not generate 900 individuals. We also enriched for participants with EHR evidence of actionable indications related to PGRNSeq genes. 
Participants were selected if found to have an ICD9 code for malignant hyperthermia, hypertension, atrial fibrillation, congestive heart failure or long QT syndrome (LQTS); 3. Laboratory values - if a participant had any laboratory event of creatine kinase (CK) > 1000, and was dispensed statins within 6 months of the event, then they were selected; and 4. Medications - participants were excluded if ever on carbamazepine or had a current regimen of warfarin. Essentia Institute of Rural Health, Marshfield Clinic, Pennsylvania State University (Marshfield): For this study, 750 subjects were selected and enrolled into PGx based on Vanderbilt's algorithm designed to enrich for patients who are most likely to receive one of three common drugs (Clopidogrel, Warfarin or Simvastatin) in the next 2-3 years. These patients were sent a letter of invitation and description of the PGx project. Follow-up phone calls were made, and interested subjects came in for a one-time meeting to discuss the project and go through the informed consent with the research coordinator. If they were interested they signed the consent and HIPAA forms and gave blood. Subjects were chosen and enrolled into PGx independently of previous biobank participation. Mayo Clinic: The Right Drug, Right Dose, Right Time - Using Genomic Data to Individualize Treatment (The RIGHT Protocol) enrolled 1013 patients to test the hypothesis that prescribers could deliver genome-guided drug therapy at the point-of-care by using pharmacogenomic data preemptively integrated in the electronic medical record. Complete details regarding the study population have been previously described (Bielinski et al., 2014). Icahn School of Medicine at Mount Sinai (Mt Sinai): Our study site is the Primary Care Associates (PCA) practice group of the Mount Sinai Faculty Practice Associates (FPA) of the Mount Sinai Medical Center in New York City. This practice has 12 physician providers. 
All patient encounters are documented and managed with EpicCare ambulatory electronic medical record. Active PCA Patients eligible for enrollment fulfilled the following criteria: a) age 50 or older receiving clinical care at Mount Sinai FPA PCA practice with at least one practice encounter within 18 months prior to commencement of enrollment; b) no history or current use of clopidogrel, warfarin, or simvastatin. Eligible patients were invited to participate through de novo recruitment by letter sent by their provider. Interested patients were screened for eligibility and enrolled to participate in the eMERGE PGX study on site by a dedicated research coordinator. In addition to de novo enrollment from clinical practice, patients of FPA PCA who had previously enrolled in Mount Sinai's BioMe Biobank program AND fulfilled eligibility criteria as stated under a) and b) were identified by chart review and samples sequenced at CIDR using PGRNseq platform (N=300). PGRNseq data from 291 samples passed stringent quality control and are included in the current data set. Furthermore, 56 of these patients carrying known and validated 'actionable' variants affecting prescribing of clopidogrel, warfarin, and/or simvastatin were enrolled in the eMERGE PGX study following invitation through recontacting by the Principal Investigator of the BioMe Program. Northwestern University: Participants for this study were recruited from the General Internal Medicine (GIM) clinic at Northwestern Medical Group (NMG). Patients were selected for invitation to participate if they had been seen a minimum of two times over the last four years, having a high likelihood to receive a prescription for warfarin, Plavix, or a statin, and are seeing a physician who has agreed to allow their patients to be contacted for the study. 
We utilized an algorithm developed at Vanderbilt and tailored to our population which uses our EHR to estimate the probability that individuals will receive a prescription for warfarin, Plavix, or a statin in the next three years. Participants were sent a letter explaining the study prior to their GIM appointment and offered participation at the time of their visit. Participants were consented on-site and blood drawn after consent was obtained. The GIM clinic consists of 39 primary care physicians who provide approximately 80,000 patient encounters per year. As with any large primary care clinic, a significant proportion of patients in GIM clinic suffer from a variety of chronic health conditions, such as diabetes, hypertension, and coronary artery disease. Over 50,000 individuals have been seen by GIM doctors in the past 5 years; 11,562 of these patients have evidence of a statin prescription in the EHR, 3,436 have evidence of a warfarin prescription, and 1,872 have evidence of a Plavix prescription. Vanderbilt University: The more than 1000 participants enrolled into Vanderbilt's eMERGE PGx study were newly recruited from the Cardiology and Internal Medicine Clinics and the Hillsboro Medical Group within Vanderbilt University Medical Center (VUMC). Patients were selected based on a predictive algorithm estimating the patient's likelihood of receiving Clopidogrel, Warfarin, and/or Simvastatin. The algorithm identifies primarily older middle-aged patients, and the mean age of the study group is 74. The cohort is approximately 45% female with 75% of subjects self-identified as EA and 24% as AA. Subjects were consented in person by study personnel following a routine clinic visit and an introduction to the study staff by their doctor. VUMC is a comprehensive health care facility dedicated to patient care, research, and the education of health care professionals. 
Translational research into the causes and treatment of disease as well as studying fundamental biological properties is the primary focus of discovery at Vanderbilt. Clinical research is conducted in Vanderbilt University Hospital, the Nashville Veterans Administration Hospital, Meharry General Hospital and in their associated outpatient clinics. These hospitals and clinics, all associated with the Vanderbilt system, each have full time Vanderbilt faculty and medical housestaff and provide clinical care and participate in research programs. The Vanderbilt Clinic is comprised of more than 95 adult outpatient specialty practices and received over 1.5 million ambulatory visits in 2012-13. The Vanderbilt Heart and Vascular Institute offers a comprehensive heart program offering diagnosis, medical treatment, minimally invasive therapies, surgical intervention and disease management, tailored to each individual's unique needs. All programs within the Vanderbilt Clinic have survival figures that surpass the national average.
Study
phs000906
BipEx_Landen_SWEBIC
BipEx: Bipolar Exome Sequencing Broad & Karolinska Institutet.
EGAS00001005860. "This research project was a collaboration between the Karolinska Institute and the Stanley Center at the Broad Institute. In this project we sequenced and analyzed the whole exomes of X SWEBIC Bipolar case samples from collaborators in Sweden. Genomic DNA from each sample was sequenced to a mean depth of 20x."
Dac
EGAC50000000142
BipEx_Pedersen_Karolinska
BipEx: Bipolar Exome Sequencing Broad & Karolinska Institutet.
EGAS00001005856. "This research project was a collaboration between the Karolinska Institute BioBank and the Stanley Center at the Broad Institute. In this project we sequenced and analyzed the whole exomes of 4,765 control samples from collaborators in Sweden. Genomic DNA from each sample was sequenced to a mean depth of 20x."
Dac
EGAC50000000141
Genetic and Phenotypic Analysis of Multiple Sclerosis in Hispanics
We sought to identify MS susceptibility variants specific to the Major Histocompatibility Complex and to assess the effect of ancestral risk modification within 2652 Latinx and Hispanic individuals, as well as 2435 Black and African American individuals ascertained in collaboration with the Alliance for Research in Hispanic MS (ARHMS). All DNA samples were obtained through whole blood extraction and were genotyped on the MS Chip, an Illumina Infinium custom genotyping array that contains targeted and dense coverage of the extended MHC, specifically designed for imputation. Classical HLA alleles, SNPs, and amino acid residues were imputed simultaneously from genotyped SNPs using HLA-TAPAS and a multi-ethnic reference panel of 2504 samples from the 1000 Genomes Project. We have identified several novel susceptibility alleles which are rare in European populations including HLA-B*53:01, and we have identified an independent role for HLA-DRB1*15:01 and HLA-DQB1*06:02 on MS risk. We found a decrease in Native American ancestry in MS cases vs controls across the MHC, peaking near the previously identified MICB locus; and we have identified several susceptibility variants for which MS risk is modified by global or local ancestry. We are making data on all genotyped and imputed SNPs, classical HLA alleles, and amino acid residues available through dbGaP for the Hispanic and African American samples utilized in these analyses.
Study
phs003105
Are children born after medical assisted reproduction at greater risk of having an increased de novo mutation rate?
Introduction: De novo mutations (DNMs) play a prominent role in sporadic disorders with reduced fitness such as infertility and intellectual disability. Advanced paternal age is known to increase disease risk in offspring by increasing the number of DNMs in their genome. Less is known about the effect of assisted reproduction techniques (ART) on the number of DNMs in offspring. With the ongoing trend of delayed parenthood, more children are now born both from older fathers and through ART.
Materials and Methods: We investigated 49 trios (mother, father and child) and 2 quartets (mother, father and 2 siblings), divided into children born after spontaneous conception (n=18), born after in vitro fertilisation (IVF) (n=17) and born after intracytoplasmic sperm injection combined with testicular sperm extraction (ICSI-TESE) (n=18). Groups were further divided by paternal age: young (<35) or old (>45 years of age at conception). Whole-genome sequencing was performed twice to independently detect and validate all DNMs in children.
Results: A clear paternal age effect was observed, with 70 DNMs detected on average in children born to young fathers and 94 DNMs in those born to older fathers (p = 0.001). No significant differences were observed between different methods of conception (p = 1) with paternal age affecting all methods equally.
Conclusions: Paternal age, not method of conception, had a major effect on the observed number of DNMs in offspring. Given the role of DNMs in disease risk, this negative result is good news for IVF and ICSI-TESE born children, if replicated in larger cohorts.
Study
EGAS00001005569
Action to Control Cardiovascular Risk in Diabetes (ACCORD) Clinical Trial
The Action to Control Cardiovascular Risk in Diabetes (ACCORD) clinical trial was a randomized, multicenter, double 2 x 2 factorial design study involving 10,251 middle-aged and older participants with type-2 diabetes who are at high risk for CVD events because of existing CVD or additional risk factors. The purpose of ACCORD was to determine if intensive glycemic control, intensive lipid management and intensive blood pressure control could prevent major cardiovascular events (myocardial infarction, stroke or cardiovascular death) in adults with type 2 diabetes mellitus. Secondary hypotheses included treatment differences in other cardiovascular outcomes, total mortality, microvascular outcomes, health-related quality of life and cost-effectiveness. The ACCORD trial failed to show a beneficial effect of intensive blood pressure or lipid therapy on the primary outcome, and intensive glycemia management actually increased mortality. The hypothesis underlying this ancillary study is that the failure of ACCORD to achieve its goal of reducing cardiovascular risk in diabetic patients through intensive management of hyperglycemia, dyslipidemia, and hypertension may be the result of variation in drug response due to genetic variation between individual participants. Benefits of intensive therapy may accrue to subsets of subjects with specific genetic variants predisposing to efficacious responses to particular therapeutic regimens, and harms may accrue to those with other variants predisposing to poor efficacy or adverse events. Identifying these variants could lead to a precision medicine approach to treating diabetes where each patient's genetic profile could identify the most efficacious treatment regimen with the lowest likelihood of adverse events. To test this hypothesis, a genome-wide genetic analysis was undertaken, incorporating both common variants distributed across the genome and rare variants targeted to exonic regions. 
Associations of genetic variants with short term responses to individual medicines as well as long term outcomes were investigated. The dataset is composed of genetic data from the ~6100 participants who agreed to participate in the ACCORD optional genetic studies and who allowed broad investigator access to their samples and the data derived from those samples, and from whom a DNA sample of sufficient quality was obtained. While a total of 8514 participants consented to the optional genetics studies, not all consented to broad investigator access, and those who did not are not included in this dataset, although they were also genotyped. Access to these additional genotypes can only be obtained by direct collaboration with the investigators of this study. Phenotype data used in the association analyses are derived from the ACCORD public release clinical data set, which has been made available through BioLINCC.
Study
phs001411
Add Health: Longitudinal Study of a Nationally Representative Sample of Adolescents in Grades 7-12 in the United States during the 1994-95 School Year, Followed into Adulthood with Five Interviews/Surveys in 1995, 1996, 2001-02, 2008, and 2016-18
The National Longitudinal Study of Adolescent to Adult Health [Add Health] is an ongoing longitudinal study of a nationally representative U.S. cohort of more than 20,000 adolescents in grades 7-12 (aged 12-19 years) in 1994 followed into adulthood with five interviews/surveys in 1995, 1996, 2001-02, 2008, and 2016-18. Add Health was designed to understand how social environments and behaviors in adolescence are linked to health and achievement outcomes in young adulthood. Add Health contains unprecedented environmental, behavioral, psychosocial, biological, and genetic data from early adolescence and into adulthood on a large, nationally representative cohort with unprecedented racial, ethnic, socioeconomic, and geographic diversity. Add Health has a large, multidisciplinary user base of over 50,000 researchers around the world who have published over 3,400 research articles. Add Health is housed at the Carolina Population Center of the University of North Carolina at Chapel Hill. Add Health datasets are distributed according to a tiered data disclosure plan designed to protect the data from the risk of direct and indirect disclosure of respondent identity. Add Health's large sample size, population diversity and rich longitudinal database of psychosocial, physical, and contextual data will permit investigation of an exceptionally broad range of phenotypes with known genetic variation. Prospective longitudinal measures are available to document change over time in each of these phenotypes, as well as change in the social environment and life experiences, making the Add Health sample ideal for understanding genetic linkages with health and behavior across the life course. 
The original design of Add Health included important features for understanding biological processes in health and developmental trajectories across the life course of young people, including an embedded genetic sample with more than 3,000 pairs of adolescents with varying biological resemblance (e.g., twins, full sibs, half sibs, and adolescents who grew up in the same household but have no biological relationship), testing of saliva and urine for sexually transmitted infections and HIV, and biomarkers of cardiovascular health, metabolic processes, immune function, renal function, and inflammation. Add Health therefore has critical objective indicators of health status and disease markers in young adulthood, well before chronic illness or its complications emerge in later adulthood. Because DNA has been collected on the full sample at Wave IV, it is possible to link genetic profiles with social, behavioral, and biological measures over time from adolescence into adulthood. Add Health sampled the multiple environments in which young people live their lives, including the family, peers, school, neighborhood, community, and relationship dyads, and provides independent and direct measurement of these environments over time. Add Health contains extensive longitudinal information on health-related behavior, including life histories of physical activity, involvement in risk behavior, substance use, sexual behavior, civic engagement, education, and multiple indicators of health status based on self-report (e.g., general health, chronic illness), direct measurement (e.g., overweight status and obesity), and biomarkers. No other data resource with this expanse of genotype and phenotype data on a large nationally representative longitudinal sample with race, ethnic, socioeconomic, and geographic diversity exists. A complete reference guide on study design and accomplishments can be found on the Add Health website: https://cdr.lib.unc.edu/concern/articles/6t053j27s.
Study
phs001367
GWAS in an Amerindian Ancestry Population Reveals Novel Systemic Lupus Erythematosus Risk Loci and the Role of European Admixture
The study comprises SLE patients recruited from specialist rheumatology clinics in the United States, Mexico, Argentina, Chile and Peru. Two strategies were used to enrich samples for Native American ancestry: a. previously genotyped Hispanic and Latin American and US Native American individuals were selected according to their ancestry based on 253 ancestry informative markers as part of the Lupus Large Association 2 Study. The approximately 35% of the samples that were primarily of European and Native American admixture were selected; b. patients were recruited primarily in countries with little African admixture. All samples were genotyped on the HumanOmni1-Quad BeadChip using manifest H at the Oklahoma Medical Research Foundation. The Illumina clustering algorithm with GenomeStudio v2011.1 was used.
Study
phs001025
A Study of the Genetic Causes of Complex Pediatric Disorders
The Center for Applied Genomics (CAG) at the Children's Hospital of Philadelphia (CHOP) is a high-throughput, highly automated genotyping and sequencing facility equipped with state-of-the-art genotyping and sequencing platforms. Children who are treated at the Children's Hospital Healthcare Network and their parents may be eligible to take part in a major initiative to collect more than 100,000 blood samples, covering a wide range of pediatric diseases. A large majority of participants consenting to prospective genomic analyses also consent to analysis of their de-identified electronic medical records (EMRs). EMRs are longitudinal, with a mean duration of 6.5 years. CAG has committed to releasing genotype and phenotype data for 4000 individuals diagnosed with asthma, ADHD, atopic dermatitis, GERD (1000 for each), and 1000 individuals on the upper and lower ranges of Low-Density Lipoprotein (LDL) levels to dbGaP. We will also release genotype/phenotype of 3000 controls. Relevant phenotype data includes primary diagnoses (ICD9 codes), secondary diagnoses (ICD9 codes), medical procedures/tests conducted in relation to the phenotype, and a listing of relevant medications. Further details of CAG's research programs and capacity are available at: http://www.caglab.org
Study
phs000490
Spontaneous mutations in the single TTN gene represent high tumor mutation burden
Tumor mutation burden (TMB) is an emerging biomarker, whose calculation requires targeted sequencing of many genes. We investigated if the measurement of mutation counts within a single gene is representative of TMB. Whole exome sequencing (WES) data from the pan-cancer cohort (n=10,224) of TCGA, and targeted sequencing (tNGS) and TTN gene sequencing from 24 colorectal cancer samples (AMC cohort) were analyzed.
Study
EGAS00001004009
Whole Exome Sequencing of Bipolar cases, matched controls at Broad Inst on a cohort from Netherlands
This research project was a collaboration between VU and the Stanley Center at the Broad Institute. In this project we sequenced and analyzed the whole exomes of 943 Bipolar case/control samples from collaborators in the Netherlands. Genomic DNA from each sample was sequenced to a mean depth of 20x. Exome capture used Twist, and the samples were sequenced on Illumina HiSeqX machines, producing CRAM files.
Dataset
EGAD50000000619