PROGRESS/ELEMENT DNA Methylation Study
An extension of the Early Life Exposures in Mexico to Environmental Toxicants (ELEMENT) birth cohort of Mexico City, the Programming Research in Obesity, GRowth, Environment and Social Stress (PROGRESS) Cohort is an ongoing longitudinal pre-birth cohort established in 2006 in Mexico City as a partnership between the Icahn School of Medicine at Mount Sinai, Harvard University, and the National Institute of Public Health in Mexico. It was designed to study the effects of prenatal exposure to toxic metals, air pollution, phthalates, and stress on childhood development. Pregnant women (n=1,054) were initially enrolled if they were 18 years of age or older, were less than 20 weeks into gestation, had no documented heart or kidney disease, did not use steroids or anti-epilepsy drugs, did not consume alcohol daily, had telephone access, planned to live in Mexico City for the following 3 years, and received care through the Mexican Social Security System. In addition to the clinical, demographic, and exposure data collected, cord blood was collected to interrogate genome-wide DNA methylation in over 300 mother-child dyads. Clinical assessments and exposures were captured during several life stages, including prenatal, infancy (0-1 year), youth (1-18 years), and adulthood (mother). The PROGRESS cohort added well-documented phenotyping of children for obesity, metabolic dysfunction, respiratory outcomes, and cardiovascular outcomes, as well as measures of exposure to air pollutants, personal care and consumer products, non-chemical stress, and metal mixtures. No clinical trials were conducted in this cohort. The data collected in this study should provide a unique resource for investigating DNA methylation in relation to several environmental exposures and to adverse cardiometabolic and neurocognitive health in mothers and children from a prospective birth cohort. For access to demographic, clinical, and exposure data, please contact the study principal investigators directly.
Study
phs002754
This dataset contains FASTQ and BAM files from female adipose tissue, covering 64 samples.
The study group consisted of 17 obese women with normal glucose tolerance and 15 obese women with type 2 diabetes (T2DM), classified according to WHO criteria. The groups were matched for age, BMI, and waist circumference. All of the women had been morbidly obese (BMI > 40 kg/m²) for at least five years.
Dataset
EGAD00001002202
Liquid biopsy for molecular characterization of diffuse large B-cell lymphoma and early assessment of minimal residual disease
Circulating tumor DNA (ctDNA) allows genotyping and minimal residual disease (MRD) detection in lymphomas. Using an NGS approach (Euroclonality-NDC), we evaluated the clinical and prognostic value of ctDNA in a series of R-CHOP-treated DLBCL patients at baseline (n=68) and after 2 cycles (n=59), monitored by metabolic imaging (PET/CT).
A molecular marker was identified in 61/68 (90%) ctDNA samples at diagnosis. High pre-treatment ctDNA levels correlated significantly with elevated LDH, advanced stage, and high-risk IPI, and showed a trend toward shorter 2-year PFS. Evaluable NGS data after 2 cycles of treatment were obtained in 44 cases, of which 38 achieved a major molecular response (MMR; a 2.5-log drop in ctDNA). PFS curves differed significantly between patients who achieved MMR and those who did not (2-year PFS of 76% vs. 0%, p<0.001). Similarly, a reduction of more than 66% in SUVmax by PET/CT identified two subgroups with different prognoses (2-year PFS of 83% vs. 38%; p<0.001). Combining MMR and SUVmax reduction yielded better stratification (2-year PFS of 84% vs. 17% vs. 0%, p<0.001).
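The two response criteria above reduce to simple arithmetic. The Python sketch below is illustrative only and rests on assumptions not stated in the study: "2.5-log" is read as base-10 logarithms (so MMR means the interim ctDNA level is at most 10^-2.5, about 0.32%, of baseline), a drop of exactly 2.5 logs counts as MMR, an undetectable (zero) interim level counts as MMR, and ctDNA can be in any consistent unit. The function and constant names are hypothetical.

```python
import math

# Hypothetical helpers illustrating the response thresholds described above.
# Assumptions (not from the study): ctDNA levels are positive numbers in any
# consistent unit, "2.5-log" means base-10 logarithms, a drop of exactly 2.5
# logs counts as MMR, and an undetectable (zero) interim level counts as MMR.

MMR_LOG_DROP = 2.5           # major molecular response threshold (log10 units)
SUV_REDUCTION_CUTOFF = 66.0  # percent reduction in SUVmax


def log10_drop(baseline: float, interim: float) -> float:
    """Return the log10-fold decrease from baseline to interim ctDNA level."""
    if baseline <= 0:
        raise ValueError("baseline ctDNA level must be positive")
    if interim < 0:
        raise ValueError("interim ctDNA level cannot be negative")
    if interim == 0:
        return math.inf  # undetectable after treatment (assumption)
    return math.log10(baseline / interim)


def achieved_mmr(baseline: float, interim: float) -> bool:
    """True if ctDNA fell by at least MMR_LOG_DROP logs."""
    return log10_drop(baseline, interim) >= MMR_LOG_DROP


def suvmax_reduction_pct(baseline_suv: float, interim_suv: float) -> float:
    """Percent reduction in SUVmax between baseline and interim PET/CT."""
    if baseline_suv <= 0:
        raise ValueError("baseline SUVmax must be positive")
    return 100.0 * (baseline_suv - interim_suv) / baseline_suv


def suv_responder(baseline_suv: float, interim_suv: float) -> bool:
    """True if SUVmax fell by more than SUV_REDUCTION_CUTOFF percent."""
    return suvmax_reduction_pct(baseline_suv, interim_suv) > SUV_REDUCTION_CUTOFF


if __name__ == "__main__":
    # Illustrative numbers only, not study data.
    print(achieved_mmr(1000.0, 3.0))   # log10(1000/3) is about 2.52 -> True
    print(achieved_mmr(1000.0, 10.0))  # 2.0 logs -> False
    print(suv_responder(20.0, 6.0))    # 70% reduction -> True
```

The description does not say how the three combined groups (84% vs. 17% vs. 0%) were formed, so the sketch stops at the two individual criteria rather than guessing at the combination rule.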
The Euroclonality-NDC panel allows detection of a molecular marker in the ctDNA of 90% of DLBCL cases. ctDNA reduction after 2 cycles, alone and combined with interim PET results, improves prognostic stratification of patients.
Study
EGAS50000000215
MARS-seq dataset of five obese human subjects and a lean human subject
Biopsies of visceral adipose tissue from the omental depot (OAT) were obtained from five obese individuals and one lean donor. Informed consent was obtained from all participants after the nature and possible consequences of the studies were explained, under protocols approved by the Institutional Review Boards of the Perelman School of Medicine at the University of Pennsylvania, the Children’s Hospital of Philadelphia, or the Tel Aviv Sourasky Medical Center. The obese donors underwent bariatric surgery, and the lean donor underwent cholecystectomy. OAT samples were placed in 1 mL of DMEM and finely minced under sterile conditions before digestion in 50 mL of DMEM containing 3 mg/mL collagenase IV (Gibco). Samples were incubated at 37°C in a rotating oven for 20-60 min. Adipocyte and stromal vascular fractions (SVF) were separated by centrifugation, and red blood cells (RBCs) were removed from the SVF by Histopaque gradient (Sigma). Single-cell RNA-sequencing libraries were prepared using the MARS-seq pipeline and sequenced on the NextSeq 500 or HiSeq 2500 Sequencing System (Illumina).
Dataset
EGAD00001005100
Early Family Prevention of Adolescent Alcohol, Drug Use, and Psychopathology
The Early Steps Multisite Study is a collaboration of researchers from the University of Virginia, the University of Pittsburgh, Arizona State University, and Oregon Research Institute. This longitudinal study has been funded by the National Institute on Drug Abuse at the National Institutes of Health since 2002. The Early Steps Multisite Study conducted a randomized controlled trial to examine the effects of an intervention program called the Family Check-Up (FCU), offered in early to middle childhood. Outcomes include problem behaviors such as substance use. Primary caregivers (PC) and their children (TC) were recruited from Women, Infants, and Children (WIC) Nutritional Supplement centers in and around Pittsburgh, PA, Eugene, OR, and Charlottesville, VA, when the target children were 2 years old. Participants were screened in three key areas of risk for later child conduct problems: (1) sociodemographic risk (e.g., poverty, teen parent status), (2) family risk (e.g., maternal stress, depressive symptoms), and (3) child conduct problems. Randomization to the intervention condition was balanced by gender to ensure an equal number of males and females in the control and intervention groups. Data submitted to dbGaP are from the 515 subjects who consented to provide a saliva sample.
Study
phs003442
National Institute on Aging - Late Onset Alzheimer's Disease Family Study: Genome-Wide Association Study for Susceptibility Loci
Alzheimer disease is the most common neurodegenerative disorder of the elderly, affecting an estimated five million Americans. Genetic factors contribute to disease risk, with heritability estimates ranging from 57% to 79%. More than a decade ago, the ε4 variant of APOE was identified, and it remains the most consistently replicated genetic variant influencing the risk of late-onset Alzheimer disease. A segregation analysis suggests there may be four additional genes influencing the age at onset of Alzheimer disease. As of 2007, 968 association studies of 398 candidate genes had been reported, but none had replicated consistently. There are many reasons for this lack of consistency, but an important one is the scarcity of well-characterized families and patients available to the entire scientific community. The NIA-LOAD Family Study addresses the extensive effort and expense required to ascertain such a population. Its goal is to identify and recruit families with two or more siblings with the late-onset form of Alzheimer's disease, along with a cohort of unrelated, non-demented controls similar in age and ethnic background, and to make the samples, the clinical and genotyping data, and preliminary analyses available to qualified investigators worldwide. Genotyping by the Center for Inherited Disease Research (CIDR) was performed using the Illumina Infinium II assay protocol with hybridization to Illumina Human 610Quadv1_B BeadChips. This genotyping covers the largest collection of families with Alzheimer's disease ever assembled, combining the NIA-LOAD Genetics Initiative Multiplex Family Study and the National Cell Repository for Alzheimer's Disease (NCRAD) with additional controls from the University of Kentucky. These genotyping results are intended to serve as a focal point for future research to identify the remaining genetic variants in Alzheimer's disease.
Study
phs000168
SureTypeSC - accurate genotyping of single-cell SNP array data
We used a collection of single cells from two Coriell cell lines to train and validate a machine learning method for quality assessment of single-cell genotypes.
Study
EGAS00001004621
National Heart, Lung, and Blood Institute (NHLBI) Heart Healthy Lenoir (HHL) Genomics Study
The HHL genomics study uses a systems approach to develop models integrating clinical and genomic data. Previously we developed and tested an approach known as the SAMARA (Supporting A Multidisciplinary Approach to Research in Atherosclerosis) project that applied recent advances in biomedical and computational sciences at The University of North Carolina at Chapel Hill to develop a deeper understanding of human cardiovascular disease (CVD). The Heart-Healthy Lenoir Project expands these studies into the community, using this methodology to: 1) determine the prevalence of genomic risk signatures in high-risk community populations using genome-wide Single Nucleotide Polymorphism (SNP) analysis; 2) develop novel genomic models incorporating high-risk features in this population; and 3) determine whether genomic signatures can be used to predict responsiveness to interventions that underlie CVD disparities. DNA was obtained from participants enrolled in two of the HHL clinical trials, 1) Improving Care for Patients With High Blood Pressure (NCT01425515) or 2) Heart-Healthy Lenoir Lifestyle Study (NCT01433484). Participants could enroll in both trials concurrently.
Study
phs001471
A Genome-Wide Association Study on Cataract and HDL in the Personalized Medicine Research Project Cohort
The primary goals of this project are to develop and validate electronic phenotyping algorithms that accurately identify cases and controls with a positive predictive value (PPV) of >95%, and to conduct a genome-wide association study that advances understanding of two specific yet interrelated disease states, while simultaneously engaging the community in these research efforts. Lipid abnormalities and cataracts are both diseases of public health significance; they share common risk factors, and both are complex diseases to which many genes likely contribute. Genome-wide association studies of these two outcomes and environmental risk factors could yield novel data about the etiology of each outcome as well as their interaction. PhenX is a project designed to prioritize Phenotypes and eXposure measures for Genome-wide Association Studies (GWAS). The PhenX Toolkit is a valuable resource for researchers who are planning or expanding a study and would like to incorporate well-established measures recommended by experts in the field. We are currently interested in gene-environment interactions for age-related cataract.
Study
phs000170
Esophageal Adenocarcinoma Organoid Genomics
Single-cell RNA-seq (scRNA-seq) of esophageal adenocarcinoma (EAC) organoids to benchmark variant calling from 10X Genomics scRNA-seq data. Five EAC organoids were profiled by scRNA-seq, two of which also have matched exome data.
Study
EGAS00001005224
National Institute of Neurological Disorders and Stroke (NINDS), Family Study of Essential Tremor (FASET), Identification of Susceptibility Genes for Essential Tremor
The Familial Study of Essential Tremor (FASET) was designed to identify susceptibility genes for essential tremor (ET). ET is among the most common neurological diseases, with an estimated prevalence of 4.0% in people over 40 years of age and exceeding 20% in those over 90. ET, often referred to as "familial tremor", is generally regarded as a highly genetic disorder, with affected family members spanning multiple generations, and twin studies show high concordance among monozygotic twins. Probands (affected with ET) and relatives were enrolled in a family study of ET at Columbia University, New York, between 2011 and 2014. The study was approved by the Institutional Review Board at Columbia University, and written informed consent was obtained from all enrollees. The criteria for enrollment were: 1) the proband had early-onset ET, with age at onset < 50 years; 2) the proband had a diagnosis of definite or probable ET; 3) in addition to the proband, there were at least two affected relatives in the family; 4) additional affected and unaffected family members were willing to participate in the study; and 5) the family contained more than two affected individuals in different generations. Blood samples were also collected for genetic research. For the genetic analyses, we excluded enrollees whom we or others had diagnosed with Parkinson's disease (PD) or dystonia. The final sample includes 52 families (52 probands affected with ET and 155 relatives). The number of affected individuals enrolled per family ranged from 3 to 7 (mean = 4.1). Genetic samples from FASET were analyzed with whole-genome SNP genotyping (for linkage analyses) and whole-exome sequencing. It is hoped that this resource will help researchers better understand the genetic causes of ET and the underlying disease pathogenesis.
Study
phs000966
Centers for Common Disease Genomics (CCDG) - Whole Genome Sequencing in Type 1 Diabetes (T1DGC)
The Type 1 Diabetes Genetics Consortium (T1DGC) was established to collect resources (biological samples and data) and conduct research to better understand the genetic basis of type 1 diabetes (T1D). Collection was initiated by ascertaining affected sib-pair families (both parents, two affected siblings and, when available, an unaffected sibling), collected from five geographic regions through four recruitment networks (Asia-Pacific, Europe, North America, United Kingdom). In addition, the T1DGC collected trio families (both parents and affected child) and cases and controls from low-prevalence populations (African-American, with four grandparents self-reporting as African ancestry; Mexican-American, with four grandparents self-reporting as ancestry from Mexico). The T1DGC also served as a repository for contributed collections from other studies, all meeting the broad data-sharing policy of the T1DGC, for inclusion in the genetic studies. These collections include T1D case samples ascertained from the UK Genetic Resource Investigating Diabetes (UK GRID) cohort, SEARCH for Diabetes in Youth (SEARCH), The Genetics of Kidneys in Diabetes (GoKinD), and control samples obtained from the British 1958 Birth Cohort, the UK National Blood Services collection, CLEAR (Consortium for the Longitudinal Evaluation of African-Americans with Early Rheumatoid Arthritis), the New York Cancer Project (NYCP), and other cohorts. For the NHGRI-funded Centers for Common Disease Genomics (CCDG) project, participants with T1D and ancestry-matched controls were identified through the T1DGC, either through direct ascertainment or by contribution from other sources to the T1DGC. 
As the CCDG initially focused on non-Caucasian populations for whole-genome sequencing, T1DGC participants of African, Mexican, and Asian ancestry (targeting ~1,200 cases and ~1,200 controls in each ancestral group) and a small group of participants of Northern European ancestry (~100 cases, ~100 controls) were to be contributed to the study. Whole-genome sequencing of T1DGC samples was to be conducted at the Washington University McDonnell Genome Institute, with samples selected by matching case-control status within each ancestry group and by CCDG prioritization.
Study
phs001222
Paired exome analysis in urothelial carcinoma
In this study, we characterized genomic alterations in two to five metachronous tumors from each of 29 patients initially diagnosed with early-stage disease. Fourteen patients (32 tumors) had non-progressive disease (NPD) and 15 patients (34 tumors) had progressive disease (PD). Whole-exome sequencing (WES, ~50x mean read depth), ultra-deep targeted sequencing (~6,809x mean read depth), and whole-transcriptome RNA-seq were performed for all samples. In addition, multiregional WES was performed on 8 adjacent regions from a single tumor.
Study
EGAS00001001686
Transgenerational transmission of reproductive and metabolic dysfunction in the male progeny of polycystic ovary syndrome
The transgenerational maternal effects of PCOS in female progeny have been described previously. As there is evidence that a male equivalent of PCOS may exist, we asked whether sons born to mothers with PCOS (PCOS-sons) transmit reproductive and metabolic phenotypes to their male progeny. Here, in a Swedish nationwide register-based cohort and a clinical case-control study from Chile, we found that PCOS-sons are more often obese and dyslipidemic. Their serum miRNAs may regulate PCOS-risk genes. Our prenatally androgenized PCOS-like mouse model, with or without diet-induced obesity, confirmed that reproductive and metabolic dysfunctions in F1 male offspring are passed down to F3. Small non-coding RNA (sncRNA) sequencing of F1-F3 sperm revealed distinct differentially expressed (DE) sncRNAs across generations in the androgenized, obese, and obese-and-androgenized lineages, respectively. Notably, common targets between transgenerational DEsncRNAs in mouse sperm and in PCOS-sons' serum indicate similar effects of maternal hyperandrogenism. These findings are of translational relevance, highlighting a previously underappreciated risk of reproductive and metabolic dysfunction transmitted via the male germline and identifying potential molecular markers to study in future generations.
Study
EGAS00001007079
Evaluation of Nuclear DNA from Rootless Hairs for Forensic Purposes
The study aims to overcome current limitations in the recovery of DNA from small, difficult forensic samples. In particular, our goal is to produce a robust laboratory protocol and accompanying software to accelerate adoption, and to evaluate the reliability and robustness of both the laboratory and computational aspects of generating genotype files from minute and/or degraded DNA samples, such as single, rootless hairs. The data accompanying this study include raw paired-end reads from high-throughput sequencing of two panels of saliva, head hair, and pubic hair samples collected from anonymous volunteers at the University of California, Santa Cruz. The smaller panel (Hair1.0) comprises 8 individuals and the larger (Hair2.0) comprises 50 individuals; post-collection analysis identified 3 individuals present in both panels. We did not collect phenotype data or personally identifying information from the participants. For the Hair2.0 panel, only a subset of volunteers provided pubic hairs for DNA extraction and sequencing. Also included are saliva-derived genotype array data for all 8 individuals in the Hair1.0 panel and 44 of the 50 individuals in the Hair2.0 panel.
Study
phs002979
Impact of Respiratory Virus Infections and Bacterial Microbiome Shifts on Lymphocyte and Respiratory Function in Infants Born Prematurely or Full Term
Public health importance: Babies born preterm, approximately 1 out of every 9 live births in the United States, have significant respiratory morbidity over the first two years of life, exacerbated by respiratory viral infections. Many (<50%) return to pediatricians, emergency rooms, and pulmonologists with symptoms of respiratory dysfunction (SRD): intermittent or chronic wheezing, poor growth, and an excess of upper and lower respiratory tract infections (LRTI). SRD correlates inversely with gestational age and birth weight and is more common in infants with chronic lung disease of prematurity, yet its incidence and severity vary widely among both those born prematurely and those born at term. There is evidence from clinical studies and animal models that the risks of LRTI and recurrent wheezing are influenced by gut and respiratory flora and by T cell responses to infection. Information gained from this study will be used to identify characteristics, risk factors, and potential mechanisms for early and persistent lung disease in children born at term and preterm. This clinical research study will investigate the relationships between sequential respiratory viral infections, patterns of intestinal and respiratory bacterial colonization, and adaptive cellular immune phenotypes associated with increased susceptibility to respiratory infections and long-term respiratory morbidity in preterm and full-term infants. We hypothesize that the timing and acquisition of specific viral infections and bacterial species are directly related to respiratory morbidity in the first year of life, as defined by SRD and by measures of pulmonary function. We also hypothesize that cellular and molecular immune maturity is altered by factors associated with premature birth in a way that promotes chronic inflammatory and cytotoxic damage to the lung, with subsequent enhanced, damaging responses to infectious agents and environmental irritants.
Our preliminary studies demonstrate both feasibility and expertise in multiparameter immunophenotyping of small-volume peripheral blood samples obtained from premature infants, including gene expression arrays of flow cytometry-sorted cells. We will use new technologies for identifying known viruses, high-throughput metagenome sequencing of RNA and DNA virus-like particles (VLP) for viral discovery in infant respiratory samples, and high-throughput pyrosequencing (454T) of bacterial 16S rRNA to determine shifts in bacterial community structure in preterm (PT) compared with full-term (FT) infants over the first year of life. Finally, we present statistical approaches to stratify disease risk predictors using multivariate logistic regression modeling. We propose to evaluate T cell phenotypic and functional profiles relative to viral and predominant bacterial exposures according to highly complementary, but independent, Specific Objectives. Objective 1: To determine whether viral respiratory infections and patterns of respiratory and gut bacterial community structure (microbiome) in prematurely born babies predict the rate and degree of immunologic maturation and pulmonary dysfunction, measured from birth to 36 weeks corrected gestational age (CGA). Objective 2: To determine the relationship between respiratory viral infections and disease severity up to one year CGA, and the lymphocyte (Lc) phenotypes documented at term gestation (birth for term infants and 36 weeks/NICU discharge for preterm infants) and at one year CGA. The three secondary outcomes of this objective are to a) relate the quantity, type, and severity of viral infections to pulmonary function at one and three years of life, b) relate the viral community structure to the severity of viral infections, and c) seek evidence of modulation of viral susceptibility by bacterial respiratory and gut community structure (microbiome).
The relationship of colonization with known and unidentified bacterial species in both the respiratory tract and the gut will be evaluated. Flow cytometry data corresponding to this study can be found within ImmPort study SDY1302. Positive and negative controls for microbiome samples are uploaded under SRA BioProject PRJNA474485. Microbiome samples corresponding to PRISM2 are distinguished from PRISM1 by "_PRISM2" appended to the sample name. Within the positive and negative controls, PRISM1 controls are uploaded as BAM files and PRISM2 controls as paired FASTQ files. Samples ending in -08 correspond to TLDA qPCR results for a given sample. There is a column for each pathogen tested and a column indicating whether that pathogen is a bacterium or a virus.
Study
phs001347
Cincinnati Children's Hospital Medical Center (CCHMC) - eMERGE Phase IIIA Data
This submission includes genotyping or sequencing data from several cohorts, each described separately below.
Extreme early-onset obesity
Obesity is a serious epidemic condition and is on the rise in the United States. Today, nearly one out of three children in this country is overweight or obese. According to the Centers for Disease Control and Prevention, 35.7% of American adults and 17% of American children are obese. The medical costs associated with obesity are estimated to be in the billions. The interplay of additive genetic effects and common environmental effects undoubtedly influences this complex disease. However, despite exposure to a so-called "obesogenic environment", a large proportion of the population remains of normal weight. These observations suggest that innate, non-environmental factors make some individuals more susceptible to obesity, supporting the idea that biological mechanisms, and thus genetic factors, underlie an individual's response to the obesogenic environment. In young children with severe obesity, the relative roles of genetics and in utero programming are likely to outweigh the short duration of environmental and lifestyle exposures. This group is therefore ideal to study, as it is likely enriched for variants that influence the risk of developing obesity. The purpose of this project is to further study and understand childhood obesity and to develop a repository of samples for future obesity studies.
Eosinophilic Esophagitis (EoE)
EoE is one of the manifestations of eosinophilic gastrointestinal inflammation, which has profound effects on a patient's health and development. Results of epidemiologic studies performed through our center demonstrate that eosinophil-associated gastrointestinal disease is not uncommon.
Although the epidemiology of eosinophilic esophagitis had not been thoroughly studied until recently, there appears to have been a significant increase in EoE diagnoses over the last decade. Based on our research, this mainly reflects increased disease recognition, but there is also a bona fide increase in disease incidence that coincides with the increasing incidence of asthma and allergic diseases in the industrialized world. In addition, many patients with intractable symptoms previously thought to represent atypical GERD or other disorders are now being recognized as having EoE. Diagnosis of EoE requires endoscopy and biopsies to document the characteristic histologic finding of esophageal eosinophilia. Broadly, this study aimed to elucidate the mechanisms underlying eosinophil growth, survival, migration, and function, and to investigate and further characterize the pathophysiology, clinical manifestations, and spectrum of disease severity of eosinophilic esophagitis in humans. The de-identified genotyping and genome-wide association data generated as part of this research will be used for further genome research.
Familial Sample Repository (FSR) and Directed Sample Repository (DSR)
De novo mutations can cause many diseases, as has been demonstrated for intellectual disability, autism, and many rare genetic disorders. Family-based studies have a variety of advantages over case/control studies, including the elimination of analysis artifacts related to population stratification, the detection of genes that act through a recessive mechanism of inheritance, and validation that the trait is not transmitted from a parent, which is not possible with a case/control design. Additionally, DNA from families can be used to identify de novo mutations, which are strong candidate causal variants. For this project, samples will be collected from families on an ongoing basis.
Families may be recruited either because the patient has a disease thought to be of genetic origin, or from the general patient population to serve as controls or for diseases identified in the future. Some phenotypes under study include fibroblastic rheumatism, diaphragmatic hernia, polymicrogyria, severe congenital neutropenia, primary sclerosing cholangitis, and staphylococcal infection.
CLRR-Cincinnati Lupus Registry and Repository
Systemic lupus erythematosus (SLE) is a complex, partially understood autoimmune disorder. A genetic basis for SLE is supported by high heritability (> 66%), familial aggregation, increased monozygotic twin concordance, genetic linkage, and candidate gene associations, including HLA genes, Fc receptors, and complement components. Relevant environmental factors likely include infections (Epstein-Barr virus), therapeutics, personal habits (smoking), and diet. This repository was developed to continue a research resource for the collection of well-characterized pedigrees containing a proband with systemic lupus erythematosus.
Juvenile Idiopathic Arthritis (JIA)
JIA is a debilitating complex genetic disorder characterized by inflammation of the joints and other tissues, and it shares histopathological features with other autoimmune diseases. There are more than 50,000 children with JIA in the USA, approximately 1 per 1,000 births, which is about the same incidence as juvenile diabetes. It is believed that genes in the major histocompatibility complex (MHC) play a role in defining genetic risk, and it can be hypothesized that loci in other chromosomal regions are also involved in conferring risk of JIA. These candidate chromosomal regions can be identified using genome-wide association analyses. The long-term goal is a comprehensive understanding of the genetic basis of these disabling arthropathies, whose molecular basis is not presently understood.
These data will contribute to a national resource for the study of autoimmunity in children.
Better Outcomes for Children-Cytogenetics
Since 2007, more than 4,000 samples, enriched for various rare or common genetic diseases as well as specific chromosomal abnormalities such as deletions and duplications, have been genotyped for subsequent GWAS and PheWAS analyses aimed at uncovering main genetic effects.
Study
phs001011
BLUEPRINT DNA Methylation 450K data of mantle cell lymphoma
We applied an analytic strategy to decipher the DNA methylome of 86 mantle cell lymphomas (MCL) in light of the methylome of the entire B-cell lineage. In this way, we first identified two MCL subgroups that carry epigenetic imprints of pre- and post-germinal center B cells, respectively. Second, we observed that pure tumor-specific changes are rare, as most (89-99%) DNA methylation alterations in MCL lie within or close to regions showing dynamic methylation in normal B cells. Several thousand of these differentially methylated regions in MCL show concurrent changes in enhancer-associated histone modifications, including a region located 650 kb away from the MCL oncogene SOX11. At the clinical level, epigenetic and genetic changes co-evolve during MCL progression, and the magnitude of epigenetic changes is associated with overall patient survival.
Study
EGAS00001001637
Ecological Stressors, PTSD, and Drug Use in Detroit: The Detroit Neighborhood Health Study (DNHS)
The Detroit Neighborhood Health Study (DNHS) is a prospective, representative longitudinal cohort study of predominantly African American adults living in Detroit, Michigan. The overall goal of the DNHS is to identify how genetic variation, lifetime experience of stressful and traumatic events, and features of the neighborhood environment predict psychopathology and behavior. Cohort participants were selected with a dual-frame probability design, using telephone numbers obtained from the U.S. Postal Service Delivery Sequence Files as well as a list-assisted random-digit-dial frame. Individuals without listed landlines or telephones and individuals with only a cell phone listed were invited to participate through a postal mail effort. Participants completed a 40-minute structured telephone interview annually between 2008 and 2012 to assess perceptions of their neighborhoods, mental and physical health status, social support, exposure to traumatic events, and alcohol and tobacco use; each participant was compensated US$25. All survey participants were offered the opportunity to provide a specimen (venipuncture, blood spot, or saliva) for immune and inflammatory marker testing as well as genetic testing of DNA. Participants received an additional US$25 if they elected to give a sample. Informed consent was obtained at the beginning of each interview and again at specimen collection. The Institutional Review Board of the University of Michigan reviewed and approved the study protocol. The DNHS submission to dbGaP includes phenotype data from all five survey waves (n=856), all available GWAS data for participants who completed wave 4 (n=507), and methylation data for wave 1, wave 2, wave 4, and wave 5 participants (n=456).
Study
phs000560
TMM whole genome analysis of 4566 Japanese individuals
Tohoku University Tohoku Medical Megabank Organization (ToMMo) and Iwate Tohoku Medical Megabank Organization (IMM) were founded to establish an advanced medical system to support reconstruction after the Great East Japan Earthquake. These organizations are developing a biobank that includes medical and genome information for supporting health and welfare in the Tohoku area. In the first stage, part of our mission was to sequence 4,000 individuals to construct a Japanese whole-genome reference panel.
Study
JGAS000239
Identification of putative multiple myeloma (MM) susceptibility genes
We sought to identify novel MM susceptibility genes using a collection of families with multiple cases of MM/MGUS, including 189 affected individuals from 40 families, and index cases from an additional 88 families, along with 170 early-onset (EO) MM cases (≤55 years). We analyzed a total of 347 affected individuals using whole-exome (N=321) and whole-genome (N=26) sequencing. Samples were identified and collected through nationwide efforts in France, Sweden and Greece. We focused on rare (MAF<0.5%) germline protein-truncating and likely deleterious missense variants in genes harboring variants in at least two families showing variant-disease segregation, and in additional index (≥2) and/or early-onset (≥2) cases.
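The gene-level filter described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the study's pipeline: the `Variant` record, the consequence labels in `TRUNCATING`, the `deleterious` flag (standing in for whatever in-silico predictor was used), and the reading of "index (≥2) and/or early-onset (≥2)" as an OR condition are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Consequence labels treated as "protein truncating" (assumed, VEP-style terms).
TRUNCATING = {"stop_gained", "frameshift_variant",
              "splice_acceptor_variant", "splice_donor_variant"}


@dataclass(frozen=True)
class Variant:
    gene: str
    maf: float                    # population minor allele frequency
    consequence: str              # e.g. "stop_gained", "missense_variant"
    deleterious: bool             # hypothetical in-silico prediction flag
    carrier: str                  # sample ID of the carrier
    group: str                    # "family", "index" or "early_onset"
    family: Optional[str] = None  # family ID for familial carriers
    segregates: bool = False      # variant tracks with MM/MGUS in the family


def is_candidate(v: Variant, max_maf: float = 0.005) -> bool:
    """Rare (MAF < 0.5%) truncating or likely deleterious missense variant."""
    if v.maf >= max_maf:
        return False
    return v.consequence in TRUNCATING or (
        v.consequence == "missense_variant" and v.deleterious)


def prioritize_genes(variants: Iterable[Variant], min_families: int = 2,
                     min_index: int = 2, min_eo: int = 2) -> List[str]:
    """Genes with segregating candidates in >= min_families families and
    carriers among >= min_index index cases or >= min_eo early-onset cases."""
    families = defaultdict(set)
    index_cases = defaultdict(set)
    eo_cases = defaultdict(set)
    for v in variants:
        if not is_candidate(v):
            continue
        if v.group == "family" and v.segregates:
            families[v.gene].add(v.family)
        elif v.group == "index":
            index_cases[v.gene].add(v.carrier)
        elif v.group == "early_onset":
            eo_cases[v.gene].add(v.carrier)
    return sorted(
        g for g in families
        if len(families[g]) >= min_families
        and (len(index_cases[g]) >= min_index or len(eo_cases[g]) >= min_eo))


# Hypothetical toy data: GENE_B is filtered out because its MAF is 2%.
variants = [
    Variant("GENE_A", 0.001, "stop_gained", False, "F1-1", "family", "F1", True),
    Variant("GENE_A", 0.001, "stop_gained", False, "F2-1", "family", "F2", True),
    Variant("GENE_A", 0.002, "missense_variant", True, "I1", "index"),
    Variant("GENE_A", 0.002, "missense_variant", True, "I2", "index"),
    Variant("GENE_B", 0.02, "stop_gained", False, "F3-1", "family", "F3", True),
]
print(prioritize_genes(variants))  # ['GENE_A']
```

Counting distinct families and distinct carriers (sets rather than raw variant counts) keeps one large family or one multiply-hit sample from satisfying the "at least two" thresholds on its own.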
Study
EGAS50000001259
The EGA Helpdesk team: 2025 in review and what we are building next
Behind every dataset submitted, every access request processed, and every
technical question answered, there is a team working quietly to keep things
moving: the Helpdesk (HD). As 2026 begins, this feels like the right
moment to look back on what 2025 has been like for the HD team: the challenges
we faced, how we adapted, and where we're heading next.
Why the Helpdesk matters to EGA
The EGA Helpdesk is more than a support channel. It plays a key role in
maintaining trust in the EGA ecosystem. By supporting data submitters,
researchers, Data Access Committees (DACs), and institutional partners, the HD
helps ensure that data can flow securely, efficiently, and reliably. When
issues arise, the Helpdesk is often the first place where their impact is felt
and addressed. In that sense, the HD sits at the intersection of technology,
policy, and people.
One Helpdesk, two locations, one shared mission
The EGA Helpdesk is a joint, distributed team working closely across two
locations: CRG (Barcelona, Spain) and EMBL-EBI (Cambridge, UK). Although we
are based in different institutions, we operate as a single Helpdesk, with
shared workflows, priorities, and responsibility towards users.
At CRG, the HD team is formed by:
Andrea
Max
Àlex
and me
At EMBL-EBI, we work closely with:
Silvia
Coline
Aravind
What defines us as a team is simple: we work user-first, even under pressure.
In a highly technical environment, clarity, empathy, and consistency matter
just as much as tools and processes. Close collaboration across sites is
essential to making that happen.
What does the EGA Helpdesk do?
The HD supports users across the full lifecycle of data in EGA. This includes:
Data submissions, uploads, and encryption workflows.
Data access requests and permissions.
Questions around policies, consent, and data usage.
Technical and system-related issues.
Coordination between users, internal teams, and external partners.
2025: growth, change, and recalibration
2025 was a year of growth, but not always a predictable one.
Early in the year, several technical and system-related challenges required us
to adjust our original plans. Priorities shifted, timelines changed, and some
improvements had to be rethought. For the HD team, this is often the hardest
part of the job: we see delays through the eyes of users and understand the
real impact they can have on ongoing research.
One of the key lessons from 2025 was that stability is not only a technical
challenge, but also an organisational one. Teamwork proved to be essential:
anticipating peak periods, sharing context early, and coordinating closely
across teams made a tangible difference. When things became complex, working
together across roles and locations was what allowed us to keep moving
forward. In 2025, the Helpdesk received 5,313 tickets and resolved 5,511
requests, reflecting both increased adoption of EGA and the team’s ability to
absorb higher demand.
At the same time, demand continued to grow. Compared to 2024, ticket creation
increased by over 6%, while resolution capacity grew by more than 11%. The
team not only kept up with incoming requests but also resolved part of the
accumulated backlog, finishing the year having solved more tickets than were
created.
The real challenge of 2025 was not overall performance, but how the workload
was concentrated during peak months. Seasonality and demand spikes placed
pressure on the system, even while overall efficiency remained strong.
On a team level, 2025 was also a year of transition. I took on the HD
leadership role in January 2025, stepping into a period of change and rapid
learning. Later in the year, in October, we said goodbye to Raül, and in
January 2026, we welcomed Àlex, strengthening the team for the next phase.
What users needed most in 2025
While requests vary widely, some themes stood out throughout the year:
Support with data submissions and uploads
Data access requests and permissions
Technical and system-related issues
As EGA matures, day-to-day operations have become more complex. Many
long-running tickets are not delayed due to a lack of follow-up, but because
they depend on external approvals, cross-institutional coordination, or
multi-step processes. Understanding these patterns helps us focus not just on
resolving tickets, but on improving how work flows through the system.
Looking ahead to 2026
With a reinforced team and clearer insights from 2025, our focus for 2026
shifts from throughput to flow.
Key priorities include:
Strengthening our web content and documentation
Reducing structural backlog
Improving cross-team and cross-system coordination
Anticipating peak demand earlier and planning capacity accordingly
Challenges will continue to arise in 2026, as they always do. However, 2025
reinforced something important: a stable, empathetic, and well-aligned
Helpdesk team is essential to supporting EGA's mission at scale. Supporting
users well means supporting research, and that remains at the core of what we
do.
Blog
ega-helpdesk-team-2025-in-review-and-upcoming-improvements
NIDDK IBD Genetics Consortium Crohn's Disease Genome-Wide Association Study
This dataset contains data from a genome-wide association study performed with 968 Inflammatory Bowel Disease (IBD) affected cases and 995 unrelated controls using the Illumina HumanHap300 Genotyping BeadChip. Cases were selected to have Crohn's disease with ileal involvement, and controls were matched to cases based on sex and year of birth. Subjects were drawn from two cohorts: (1) persons with non-Jewish, European ancestry (561 cases and 563 controls), and (2) persons with Jewish ancestry (407 cases and 432 controls). Genotyping was performed at the Feinstein Institute for Medical Research. Seven hundred fifty-four of the samples (468 cases and 286 controls) were taken from the NIDDK IBD Genetics Consortium cell line repository. These samples are identified in the IBD_Sample file. The subject IDs for these individuals may be used to request corresponding samples for follow-up research through the repository. In addition, complete phenotype data for these individuals are available, together with the Consortium's phenotyping manual and the forms used to collect the data. The remaining 1,209 samples were obtained from pre-existing collections ascertained through Cedars-Sinai Medical Center, Johns Hopkins University, University of Chicago, University of Montreal, University of Pittsburgh, University of Toronto, and the New York Health project (controls only). For these samples, only sex, cohort (Jewish vs. non-Jewish), and age at diagnosis (cases only) are available. Two hundred three individuals from among the pre-existing samples did not provide consent to release their genotype data (designated as consent group 2 in the file IBD_Subject). Thus, individual genotype data are only provided for 1,760 samples. To compensate for this, we have provided summary results for each SNP. These are based on a stratified analysis testing case/control association.
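The text does not name the stratified test. One standard choice for a per-SNP case/control allelic test stratified by cohort (here, non-Jewish vs. Jewish ancestry) is the Cochran-Mantel-Haenszel (CMH) test. The sketch below assumes that test, one 2x2 allele-count table per cohort, and no continuity correction; the function name and the example counts are hypothetical.

```python
import math


def cmh_allelic_test(strata):
    """Cochran-Mantel-Haenszel chi-square (1 df, no continuity correction).

    Each stratum is a 2x2 table of allele counts:
    [[case_A, case_a], [control_A, control_a]].
    Returns (statistic, p_value).
    """
    num = 0.0  # sum over strata of (observed - expected) case_A count
    var = 0.0  # sum over strata of hypergeometric variances
    for (a, b), (c, d) in strata:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum with fewer than 2 alleles is uninformative
        r1, r2 = a + b, c + d  # case and control allele totals
        c1, c2 = a + c, b + d  # allele A and allele a totals
        num += a - r1 * c1 / n
        var += r1 * r2 * c1 * c2 / (n * n * (n - 1))
    if var == 0:
        raise ValueError("no informative strata")
    stat = num * num / var
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


# Hypothetical allele counts for one SNP (2 alleles per person):
# cohort 1: 561 cases (1,122 alleles), 563 controls (1,126 alleles)
# cohort 2: 407 cases (814 alleles), 432 controls (864 alleles)
strata = [
    [[620, 502], [540, 586]],
    [[450, 364], [400, 464]],
]
stat, p = cmh_allelic_test(strata)
print(f"CMH chi2 = {stat:.2f}, p = {p:.3g}")
```

Stratifying by cohort means cases are compared only with controls of the same ancestry group, so an allele-frequency difference between the two cohorts alone does not produce a spurious association.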
Fifty-one samples had a call rate below 93% and were therefore excluded from this analysis, leaving an overall sample size of 1,963 - 51 = 1,912.
X Chromosome Heterozygosity
Nine samples have X chromosome heterozygosity that is neither clearly consistent nor clearly inconsistent with their phenotypic sex. One of these samples was found to have Turner syndrome. The remaining 8 samples have heterozygosity ranging from 35-76%.
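The call-rate exclusion can be sketched as a per-sample filter. This assumes genotypes coded as minor-allele counts (0, 1, 2) with `None` for a failed call; the helper names and sample data are hypothetical, and only the 93% threshold comes from the text.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Genotypes = Sequence[Optional[int]]  # 0/1/2 minor-allele counts, None = no call


def call_rate(genotypes: Genotypes) -> float:
    """Fraction of SNPs with a non-missing call for one sample."""
    if not genotypes:
        return 0.0
    called = sum(g is not None for g in genotypes)
    return called / len(genotypes)


def filter_by_call_rate(samples: Dict[str, Genotypes],
                        threshold: float = 0.93
                        ) -> Tuple[Dict[str, Genotypes], List[str]]:
    """Keep samples whose call rate is at least `threshold`; list the rest."""
    kept = {sid: g for sid, g in samples.items() if call_rate(g) >= threshold}
    dropped = sorted(set(samples) - set(kept))
    return kept, dropped


# Hypothetical samples genotyped at 10 SNPs.
samples = {
    "S1": [0, 1, 2, 1, 0, 1, 2, 0, 1, 1],     # 10/10 called -> 1.00, kept
    "S2": [0, None, 2, 1, 0, 1, 2, 0, 1, 1],  # 9/10 called -> 0.90, dropped
}
kept, dropped = filter_by_call_rate(samples)
print(dropped)  # ['S2']
```

Applied to the full dataset, the same rule would remove 51 of 1,963 samples, matching the 1,912 reported above.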
Study
phs000130
McGill Epigenomics Mapping Centre
The McGill Epigenomics Mapping Centre (EMC) and Data Coordination Centre (EDCC) were established in 2012 at the McGill University and Genome Quebec Innovation Centre (Montreal, Canada) to support large-scale human epigenome mapping for a broad spectrum of cell types and diseases. Previous epigenetic studies have provided profiles of specific chromatin marks in relation to basic biological processes, and future studies of molecular mechanisms must build upon these proofs of concept by integrating sequence-based variation with multiple levels of epigenetic and transcriptional regulation across the genome of human tissues and animal disease models. This project leverages the high-throughput sequencing infrastructure and expertise in genomics and transcriptomics at the McGill Innovation Centre to carry out this research. Collaborators and the wider scientific community obtain controlled access to the data through a portal, which uses Compute Canada high-performance computing cluster resources to manage the large volume of data associated with the generation of reference epigenome maps. McGill University is one of two Canadian mapping and data coordination centres, the other being the Michael Smith Genome Sciences Centre in Vancouver, British Columbia. The generation of comprehensive epigenome maps at McGill University is part of a larger international effort that is coordinated by the International Human Epigenome Consortium (IHEC), whose overall long-term objective is to determine the extent to which the epigenome shapes human populations over generations and in response to the environment. This project is funded under the Canadian Epigenetics, Environment, and Health Research Consortium (CEEHRC) by the Canadian Institutes of Health Research and by Genome Quebec, with additional support from Genome Canada. The computing and networking infrastructure, and part of the software development, are provided by Compute Canada and CANARIE.
Study
EGAS00001000995
The National Heart, Lung, and Blood Institute (NHLBI)-funded Next Generation Genetic Association Studies (NextGen) Consortium: Phenotyping Lipid traits in iPS derived hepatocytes Study (PhLiPS Study)
The goal of the PhLiPS study is to create a library of induced pluripotent stem cell (iPSC) lines and iPSC-derived hepatocytes of diverse genotypes for use in metabolic profiling and interrogating lipid phenotypes. These cell lines were created as a part of the Next Generation Genetic Association Studies (Next Gen) Program, which was a five-year, $80 million program to investigate functional genetic variation in humans by assessing cellular profiles that are surrogates for disease phenotypes. To achieve this, researchers from multiple institutions across the U.S. were awarded grants to derive iPSC lines from more than 1,500 individuals representing various conditions as well as healthy controls for use in functional genomic ("disease in a dish") research. This extensive panel spans diverse ages, genders, and ethnic backgrounds, and will therefore be an invaluable tool for evaluations across demographics. Further enhancing the utility of these cell lines are data sets such as phenotyping, GWAS, genome sequencing, gene expression and -omics analyses (e.g., lipidomic, proteomic, methylomic) that can be matched to the cell lines. The PhLiPS Study focuses on individuals free of cardiovascular disease or with lipoprotein metabolism disorders in the community served by the Hospital of the University of Pennsylvania.
Study
phs001341