Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Coronary Artery Risk Development in Young Adults Study (CARDIA)
Cohort Description: The Coronary Artery Risk Development in Young Adults (CARDIA) study examines the development and determinants of clinical and subclinical cardiovascular disease and their risk factors. It began in 1985-1986 with a group of 5115 black and white men and women aged 18-30 years. The participants were selected so that there would be approximately the same number of people in subgroups of race, gender, education (high school or less and more than high school), and age (18-24 and 25-30 years) in each of 4 centers: Birmingham, AL; Chicago, IL; Minneapolis, MN; and Oakland, CA. These same participants were asked to participate in follow-up examinations during 1987-1988 (Year 2), 1990-1991 (Year 5), 1992-1993 (Year 7), 1995-1996 (Year 10), 2000-2001 (Year 15), 2005-2006 (Year 20), 2010-2011 (Year 25), 2015-2016 (Year 30), and 2021-2022 (Year 35).
Data Being Submitted: Wave 1 questionnaire data includes 397 variables for up to 2434 CARDIA participants in C4R. Wave 2 questionnaire data includes 448 variables for up to 1901 CARDIA participants in C4R. Dried Blood Spot/Serosurvey data includes 7 variables for up to 1332 CARDIA participants in C4R. Derived data includes 43 variables for up to 2723 CARDIA participants in C4R. Phenotype data includes 113 variables for up to 2723 CARDIA participants in C4R.
Study
phs003045
National Cancer Institute Genome-Wide Association Study of Renal Cell Carcinoma
The National Cancer Institute (NCI) genome-wide association study (GWAS) of renal cell carcinoma (RCC) was conducted to investigate common genetic variants associated with RCC risk. The GWAS includes 1,453 RCC cases and 3,531 controls of European background from 4 studies (3 cohort, 1 case-control), scanned using the Illumina Infinium HumanHap 550, 610 and 660W chips. This project was supported by the Intramural Research Program of the National Institutes of Health and NCI. Data from this GWAS were pooled with those from another GWAS of RCC (2,639 cases and 5,392 controls) conducted in Europe by the International Agency for Research on Cancer and the Centre National de Génotypage. Findings from this collaboration are described in a published report (Purdue et al. Nature Genetics 2011;43(1):60-65; PMID: 21131975). Only data from the NCI scan are included in this dbGaP submission.
Study
phs000351
Gabriella Miller Kids First Pediatric Research Program in Germline and Somatic Variants in Myeloid Malignancies in Children
The Gabriella Miller Kids First Pediatric Research Program (Kids First) is a trans-NIH effort initiated in response to the 2014 Gabriella Miller Kids First Research Act and supported by the NIH Common Fund. This program focuses on gene discovery in pediatric cancers and structural birth defects and the development of the Gabriella Miller Kids First Pediatric Data Resource (Kids First Data Resource). All of the WGS and phenotypic data from this study are accessible through dbGaP and kidsfirstdrc.org, where other Kids First datasets can also be accessed. This project aims to sequence an unparalleled number of cases of de novo Acute Myeloid Leukemia (AML) and Down Syndrome AML (DS-AML), to establish a database comprised of genomic and transcriptome information which can be interrogated for both somatic and germline variants. Identification of the somatic variants will provide valuable data on the potential genes and pathways that can be targeted for therapeutic purposes. In addition, interrogation of the host’s constitutional genome may yield valuable information about potential germline variants that, in combination with the somatic data, might provide a more informed approach to patient care. For those patients with predisposition mutations, chemotherapy alone might not be adequate for cure, and stem cell transplantation might be required. Also, those who might be at high risk of adverse secondary events (cardiac complications, secondary malignancies, etc.) can be identified early and their therapy can be tailored to minimize anticipated complications. Thus, we propose that the optimum outcome can only be obtained through comprehensive interrogation of the somatic and germline genomes to fully annotate the genomic makeup of the leukemia and its host.
Knowing the genomic and transcriptomic makeup of these patients, along with a full complement of clinical characteristics for this cohort, will be critical for making strong correlations which may aid in therapeutic development for future patients. The de novo AML, DS-AML, and Acute Promyelocytic Leukemia (APL) cases were all collected through clinical protocols conducted by the Children’s Oncology Group (COG). In addition to funding from the Gabriella Miller Kids First Pediatric Research Program, the DS-AML cohort was specifically funded by the Lifespan to Understand Down syndrome (INCLUDE) Project.
Study
phs002187
Gabriella Miller Kids First Pediatric Research Program in Whole Genome Sequencing of African and Asian Orofacial Clefts Case-Parent Triads
The Gabriella Miller Kids First Pediatric Research Program (Kids First) is a trans-NIH effort initiated in response to the 2014 Gabriella Miller Kids First Research Act and supported by the NIH Common Fund. This program focuses on gene discovery in pediatric cancers and structural birth defects and the development of the Gabriella Miller Kids First Pediatric Data Resource (Kids First Data Resource). All of the WGS and phenotypic data from this study are accessible through dbGaP and kidsfirstdrc.org, where other Kids First datasets can also be accessed. The focus of this study is to identify novel risk variants for orofacial clefts (OFC) in African and Asian OFC case-parent triads through analysis of whole genome sequencing data.
Study
phs001997
Gabriella Miller Kids First Pediatric Research Program in Genetics at the Intersection of Childhood Cancer and Birth Defects
The Gabriella Miller Kids First Pediatric Research Program (Kids First) is a trans-NIH effort initiated in response to the 2014 Gabriella Miller Kids First Research Act and supported by the NIH Common Fund. This program focuses on gene discovery in pediatric cancers and structural birth defects and the development of the Gabriella Miller Kids First Pediatric Data Resource (Kids First Data Resource). All of the WGS and phenotypic data from this study are accessible through dbGaP and kidsfirstdrc.org, where other Kids First datasets can also be accessed. Birth defects and childhood cancer share biological pathways that are important for cell growth and division. We propose that sequencing pediatric patients with both conditions will allow us to discover the underlying genes and in turn advance our understanding of the causes of these devastating diseases.
Study
phs001846
Early Family Prevention of Adolescent Alcohol, Drug Use, and Psychopathology
The Early Steps Multisite Study is comprised of researchers from the University of Virginia, the University of Pittsburgh, Arizona State University, and Oregon Research Institute. This longitudinal study has been funded by the National Institute on Drug Abuse at the National Institutes of Health since 2002. The Early Steps Multisite Study conducted a randomized controlled trial to examine the effects of an intervention program called the Family Check-Up (FCU) offered in early to middle childhood. Outcomes include problem behaviors, such as substance use. Primary caregivers (PC) and their children (TC) were recruited from Women, Infants, and Children (WIC) Nutritional Supplement centers in and around Pittsburgh, PA, Eugene, OR, and Charlottesville, VA when the target children were age 2. Participants were screened in three key areas of risk for later child conduct problems: (1) sociodemographic risk (e.g., poverty, teen parent status), (2) family risk (e.g., maternal stress, depressive symptoms), and (3) child conduct problems. Randomization to the intervention condition was balanced on gender to ensure an equal number of males and females in the control and intervention groups. Data submitted to dbGaP are from the 515 subjects who consented to provide a saliva sample.
Study
phs003442
Gabriella Miller Kids First Pediatric Research Program in Craniofacial Microsomia
The Gabriella Miller Kids First Pediatric Research Program (Kids First) is a trans-NIH effort initiated in response to the 2014 Gabriella Miller Kids First Research Act and supported by the NIH Common Fund. This program focuses on gene discovery in pediatric cancers and structural birth defects and the development of the Gabriella Miller Kids First Pediatric Data Resource (Kids First Data Resource).
All of the WGS and phenotypic data from this study are accessible through dbGaP and kidsfirstdrc.org, where other Kids First datasets can also be accessed.
Craniofacial microsomia (CFM), also termed hemifacial microsomia or oculo-auricular-vertebral spectrum, is the third most common congenital craniofacial condition. CFM comprises a variable phenotype, and the most common features include malformations of the ear (i.e., microtia) and lower jaw (i.e., mandibular hypoplasia) on one or both sides. Microtia in the absence of other anomalies is believed to represent the mildest form of CFM. The cause of CFM is unknown for most affected individuals. We have established a cohort through previous studies and collected DNA to identify the genetic contributions to CFM, which could facilitate diagnosis and tailored treatments and guide prevention strategies. Successful completion of this proposal will advance knowledge of the genetic architecture of susceptibility to CFM and will provide insight into the biological mechanisms underlying craniofacial development. The results from this study have the potential to further research on the etiology of other craniofacial conditions, and the pathogenesis of typical and atypical craniofacial development.
Study
phs002130
Aurora US Metastatic Breast Cancer Retrospective Project
The AURORA US Metastatic Breast Cancer project is funded by the Breast Cancer Research Foundation (BCRF) Evelyn H. Lauder Founder's Fund for Metastatic Breast Cancer Research. It is a multi-center effort conducted within the Translational Breast Cancer Research Consortium (TBCRC) by cancer researchers to better understand the metastatic process through the study of both the primary and metastatic tissue. In the retrospective phase, TBCRC sites submitted matched primary and metastatic tissues and blood from previous collections for piloting the process. Samples were profiled using whole genome DNA sequencing, whole exome DNA sequencing, DNA methylation arrays, and RNA sequencing. The final freeze set for samples with successful nucleic acid extraction and molecular assays included 55 patients with 31 primary tissues and 102 metastases. Twenty patients had tissue collected at autopsy, including 19 with more than one metastasis. Metastases were from 20 different tissue locations, with the most common sites being liver, lung, lymph node, and brain. The median age at primary diagnosis was 49 years. The clinical subtypes of the primary tumors were 34% Triple Negative, 30% ER+HER2-, 11% ER+HER2+, and 7% ER-HER2+. Patients received an average of 3 lines of therapy in the metastatic setting. The median overall survival from primary diagnosis was 4.5 years and overall survival from metastatic diagnosis was 2 years.
Study
phs002622
Gabriella Miller Kids First Pediatric Research Program for Infantile Hemangiomas Associated with Multi-Organ Structural Birth Defects
The Gabriella Miller Kids First Pediatric Research Program (Kids First) is a trans-NIH effort initiated in response to the 2014 Gabriella Miller Kids First Research Act and supported by the NIH Common Fund. This program focuses on gene discovery in pediatric cancers and structural birth defects and the development of the Gabriella Miller Kids First Pediatric Data Resource (Kids First Data Resource). All of the WGS and phenotypic data from this study are accessible through dbGaP and kidsfirstdrc.org, where other Kids First datasets can also be accessed. Infantile hemangiomas are the most common benign vascular tumor in infants, affecting 4-5% of children. Thirty percent of segmental infantile hemangiomas on the face and scalp are associated with birth defects of multiple organs. This condition is known as PHACE, an acronym for posterior fossa brain malformations, segmental facial hemangiomas, arterial anomalies, cardiac defects, eye anomalies, and sternal clefting. Genomic analysis of PHACE will inform treatment and expand knowledge about the causes of birth defects affecting the brain, arteries, heart, eye, midline development and hearing. The knowledge gained in this study will be used to drive strategies for prevention and provide critical targets for treatments for a range of birth defects and infantile hemangiomas.
Study
phs001785
Characterizing Advanced Breast Cancer Heterogeneity and Treatment Resistance through Serial Biopsies and Comprehensive Analytics
This dataset accompanies the manuscript "Characterizing Advanced Breast Cancer Heterogeneity and Treatment Resistance Through Serial Biopsies and Comprehensive Analytics". In this study we discuss the clinical implications and consequences of molecular heterogeneity in a metastatic breast cancer patient identified through serial biopsies, multi-omic analysis, and longitudinal patient monitoring, performed through the precision oncology program, Serial Measurements of Molecular and Architectural Responses to Therapy (SMMART). This case study describes a metastatic breast cancer patient with a highly heterogeneous tumor that began as ER positive/HER2 negative on diagnosis, then was found to be ER negative/HER2 positive in a metastatic liver lesion (Study biopsy #1), and finally triple negative (TNBC) in another metastatic liver biopsy (Study biopsy #2). Multi-omic assays included clinical genomic panel sequencing, clinical immunohistochemistry, a clinical protein profiling assay (Intracellular Signaling Protein Profiling), RNA sequencing, and RPPA. Here we provide the RNAseq and RPPA expression data from Study biopsy #2 for the patient discussed in this case report. Note: The first liver biopsy did not yield sufficient material for RNAseq or RPPA.
The data provided are: raw RNA FASTQ files from Study biopsy #2 of the liver; gene expression (RNAseq) files from Study biopsy #2; and protein expression (RPPA) files (level 2, 3, 4) for Study biopsy #2.
Study
phs002321
Kids First: Genomic Analysis of Esophageal Atresia and Tracheoesophageal Fistulas and Associated Congenital Anomalies
The Gabriella Miller Kids First Pediatric Research Program (Kids First) is a trans-NIH effort initiated in response to the 2014 Gabriella Miller Kids First Research Act and supported by the NIH Common Fund. This program focuses on gene discovery in pediatric cancers and structural birth defects and the development of the Gabriella Miller Kids First Pediatric Data Resource (Kids First Data Resource).
All of the WGS and phenotypic data from this study are accessible through dbGaP and kidsfirstdrc.org, where other Kids First datasets can also be accessed.
Esophageal atresia/tracheoesophageal fistula (EA/TEF) is a rare and complex aerodigestive congenital anomaly with an estimated incidence of 1 in 2500 to 1 in 4000 live births. We propose to elucidate the underlying genomic architecture of EA/TEF by performing whole genome sequencing to characterize new clinical syndromes associated with EA/TEF to provide more accurate clinical prognostic information.
Study
phs002161
Kids First: Pediatric Research Project on the Genomic Analysis of Congenital Diaphragmatic Hernia
The Gabriella Miller Kids First Pediatric Research Program (Kids First) is a trans-NIH effort initiated in response to the 2014 Gabriella Miller Kids First Research Act and supported by the NIH Common Fund. This program focuses on gene discovery in pediatric cancers and structural birth defects and the development of the Gabriella Miller Kids First Pediatric Data Resource (Kids First Data Resource). Both childhood cancers and structural birth defects are critical and costly conditions associated with substantial morbidity and mortality. Elucidating the underlying genetic etiology of these diseases has the potential to profoundly improve preventative measures, diagnostics, and therapeutic interventions. WGS and phenotypic data from this study are accessible through dbGaP and kidsfirstdrc.org, where other Kids First datasets can also be accessed. In collaboration with the University of Utah, DNA from four families was selected for high-depth WGS (60X), including diaphragm and skin tissue to identify mosaicism. In collaboration with the Broad Institute, DNA from four families underwent linked long read sequencing using 10X Genomics technology. Probands with congenital diaphragmatic hernia/defects and both biological parents were enrolled as part of the DHREAMS study.
Study
phs001110
Malnutrition and Enteric Disease Network (Mal-ED) Birth Cohort in Brazil
The Malnutrition and Enteric Disease Network (Mal-ED) Birth Cohort in Brazil enrolled 242 children between 2010 and 2014; an additional 101 infants recruited under the ICIDR (International Center for Infectious Disease Research) program and evaluated by the same procedures as Mal-ED were included in the cohort. The aim of the study was to evaluate the role of enteropathogens in causing intestinal inflammation and a failure of growth and development. This submission includes genetic and corresponding anthropometric data on a subset of 279 participants. More information on this project and its related projects can be found under "Studies of diarrheal susceptibility, growth, and development, in pediatric populations in Northeastern Brazil".
Study
phs003172
Modelling Multi-Dimensional ClinOmics for Precision Therapy of Children and Adolescent Young Adults with Relapsed and Refractory Cancer: A Report from the Center for Cancer Research
The Center for Cancer Research (CCR) of the intramural NCI undertook a multidimensional clinical genomics study of children and adolescent young adults with relapsed and refractory cancers who were enrolled on other therapeutic trials, to determine the feasibility of a genome-guided precision therapy protocol in these patients. Note: NCIPM001blood - This sample was previously submitted with study phs002207, with sample ID NCIPM001blood_E and SRR ID SRR18544311.
Study
phs001052
Women's Environmental Cancer and Radiation Epidemiology (WECARE) Study
Study population: The WECARE Study is an international multicenter population-based case-control study of cases with asynchronous contralateral breast cancer (CBC) and individually matched controls with unilateral breast cancer (UBC). Recruitment and data collection for the WECARE Study were conducted in two phases, herein referred to as WECARE I (2001–2004) and WECARE II (2009–2012). The study design of the first phase (WECARE I) has been described in detail elsewhere [Bernstein JL, et al 2004, PMID: 15084244]; the second phase (WECARE II) employed a nearly identical approach [Langballe R, et al 2016, PMID: 27400983]. Participants in each phase were identified through eight population-based cancer registries, including six in the United States that contribute to the National Cancer Institute's Surveillance, Epidemiology, and End Results (SEER) program: Los Angeles County Cancer Surveillance Program; Cancer Surveillance System of the Fred Hutchinson Cancer Center (Seattle region, WA); State Health Registry of Iowa; Cancer Surveillance Program of Orange County/San Diego-Imperial Organization for Cancer Control (Orange County/San Diego, CA); the Greater Bay Area Cancer Registry (San Francisco Bay Area region and Santa Clara region, CA); and the Sacramento and Sierra Center Registry (Sacramento region, CA). Participants were additionally identified using the Ontario Cancer Registry (Canada) and the Danish Breast Cancer Cooperative Group Registry, supplemented by data from the Danish Cancer Registry. The study protocol was approved by the institutional review or regulatory ethics boards at each study site.
Data collection: Study participants were interviewed by telephone using a structured questionnaire aimed at evaluating known or suspected breast cancer risk factors, including personal demographics, medical history, menstrual and reproductive history, family history of cancer, use of hormones, smoking, and alcohol intake.
Risk factor status was assessed during the period prior to first diagnosis, as well as between first diagnosis and reference date (i.e., the at-risk period for CBC). Detailed data on treatment and tumor characteristics were obtained directly from cancer registry records or by abstracting medical records, including pathology and surgical reports, radiation oncology clinic notes, and systemic adjuvant treatment data. Complete medical treatment history information was collected, and for all women who received radiotherapy, the radiation dose to the contralateral breast was reconstructed using radiotherapy records and radiation measurements. Biospecimens were collected from study participants for genotyping: blood collected from WECARE I women; and saliva collected from WECARE II women.
Study
phs003945
Gabriella Miller Kids First Pediatric Research Program: An Integrated Clinical and Genomic Analysis of Treatment Failure in Pediatric Osteosarcoma
The Gabriella Miller Kids First Pediatric Research Program (Kids First) is a trans-NIH effort initiated in response to the 2014 Gabriella Miller Kids First Research Act and supported by the NIH Common Fund. This program focuses on gene discovery in childhood cancers and structural birth defects and the development of the Gabriella Miller Kids First Pediatric Data Resource (Kids First Data Resource). Both childhood cancers and structural birth defects are critical and costly conditions associated with substantial morbidity and mortality. Elucidating the underlying genetic etiology of these diseases has the potential to profoundly improve preventative measures, diagnostics, and therapeutic interventions. All of the WGS and phenotypic data from this study are accessible through dbGaP and kidsfirstdrc.org, where other Kids First datasets can also be accessed. Although the survival of children with relapsed osteosarcoma is very poor, little is known about the etiology of treatment failure in this disease. The purpose of this project is to perform whole genome sequencing on serial samples from patients with osteosarcoma obtained before treatment, after treatment, and at relapse/metastasis in order to identify the mutations and pathways that are drivers of drug resistance. If successful, our results may help identify patients at high risk for treatment failure and may yield new treatments for children who cannot currently be cured.
Study
phs001714
NHLBI TOPMed: Coronary Artery Risk Development in Young Adults (CARDIA)
CARDIA is a study examining the etiology and natural history of cardiovascular disease beginning in young adulthood. In 1985-1986, a cohort of 5115 healthy black and white men and women, aged 18-30 years, was selected to have approximately the same number of people in subgroups of age (18-24 and 25-30), sex, race, and education (high school or less, and more than high school) within each of four US Field Centers. These same participants were asked to participate in follow-up examinations during 1987-1988 (Year 2), 1990-1991 (Year 5), 1992-1993 (Year 7), 1995-1996 (Year 10), 2000-2001 (Year 15), 2005-2006 (Year 20), 2010-2011 (Year 25) and 2015-2016 (Year 30). In addition to the follow-up examinations, participants are contacted regularly for the ascertainment of information on out-patient procedures and hospitalizations experienced between contacts. Within the past five years, 95% of the original surviving cohort has been contacted. While the specifics of each examination have differed somewhat, data have been collected on a variety of factors believed to be related to heart disease. These include conditions with clear links to heart disease, such as blood pressure, cholesterol and other lipids. Data have also been collected on physical measurements, such as weight and skinfold fat, as well as lifestyle factors such as substance use (tobacco and alcohol), dietary and exercise patterns, behavioral and psychological variables, medical and family history, and other chemistries (e.g., insulin and glucose). In addition, subclinical atherosclerosis was measured via echocardiography during Years 5, 10, and 25, computed tomography during Years 15 and 20, and carotid ultrasound during Year 20. Comprehensive phenotypic data for study participants are available through dbGaP phs000285.
Study
phs001612
Comparison Between qPCR and RNA-Seq Reveals Challenges of Quantifying HLA Expression
Human leukocyte antigen (HLA) class I and HLA class II loci are essential elements of innate and acquired immunity. Their exceptional influence on disease outcome is well documented by GWAS and candidate gene studies. The impact of HLA allelic variation on human disease through allele-specific presentation of antigenic peptides to T cells has been the main focus of efforts to determine HLA effects on disease susceptibility/pathogenesis. However, HLA expression levels have also been implicated in disease, adding another dimension to the extreme diversity of HLA that impacts variability in immune responses across individuals. Measurement of HLA expression levels routinely relies on quantitative PCR (qPCR). An alternative is adoption of high throughput technologies such as RNA-Seq. This provides the opportunity to quantify HLA expression in large datasets, but also allows comparison between RNA-Seq and qPCR. In this study, we analyze expression data for HLA Class I genes for a matched set of individuals (N = 96) by RNA-Seq, qPCR and cell surface expression. Samples were obtained from the Research Donor Program (NCI-Frederick). We observed a moderate correlation between RNA-Seq and qPCR. We suggest technical and biological factors that might contribute to differences in the techniques.
Study
phs003177
Initial whole genome sequencing of plasma cell neoplasms in First Responders exposed to the World Trade Center attack of September 11, 2001
The World Trade Center (WTC) attack of September 11, 2001 created an unprecedented environmental exposure to known and suspected carcinogens. High incidence of multiple myeloma (MM) and precursor conditions has been reported among first responders to the WTC disaster. To expand on our prior screening studies, and to characterize the genomic impact of the exposure to known and potential carcinogens in the WTC debris, we were motivated to perform whole genome sequencing (WGS) of WTC first responders and recovery workers who developed a plasma cell disorder after the attack. PATIENTS AND METHODS: We performed WGS of 9 CD138-positive bone marrow mononuclear samples from patients who were diagnosed with plasma cell disorders after the WTC disaster. RESULTS: No significant differences were observed in comparing the post-WTC driver and mutational signatures landscape with 110 previously published WGS from 56 patients with MM and the CoMMpass WGS cohort (n=752). Leveraging constant activity of the single base substitution mutational signatures 1 and 5 over time, we estimated that tumor-initiating chromosomal gains were windowed to both pre- and post-WTC exposure. CONCLUSIONS: Although limitations in sample size preclude any definitive conclusions, our findings suggest that the observed increased incidence of plasma cell neoplasms in this population is due to complex and heterogeneous effects of the WTC exposure that may have initiated or contributed to progression of malignancy.
Study
EGAS00001004467
GMKF: Kids First Pediatric Research Program on Congenital Cranial Dysinnervation Disorders and Related Birth Defects
The Gabriella Miller Kids First Pediatric Research Program (Kids First) is a trans-NIH effort initiated in response to the 2014 Gabriella Miller Kids First Research Act and supported by the NIH Common Fund. This program focuses on gene discovery in pediatric cancers and structural birth defects and the development of the Gabriella Miller Kids First Pediatric Data Resource (Kids First Data Resource). Both childhood cancers and structural birth defects are critical and costly conditions associated with substantial morbidity and mortality. Elucidating the underlying genetic etiology of these diseases has the potential to profoundly improve preventative measures, diagnostics, and therapeutic interventions. Whole Genome Sequence (WGS) and phenotypic data from this study are accessible through dbGaP and kidsfirstdrc.org, where other Kids First datasets can also be accessed. Goals of this ongoing study are to identify novel "congenital cranial dysinnervation disorder" (CCDD) genes and define the role of the wildtype and mutant genes in normal and aberrant development. The umbrella term CCDD refers to congenital birth defects with malformation of one or more cranial nerves, typically resulting in limitations of eye and/or face movement. Examples of CCDDs include congenital fibrosis of the extraocular muscles (CFEOM), congenital ptosis, Duane retraction syndrome (DRS), horizontal gaze palsy with progressive scoliosis (HGPPS), congenital 3rd, 4th or 6th nerve palsies, Moebius syndrome (MBS), and hereditary congenital facial paresis (HCFP). In some cases, anosmia, and disorders of hearing, sucking, chewing, swallowing, and breathing may also be classified as CCDDs. CCDDs can be accompanied by additional birth defects such as intellectual and social disabilities, developmental delays, limb anomalies, and cardiac, GI, and GU disorders.
The genetic basis of multiple CCDDs has been determined, and the gene mutations typically alter cranial motor neuron identity or function, or perturb axon growth and guidance. Despite these successes, the genetic etiologies of many inherited CCDDs remain unidentified.
Study
phs001247
NHLBI Family Heart Study (FamHS-Visit1 and FamHS-Visit2)
The Family Heart Study (FamHS) was funded by the National Heart, Lung, and Blood Institute (NHLBI). It was begun in 1992 with the ascertainment of 1,200 families, half randomly sampled, and half selected because of an excess of coronary heart disease (CHD) or risk factor abnormalities as compared with age- and sex-specific population rates (Higgins et al. 1996). The families, with approximately 6,000 individuals, were sampled on the basis of information on probands from four population-based parent studies: the Framingham Heart Study, the Utah Family Tree Study, and two Atherosclerosis Risk in Communities (ARIC) centers (Minneapolis, and Forsyth County, NC). A broad range of phenotypes were assessed at a clinic examination in broad domains of CHD, atherosclerosis, cardiac and vascular function, inflammation and hemostasis, lipids and lipoproteins, blood pressure, diabetes and insulin resistance, pulmonary function, and anthropometry (FamHS Visit 1). Approximately 8 years later, study participants belonging to the largest pedigrees were invited for a second clinical exam (FamHS Visit 2). A total of 2,756 Caucasian subjects in 508 extended families were examined. A two-phase design was adopted for the genome wide association (GWA) study. In phase-1, 1007 subjects were chosen, equally distributed between the upper and lower quartile of age- and sex-adjusted values for coronary artery calcification, assessed by CT scan in Visit 2. These subjects were chosen to be largely unrelated; 34% of the subjects were from unique families, while 200 other subjects had 1 or more siblings selected into the sample, yielding a sample of 465 unrelated subjects. The remaining family members (N=1749) were genotyped in the phase-2 for replication of the top hits from the phase-1. 
The results presented here represent those for the analysis of the phase-1 case-control sample for variables assessed in FamHS Visit 1 (from 1992 to 1995) and for the variables assessed in FamHS Visit 2 (from 2002 to 2003). All subjects were typed on the Illumina HumMap 550 chip (Phase 1 genotype). Of these, 33 (3.3%) were excluded due to technical errors, call rates below 98%, and discrepancies between reported sex and sex-diagnostic markers. All 974 subjects in the final sample have Visit 2 phenotypes; approximately 100 of these do not have Visit 1 phenotypes. There was no significant plate-to-plate variation in allele frequencies. The covariate adjustments were performed separately by sex using cubic polynomial age terms and clinical centers, retaining the terms in the stepwise regression analysis that were significant at the 5% level. Extreme outliers (>4 SD from the mean) were temporarily set aside for the adjustments. The final phenotypes were computed for all individuals using the best mean regression models and standardizing to 0 mean and unit variance. The FamHS has contributed GWA results in many phenotype domains (anthropometric and adiposity, atherosclerosis and coronary heart disease, lipid profile, diabetes and glycemic traits, metabolic syndrome, etc.) to meta-analyses and various consortia, including Heard-Costa et al. 2009, Köttgen et al. 2010, Teslovich et al. 2010, Nettleton et al. 2010, Lango et al. 2010, Heid et al. 2010, Speliotes et al. 2010, Dupuis et al. 2010, Kraja et al. 2011.
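The covariate adjustment described above can be illustrated with a simplified sketch. This is not the FamHS analysis code: it assumes a single sex stratum, fits only a cubic polynomial in age by ordinary least squares, and omits the clinical-center terms and the stepwise selection at the 5% level. It also assumes that outliers are flagged on the raw phenotype and that the final standardization uses the mean and SD of the non-outlier residuals; the description does not specify either choice. All function names and the toy data are hypothetical.

```python
from statistics import mean, pstdev

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def design(age, age_mean):
    """Intercept plus linear, quadratic and cubic terms in centered age."""
    z = age - age_mean
    return [1.0, z, z * z, z ** 3]

def fit_cubic(ages, values, age_mean):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    rows = [design(a, age_mean) for a in ages]
    k = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(k)]
    return solve(xtx, xty)

def adjust_phenotype(ages, values, sd_cutoff=4.0):
    # Temporarily set aside extreme outliers (> sd_cutoff SD from the mean).
    mu, sd = mean(values), pstdev(values)
    keep = [abs(v - mu) <= sd_cutoff * sd for v in values]
    fit_ages = [a for a, k in zip(ages, keep) if k]
    fit_vals = [v for v, k in zip(values, keep) if k]

    # Fit the age model on the remaining individuals only.
    age_mean = mean(fit_ages)
    beta = fit_cubic(fit_ages, fit_vals, age_mean)

    def residual(a, v):
        return v - sum(b * x for b, x in zip(beta, design(a, age_mean)))

    # Standardize residuals for ALL individuals, outliers included,
    # using the mean and SD of the non-outlier residuals (an assumption).
    kept = [residual(a, v) for a, v in zip(fit_ages, fit_vals)]
    r_mu, r_sd = mean(kept), pstdev(kept)
    return [(residual(a, v) - r_mu) / r_sd for a, v in zip(ages, values)]

# Toy data: 40 subjects with a linear age trend plus small periodic noise,
# followed by one extreme outlier (value 400 at age 50).
ages = [30 + i for i in range(40)] + [50]
values = [100 + 0.5 * (30 + i - 50) + ((i * 7) % 5 - 2) for i in range(40)] + [400.0]
z_scores = adjust_phenotype(ages, values)
```

In this sketch the outlier does not influence the fitted age model, but it still receives a standardized score from that model, mirroring the statement that final phenotypes were computed for all individuals.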
Study
phs000221
Kids First: Whole Genome Sequencing of Nonsyndromic Craniosynostosis
The Gabriella Miller Kids First Pediatric Research Program (Kids First) is a trans-NIH effort initiated in response to the 2014 Gabriella Miller Kids First Research Act and supported by the NIH Common Fund. This program focuses on gene discovery in pediatric cancers and structural birth defects and the development of the Gabriella Miller Kids First Pediatric Data Resource (Kids First Data Resource). All of the WGS and phenotypic data from this study are accessible through dbGaP and kidsfirstdrc.org, where other Kids First datasets can also be accessed. Nonsyndromic craniosynostosis (NCS) is a common, major structural birth defect - due to the premature fusion of one or more cranial sutures - that requires extensive surgical correction and is associated with considerable ongoing medical problems and health care costs. Because little is known about the causes of NCS, whole genome sequencing will help advance knowledge of genetic factors contributing to the etiology of NCS. Data from this project will lead to a better understanding of biological processes involved in the etiology of NCS and provide critical insights for development of early diagnostic tools and therapeutic strategies.
Study
phs001806
OncoArray: Follow-up of Ovarian Cancer Genetic Association and Interaction Studies (FOCI)
The Follow-up of Ovarian Cancer Genetic Association and Interaction Studies (FOCI) was one of five projects funded in 2010 as part of the NCI's Genetic Associations and Mechanisms in Oncology (GAME-ON) initiative (http://epi.grants.cancer.gov/gameon/). FOCI represents a collective effort that builds upon the strengths and history of collaboration inherent in the Ovarian Cancer Association Consortium (OCAC), a multidisciplinary group of epidemiologists, genetic epidemiologists, statistical geneticists, molecular and cell biologists and clinicians that was formed in 2005. The other four funded GAME-ON projects were: the ColoRectal Transdisciplinary Study (CORECT), Elucidating Loci Involved in Prostate Cancer Susceptibility (ELLIPSE), Discovery, Biology, and Risk of Inherited Variants in Breast Cancer (DRIVE), and Transdisciplinary Research in Cancer of the Lung (TRICL). As part of our aim to discover, expand, and replicate ovarian cancer susceptibility loci, the GAME-ON projects and other consortia formed the OncoArray network (http://epi.grants.cancer.gov/oncoarray/) to develop and genotype a new custom genotyping array in large numbers of cancer cases and controls (over 400,000 samples) across multiple cancer types. The FOCI data include over 50,000 ovarian cancer cases and controls genotyped with the OncoArray at the Center for Inherited Disease Research (CIDR). Genotype calling and quality control procedures were performed under a standardized protocol across the OncoArray consortium, and over 490,000 SNPs passed QC and are included under this dbGaP submission.
Study
phs001882
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Severe Asthma Research Program (SARP)
Cohort Description: The Severe Asthma Research Program (SARP) has been investigating the clinical, physiologic and molecular phenotypes of asthma since 2000. It is currently following ~400 deeply phenotyped asthma patients. Data Being Submitted: Wave 1 questionnaire data includes 397 variables for up to 375 SARP participants in C4R. Wave 2 questionnaire data includes 448 variables for up to 289 SARP participants in C4R. Dried Blood Spot/Serosurvey data includes 7 variables for up to 290 SARP participants in C4R. Derived data includes 43 variables for up to 463 SARP participants in C4R. Phenotype data includes 113 variables for up to 463 SARP participants in C4R.
Study
phs002913
DCLRE1B/Apollo germline mutations associated with renal cell carcinoma
Highlights of Bories et al. article submitted to BBA - Molecular Basis of Disease: We identified 2 germline mutations in the DCLRE1B gene encoding the Apollo protein by Whole Exome Sequencing (WES) in two families with inherited clear-cell Renal Cell Carcinoma. Our study suggests a putative link between DNA repair, telomere protection and renal oncogenesis.
Study
EGAS50000000216
STAMPEED: Northern Finland Birth Cohort 1966 (NFBC1966)
The Northern Finland Birth Cohorts program (NFBC) was initiated in the 1960s in the two northernmost provinces of Finland to study risk factors involved in pre-term birth and intrauterine growth retardation, and the consequences of these early adverse events on subsequent morbidity and mortality. The uniqueness of the NFBCs is that the data of the cohorts were obtained from early fetal life (including maternal health during pregnancy) to adulthood. The NFBC1966 includes 12,058 live births to mothers in the two northernmost provinces of Finland. Two decades later, a second cohort of 9,432 births was obtained (NFBC1986). In NFBC1966, pregnancies were followed prospectively from the first antenatal contact (10-16th week). After birth, the offspring were examined and then again underwent clinical evaluation at ages 1y, 7y, 14-16y and 31y. At each visit, a wide range of phenotypic, lifestyle and demographic data were gathered by questionnaires and clinical examinations. For the most part, NFBC1986 has undergone similar evaluations to NFBC1966. Linkage to national registries includes hospitalization, deaths, education, medication, and pensions, and provides up-to-date demographic and clinical information for members of both cohorts. DNA samples were obtained from 5,923 subjects from NFBC1966 and 6,688 subjects from NFBC1986. Data coverage (96% of all births in 1966 and 99% in 1986) is highly representative of the whole population. The NFBC program comprises more than 20 different projects coordinated by the Center of Lifecourse Disease studies in Northern Finland (COLD) at Oulu University. The prospective data collected from the NFBCs form a unique resource, allowing the study of disease emergence, and of the importance of genetic, biological, social and behavioral risk factors.
The genome-wide association (GWA) study sponsored through the STAMPEED program of NHLBI employed genomic DNA samples previously collected by the NFBC1966 study and stored in the DNA repository of the National Institute for Health and Welfare, Finland. This NHLBI-sponsored R01 project aimed to identify genetic variants contributing to metabolic and cardiovascular diseases (CVD). In addition to de-identified genome-wide genotypic data, a selected list of phenotypic data related to CVD, including weight, height, BMI, HDL, LDL, total cholesterol, triglyceride, glucose, insulin and fasting status, is also available in dbGaP. A summary of the GWAS for the NFBC1966 cardiovascular risk traits can be found in Sabatti et al., Nature Genetics 41: 35-46, 2009, PMID: 19060910. The version 2 release of this study contains sequence data from seventeen loci associated with levels of triglyceride, HDL-C, LDL-C, total cholesterol, fasting plasma glucose, and fasting plasma insulin (Kathiresan et al. 2008, Willer et al. 2008, Sabatti et al. 2009, Dupuis et al. 2010, Teslovich et al. 2010). At each locus, protein-coding regions and 5' and 3' untranslated regions of genes nearest to single nucleotide polymorphisms showing genome-wide significant association with metabolic syndrome-related traits were sequenced. Targeted Illumina sequencing of 78 genes (~270kb) using 150bp probes was performed on 4,943 subjects of the Northern Finland Birth Cohort 1966 (NFBC1966). Whole exome sequencing on the Illumina platform was carried out on 586 of those participants. The sequencing study is part of a larger project that is funded by the National Human Genome Research Institute's Allelic Spectrum in Common Disease Initiative, and comprises sequence data from more than 7,000 individuals in two Finnish cohorts: NFBC1966 and the Finland-United States Investigation of NIDDM Genetics (FUSION) study.
Study
phs000276
Baylor College of Medicine Advancing Sequencing in Childhood Cancer Care (BASIC3) Clinical Exome Sequencing Study - Clinical Sequencing Exploratory Research Consortium
The BASIC3 (Baylor College of Medicine Advancing Sequencing in Childhood Cancer Care) study is a National Human Genome Research Institute (NHGRI) and National Cancer Institute-funded Clinical Sequencing Exploratory Research (CSER) consortium project that focuses on prospective implementation of clinical whole exome sequencing in the pediatric oncology clinic. The primary study objectives are to integrate information from CLIA-certified germline and tumor exome sequencing into the care of newly diagnosed solid tumor patients at Texas Children's Cancer Center, and to perform parallel evaluation of the impact of tumor and germline exomes on families and physicians. Blood and frozen tumor (if available) samples are collected from children undergoing surgery or biopsy of newly diagnosed solid tumors and subjected to exome sequencing in a CLIA-certified laboratory. Germline and tumor (if applicable) exome sequencing reports are generated and submitted into the electronic health record and returned to each patient/family by their primary oncologist. If specific (optional) consent is obtained, research sequencing studies (including transcriptome sequencing) are performed in addition to the clinical exome sequencing.
Study
phs001026
Genetic Epidemiology Network of Arteriopathy (GENOA)
The Genetic Epidemiology Network of Arteriopathy (GENOA): GENOA is one of four research networks that form the NHLBI Family Blood Pressure Program (FBPP). From its inception in 1995, GENOA's long-term objective was to elucidate the genetics of hypertension and its arteriosclerotic target-organ damage, including both atherosclerotic (macrovascular) and arteriolosclerotic (microvascular) complications involving the heart, brain, kidneys, and peripheral arteries. Two GENOA cohorts were originally ascertained (1995-2000) through sibships in which at least 2 siblings had essential hypertension diagnosed prior to age 60 years. All siblings in the sibship, both normotensive and hypertensive, were invited to participate. These include non-Hispanic White Americans from Rochester, MN (N=1583 at the 1st exam) and African Americans from Jackson, MS (N=1854 at the 1st exam). During the second exam (2000-2005), approximately 80% of participants were re-recruited. The GENOA data consist of biological samples (DNA, serum, urine) as well as demographic, anthropometric, environmental, clinical, biochemical, physiological, and genetic data for understanding the genetic predictors of diseases of the heart, brain, kidney, and peripheral arteries. Family Blood Pressure Program (FBPP): GENOA's parent program, the FBPP, is an unprecedented collaboration to identify genes influencing blood pressure (BP) levels, hypertension, and its target-organ damage. This program has conducted over 21,000 physical examinations, assembled a shared database of several hundred BP and hypertension-related phenotypic measurements, completed genome-wide linkage analyses for BP, hypertension, and hypertension-associated risk factors and complications, and published over 130 manuscripts on program findings.
The FBPP emerged from what was initially funded as four independent networks of investigators (HyperGEN, GenNet, SAPPHIRe and GENOA) competing to identify genetic determinants of hypertension in multiple ethnic groups. Realizing the greater likelihood of success through collaboration, the investigators began working together during the first funding cycle (1995-2000) and formalized this arrangement in the second cycle (2000-2005), creating a single confederation with program-wide and network-specific goals.
Study
phs000379
Kids First: Genetics of Pediatric Germ Cell Tumors
The Gabriella Miller Kids First Pediatric Research Program (Kids First) is a trans-NIH effort initiated in response to the 2014 Gabriella Miller Kids First Research Act and supported by the NIH Common Fund. This program focuses on gene discovery in pediatric cancers and structural birth defects and the development of the Gabriella Miller Kids First Pediatric Data Resource (Kids First Data Resource). All of the genomic and phenotypic data from this study are accessible through dbGaP. The data is also available at the Kids First Portal, where other Kids First datasets can also be accessed in the cloud for data analysis, data visualization, collaboration and interoperability, open to all researchers and developers.
Pediatric malignant germ cell tumors (GCTs) represent approximately 6% of childhood cancers, including 3% of tumors in children aged 0-14 years and 15% of tumors in adolescents. GCTs are heterogeneous and grouped together due to the presumed common cell of origin, the primordial germ cell (PGC). GCTs typically occur in the testes or ovaries; however, extragonadal GCTs can occur and are likely a result of abnormal germ cell migration during development. Evidence suggests that GCTs, including those diagnosed in adults, are initiated in utero. Thus, alterations in normal embryonic development are likely to be especially relevant to GCT etiology. Germline susceptibility has not been evaluated in an agnostic fashion in pediatric GCT, mainly due to a lack of an adequate number of samples; however, emerging evidence supports a genetic etiology.
In the two Kids First GCT projects, we will evaluate genetic susceptibility to intracranial and extracranial GCT by sequencing probands and their unaffected parents.
The goals of the project are to: 1) evaluate the contribution of rare genetic variants in GCT through the use of aggregate burden tests, focusing on genes and established regulatory regions; 2) identify de novo SNVs and CNVs in pediatric GCT using a case-parent triad design; and 3) identify molecular signatures in GCT tumor specimens, overall and by age group and tumor characteristics. Whole Genome Sequencing (WGS) data generated through the Gabriella Miller Kids First Pediatric Research Program will provide an opportunity to investigate the genetic origins of GCT in a diverse set of samples. Given the limited knowledge of GCT etiology and biology, the results of the proposed analyses are likely to have a big impact on the field.
Study
phs002322
Genome Wide Association for Asthma and Lung Function
The SNP Typing for Association with Multiple Phenotypes from Existing Epidemiologic Data (STAMPEED) asthma project includes subjects with asthma and controls from the Chicago Asthma Genetics Study (CAG), the NHLBI multicenter Severe Asthma Research Program (SARP) and the NHLBI Collaborative Studies on the Genetics of Asthma (CSGA; Wake Forest). All studies included European American and African American children and adults with asthma ranging from mild to severe and adult controls. CAG participants were collected at the University of Chicago. SARP participants were recruited at the NHLBI SARP sites with an emphasis on recruiting severe asthmatics (Moore et al, Am J Respir Crit Care Med, 2010, PMID: 19892860). CSGA cases and controls collected by the Wake Forest investigators were also included. Asthma status was based on both a physician's diagnosis and either bronchodilator reversibility or hyper-responsiveness to methacholine, as well as less than 5 pack years of smoking. Genotyping was performed on the Illumina 1Mv1 platform, with individual genotypes called using clustering algorithms as implemented in the BeadStudio software by Illumina. The total number of markers following standard QC was 1,025,129. Imputation of the HapMap phase 2, release 21 SNPs was performed using MACH with the phased HapMap CEU and YRI haplotypes as a reference. Case/control association tests for asthma status were performed using logistic regression in R (http://CRAN.R-project.org/) on genotype dosages, adjusting for the first principal component from EIGENSTRAT.
Study
phs000355
Glial Cell Line-Derived Neurotrophic Factor (GDNF) Polymorphisms and Anxiety, Depression
GDNF gene variants were studied as possible risk factors for depression or anxiety in a young sample. The association study involved eight GDNF single nucleotide polymorphisms (rs1981844, rs3812047, rs3096140, rs2973041, rs2910702, rs1549250, rs2973050 and rs11111) and anxiety and depression scores measured by the Hospital Anxiety and Depression Scale (HADS) in 708 Caucasian young adults with no psychiatric history. Results showed significant effects of two single nucleotide polymorphisms on anxiety scores following the Bonferroni correction for multiple testing (p=0.00070 and p=0.00138 for rs3812047 and rs3096140, respectively). Haplotype analysis confirmed the role of these SNPs (p=0.00029). A significant sex-gene interaction was also observed, since the effect of the rs3812047 A allele as a risk factor for anxiety was more pronounced in males. This is the first demonstration of a significant association between the GDNF gene and mood characteristics, shown by the association of two SNPs of the GDNF gene (rs3812047 and rs3096140) with individual variability of anxiety using self-report data from a non-clinical sample. Reprinted from Kotyuk et al., 2013 (Kotyuk, E., Keszler, G., Nemeth, N., Ronai, Z., Sasvari-Szekely, M., and Szekely, A. (2013). Glial Cell Line-Derived Neurotrophic Factor (GDNF) as a Novel Candidate Gene of Anxiety. PLoS One, 8(12). PMID: 24324616), with permission from Publisher (All content of articles published in PLOS journals is open access. You can read about our open access license here: http://www.plos.org/about/open-access/. To summarize, this license allows you to download, reuse, reprint, modify, distribute, and/or copy articles or images in PLOS journals, so long as the original creators are credited (e.g., including the article's citation and/or the image credit); Laura Perry; Staff EO; PLOS ONE)
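The Bonferroni correction mentioned above can be illustrated with a short sketch. The description does not state the significance level or the number of tests that were corrected for, so the values below (alpha of 0.05 and one test per SNP, eight in total) are assumptions made purely for illustration; the published analysis may have corrected for more tests and therefore used a stricter threshold. Only the two p-values are taken from the description.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests

# p-values reported in the study description for the anxiety scores.
reported_p = {"rs3812047": 0.00070, "rs3096140": 0.00138}

ALPHA = 0.05   # assumption: conventional family-wise error rate
N_TESTS = 8    # assumption: one test per genotyped SNP

threshold = bonferroni_threshold(ALPHA, N_TESTS)
significant = {snp: p < threshold for snp, p in reported_p.items()}

for snp, p in reported_p.items():
    print(f"{snp}: p={p:.5f}, passes threshold {threshold:.5f}: {significant[snp]}")
```

Under these assumed settings the per-test threshold is 0.05 / 8 = 0.00625, and both reported p-values fall below it.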
Study
phs000713
Exome sequencing of samples taken at multiple time points to monitor therapy response in AML
As part of our individualized systems medicine program, personalized treatment options are identified and administered to chemorefractory AML patients based on exome sequencing and ex-vivo drug sensitivity and resistance testing data. In this study, we developed this approach further to take into account clonal heterogeneity and analyzed responses of 13 AML patients to chemotherapy and six patients to targeted treatments by using exome and ultra-deep amplicon resequencing. The data submitted includes fastq files from exome sequencing of multiple time-point samples collected from the 13 patients. In this study we identified rare subclonal variants present at diagnosis or later relapse samples and developed a correlation-based method to compare subclonal responses in serial samples across multiple time-points. Significant subclone-specific responses were observed subsequent to chemotherapy and specifically to targeted therapy in five patients.
Study
EGAS00001001948
Gabriella Miller Kids First Pediatric Research Project in Microtia in Hispanic Populations
The Gabriella Miller Kids First Pediatric Research Program (Kids First) is a trans-NIH effort initiated in response to the 2014 Gabriella Miller Kids First Research Act and supported by the NIH Common Fund. This program focuses on gene discovery in pediatric cancers and structural birth defects and the development of the Gabriella Miller Kids First Pediatric Data Resource (Kids First Data Resource).
All of the WGS and phenotypic data from this study are accessible through dbGaP and kidsfirstdrc.org, where other Kids First datasets can also be accessed.
Microtia is a rare congenital deformity of the external ear, the pinna. The severity of microtia is variable and ranges from subtle deformities in the pinna to absence of the external ear. Microtia is often associated with closure of the external auditory ear canal, causing significant hearing loss. Microtia can be an isolated, unilateral or bilateral malformation, or occur solely with ear canal deformities, or with additional craniofacial or syndromic manifestations. Earlier studies of identical twins with microtia demonstrated a significant genetic contribution. The molecular pathogenesis for most microtia remains unknown. We propose to leverage our clinical acumen in diagnosis and treatment of microtia (R.E.), our relationship to the microtia community (M.T.) and our collected DNA samples from microtia patients to identify genetic variant(s) that contribute to this congenital malformation. Microtia prevalence is much higher among Native Americans and some Latin Americans (17 per 10,000 Ecuadorian births) than among individuals of European descent (0.6-1.6 per 10,000 births). To capitalize on these epidemiologic data, we have recruited microtia cohorts from Latin America and the U.S., including clinical data and DNA samples.
Study
phs002172
National Eye Institute (NEI) Age-Related Eye Disease Study (AREDS)
The Age-Related Eye Disease Study (AREDS) was initially designed as a long-term multi-center, prospective study of the clinical course of age-related macular degeneration (AMD) and age-related cataract. In addition to collecting natural history data, AREDS included a clinical trial of high-dose vitamin and mineral supplements for AMD and a clinical trial of high-dose vitamin supplements for cataract. AREDS participants were 55 to 80 years of age at enrollment and had to be free of any illness or condition that would make long-term follow-up or compliance with study medications unlikely or difficult. On the basis of fundus photographs graded by a central reading center, best-corrected visual acuity and ophthalmologic evaluations, 4,757 participants were enrolled in one of several AMD categories, including persons with no AMD. The clinical trials for AMD and cataract were conducted concurrently. AREDS participants were followed on the clinical trials for a median time of 6.5 years. Subsequent to the conclusion of the clinical trials, participants were followed for an additional 5 years and natural history data were collected. The AREDS research design is detailed in AREDS Report 1. AREDS Report 8 contains the mainline results from the AMD trial; AREDS Report 9 contains the results of the cataract trial. Blood samples were also collected from 3,700+ AREDS participants for genetic research. Genetic samples from 600 AREDS participants (200 controls, 200 Neovascular AMD cases, and 200 Geographic Atrophy cases) were selected using data available in March 2005 and then were evaluated with a genome-wide scan. These data, as well as selected phenotypic data, were made available in the dbGaP. DNA from AREDS participants, which is currently being stored in the AREDS Genetic Repository, is available for research purposes. However, not all of the 3,700+ AREDS participants who submitted a blood sample currently have DNA available. 
In addition to including the data from the genome-wide scan on the 600 original samples, this second version of the AREDS dbGaP database provides a comprehensive set of data tables with extensive clinical information collected for the 4,757 participants who participated in AREDS. The tables include information collected at enrollment/baseline, during study follow-up, fundus and lens pathology, nutritional estimates, quality of life measures and measures of morbidity and mortality. In November 2010, over 72,000 high quality fundus and lens photographs of 595 AREDS participants (of the original 600 selected for the genome-wide scan) were made available in the AREDS dbGaP. In addition to the genome-wide scan data, the fundus and lens grading data for these participants are also available through the AREDS dbGaP. Details about the ocular photographs that are available may be found in the document "Age-Related Eye Disease Study (AREDS) Ocular Photographs". In January 2012, a measure of daily sunlight exposure was added in a separate "sunlight" table. Furthermore, the "followup" table has been revised. The visual acuity for the right eye was inadvertently missing at odd-numbered visits (01, 03, 05, etc.). This data is now part of the table. In February 2014 over 134,500 high-quality fundus photographs (macular field F2) of 4613 AREDS participants were added to the existing AREDS dbGaP resource. The AREDS dbGaP image archive already contains over 72,000 high quality fundus and lens photographs of 595 AREDS participants for whom dbGaP-accessible genotype data exist. Information about the available ocular photographs found in the document "Age-Related Eye Disease Study (AREDS) Ocular Photographs" has been updated with an addendum. It is hoped that this resource will better help researchers understand two important diseases that affect an aging population. 
These data may be applied to examination and inference on genetic and genetic-environmental bases for age-related diseases of public health significance and may also help elucidate the clinical course of both conditions, generate hypotheses, and aid in the design of clinical trials of preventive interventions. Definitions of Final AMD Phenotype Categories: Please see phd001138.1 for a detailed description of how AREDS participants' final AMD phenotype was categorized. User's Guide for AREDS Phenotype Data: A detailed User's Guide for the AREDS phenotype data is available. This User's Guide is meant to be a comprehensive document which explains the complexities of the AREDS data. It is recommended that all researchers using AREDS phenotype data make use of this User's Guide.
Study
phs000001
CATHeterization GENetics (CATHGEN)
The CATHGEN biorepository consists of biological samples collected on 9334 sequential consenting individuals undergoing cardiac catheterization at Duke University Medical Center between 2001 and 2010 inclusive. The Institutional Review Board informed consent allowed for 50 mL of blood to be collected from fasting patients through the femoral arterial sheath during the catheterization procedure. Three 7.5 mL EDTA tubes for DNA extraction are stored at -80°C. The Duke Database for Cardiovascular Disease (DDCD) provides the bulk of the clinical data used for analysis. Follow-up includes mortality information gleaned from the National Death Index and Social Security Death Index plus follow-up phone calls and written questionnaires regarding MI, stroke, re-hospitalization, coronary re-vascularization procedures, smoking, exercise, and medication use.
Study
phs000703
Discovering the Genetic Basis of Human Neuroblastoma: A Gabriella Miller Kids First Pediatric Research Program (Kids First) Project
The Gabriella Miller Kids First Pediatric Research Program (Kids First) is a trans-NIH effort initiated in response to the 2014 Gabriella Miller Kids First Research Act and supported by the NIH Common Fund. This program focuses on gene discovery in pediatric cancers and structural birth defects and the development of the Gabriella Miller Kids First Pediatric Data Resource (Kids First Data Resource). All of the WGS and phenotypic data from this study are accessible through dbGaP and kidsfirstdrc.org, where other Kids First datasets can also be accessed. Children with disseminated neuroblastoma have a very high risk of treatment failure and death despite receiving intensified chemotherapy, radiation therapy and immunotherapy. The long-term goal of our research program is to ultimately improve neuroblastoma cure rates by first comprehensively defining the genetic basis of the disease. The central hypothesis to be tested here is that neuroblastoma arises largely due to the epistatic interaction of common and rare heritable DNA variation. Here we will perform a comprehensive whole genome sequencing of 563 quartets of neuroblastoma patient germline and diagnostic tumor DNAs and germline DNAs from both parents. The case series was recently collected through a Children's Oncology Group epidemiology clinical trial and is robustly annotated with complete demographic (age, sex, race, ethnicity), clinical (e.g. age at diagnosis, stage, risk group), epidemiologic (parental dietary and exposure questionnaire) and biological (e.g. tumor MYCN status and multiple other tumor genomic measures) co-variates. Subjects were consented for genetic research and DNA is immediately available for shipment for sequencing. 
We propose Illumina-based whole genome sequencing in the 593 "trio" germline samples (Aim 1; due to missing parent: 487 full neuroblastoma triads, 106 child-single parent dyads = 1673 whole genome sequences) and matched diagnostic tumor DNA (Aim 2; N=366) at 30x sequencing depth (N=2039 whole genome sequences). Also in Aim 2 we will perform whole exome (100x) and RNA sequencing on the 366 tumor DNA and 228 tumor RNA samples from this cohort. Finally, we propose a pilot study of structural variation using long-range sequencing in 10 non-overlapping tumor samples chosen based on potentially relevant chromosomal alterations discovered with conventional NGS. Thus, a total of 2277 individual samples and 2655 sequences will be generated. We will use our established analytic pipeline that is currently being used to study the germline genomes of all cases sequenced through the NCI supported Therapeutically Applicable Research to Generate Effective Treatments program. We plan a three-stage analytic approach, first focusing on classic de novo and inherited Mendelian damaging alterations. We will next integrate our extensive epigenomic data from human neuroblastoma cell lines and genome-wide association study data (N=5,703 neuroblastoma cases to date) to guide a comprehensive assessment of noncoding variants that influence tumor initiation with a recently established analytic pipeline. Finally, we will utilize the tumor DNA analyses to inform relevance via somatic gain or loss of function effects at the sequence and/or copy number levels. All data generated in this project will be immediately placed into the Genomic Data Commons (GDC) and we will compute within this environment by importing our analytic pipelines into the GDC. These data will be fully integrated into the Kids First Data Resource and freely shared with all academically qualified petitioners. 
This comprehensive data set derived from a large and richly phenotyped series of neuroblastoma DNA quartets will be integrated with existing germline and/or tumor genomic data from over 6,000 neuroblastoma subjects (but none with matched patient-parent germline sequencing data) to provide an unparalleled opportunity to comprehensively discover the genetic basis of neuroblastoma.
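The sample counts quoted in the sequencing plan above can be cross-checked with simple arithmetic. The sketch below uses only the figures stated in the description; the variable names are hypothetical, and the reading of the 2277 total as germline samples plus tumor DNA, tumor RNA and long-range samples is an assumption that happens to match the stated number. The total of 2655 sequences is not reconstructed here because the description does not itemize it.

```python
# Figures quoted in the study description.
TRIADS = 487            # child plus both parents
DYADS = 106             # child plus one parent
TUMOR_DNA = 366         # matched diagnostic tumor DNA samples
TUMOR_RNA = 228         # tumor RNA samples
LONG_RANGE_TUMORS = 10  # non-overlapping tumors for long-range sequencing

# Families contributing germline samples.
families = TRIADS + DYADS

# Germline whole genomes: three per triad, two per dyad.
germline_wgs = 3 * TRIADS + 2 * DYADS

# Whole genomes at 30x, including tumor DNA.
total_wgs = germline_wgs + TUMOR_DNA

# Individual samples (assumed breakdown; see the note above).
total_samples = total_wgs + TUMOR_RNA + LONG_RANGE_TUMORS

print(families, germline_wgs, total_wgs, total_samples)
```

The computed values agree with the 593 families, 1673 germline genomes, 2039 whole genome sequences and 2277 individual samples given in the description.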
Study
phs001436
Single Cell Analysis Program - Transcriptome (SCAP-T)
This initiative is part of the Single Cell Analysis Program (SCAP) and is funded through the NIH Common Fund (See http://nihroadmap.nih.gov/), which supports cross-cutting programs that are expected to have exceptionally high impact. Common Fund initiatives address key roadblocks in biomedical research that impede basic scientific discovery and its translation into improved human health. In addition, these programs capitalize on emerging opportunities to catalyze progress across multiple biomedical fields. Single cell analysis has recently emerged as an important field of research because technologies have improved in sensitivity and throughput sufficiently to begin measuring and understanding heterogeneity in complex biological systems and correlating it with changes in biological function and disease processes. By profiling individual cells it is possible to resolve rare cells, transient cell states, and the influence of organization and environment on such cells and states, which cannot be described by ensemble measurements. The long-term goal of the SCAP is to accelerate this move towards personalizing health to the cellular level by understanding the link between cell heterogeneity, tissue function and emergence of disease through the discovery, development and translation of innovative approaches which will dramatically change the way cells are characterized. The SCAP will focus on research, which will systematically measure, analyze and model cell-to-cell variation, and identify crucial differences and rare biological states, which may have important functional consequences. 
Under SCAP, there are three studies to evaluate the cellular heterogeneity using transcriptional profiling of single cells (U01): University of Pennsylvania: Role of single cell mRNA variation in systems associated electrically excitable cells; University of Southern California: Evaluation of cellular heterogeneity using patchclamp and RNA-seq of single cells; and University of California at San Diego: Single cell sequencing and in situ mapping of RNA transcripts in human brains. The SCAP has been designed as a five-year program with several components: (1) the collection, analysis and sharing of comprehensive expression datasets to understand the role of heterogeneity in tissues and systemically and identify critical parameters and states; (2) the discovery of new, innovative tools for spatiotemporal imaging, manipulation, analysis and modeling of a biologically relevant population of cells with minimal perturbation; (3) milestone-driven validation and translation of technologies for characterizing single cells in situ meeting the needs of end-users; and (4) development and coordination of a multidisciplinary research community through workshops and other collective endeavors. Further details about the NIH SCAP-T program can be found at http://commonfund.nih.gov/singlecell/. The SCAP-T data sets include the detailed phenotype information, experimental protocols, QC information, RNA-sequencing data and NGS results from human heart and human brain cells. The first data release includes 697 single cells from human brain and heart. The second data release includes an additional 978 single cells from human brain and heart. The third data release includes an additional 1631 single cells from human brain and heart. The fourth data release is a correction to this study, but adds no new cells. The fifth data release includes an additional 2984 single cells from human brain and heart. The sixth data release includes an additional 944 single cells from human brain and heart.
The seventh data release includes an additional 1977 single cells from human brain and heart. The SCAP-T data portal provides a customized interface for users to quickly identify and retrieve files by phenotypes, and data properties such as sequencing facility or coverage for all of these 9231 single cells. For more information about the SCAP-T study and the data portal, please visit http://www.scap-t.org.
Study
phs000833
A Comprehensive Catalogue of Somatic Mutations from a Human Cancer Genome
We previously reported the whole-genome sequencing of the well-studied human malignant melanoma cell line, COLO-829, and a lymphoblastoid cell line from the same individual, providing the first comprehensive catalogue of somatic mutations from an individual cancer (Pleasance et al. (2010) Nature 463, 191). This initial study was carried out using the Illumina Genome Analyzer II, largely using paired-end reads of 75 bases from standard Illumina paired-end libraries. Here we report the results of sequencing the genomes of COLO-829 and its matched lymphoblastoid sample using HiSeq2000 with v3 cluster-generation and sequencing chemistries. This has significantly improved the evenness and completeness of genomic coverage, particularly in areas of extreme sequence composition that were underrepresented in the initial study.
Study
EGAS00001000245
Kids First: Genomics of Orofacial Cleft Birth Defects in Latin American Families
The Gabriella Miller Kids First Pediatric Research Program (Kids First) is a trans-NIH effort initiated in response to the 2014 Gabriella Miller Kids First Research Act and supported by the NIH Common Fund. This program focuses on gene discovery in pediatric cancers and structural birth defects and the development of the Gabriella Miller Kids First Pediatric Data Resource (Kids First Data Resource). Both childhood cancers and structural birth defects are critical and costly conditions associated with substantial morbidity and mortality. Elucidating the underlying genetic etiology of these diseases has the potential to profoundly improve preventative measures, diagnostics, and therapeutic interventions. All of the WGS and phenotypic data from this study are accessible through dbGaP and https://kidsfirstdrc.org, where other Kids First datasets can also be accessed. The Kids First study of nonsyndromic orofacial cleft (OFC) birth defects in Latin American families is a whole genome sequencing study of 283 Latin American parent-case trios drawn from ongoing collaborations led by Dr. Mary L. Marazita of the University of Pittsburgh Center for Craniofacial and Dental Genetics, and including a collaboration with Dr. Lina Moreno Uribe and Dr. Andrew Lidral of the University of Iowa. All families were ascertained through the Clinica Noel where patients with OFCs receive care from the Antioquia University School of Dentistry in Medellin, Colombia (key on-site colleagues included Dr. Luz Consuelo Valencia-Ramirez and Dr. Mauricio Arcos-Burgos). Genetic studies have shown that this population is composed of an admixture of immigrant male Caucasians (mainly Spaniards and Basques) and native Amerindian females.
Every subject has had a genetic evaluation, including a pedigree analysis for a family history of clefting and other birth defects, a pregnancy history for environmental exposures, and a complete physical exam to rule out suspected or known syndromes or environmental phenocopies. Sequencing was done by the Broad Institute sequencing center funded by the Kids First program (grant number U24-HD090743). The case in each of the Kids First OFC trios has cleft lip (CL, Figure A below), cleft palate (CP, Figure B), or both (CL+CP, Figure C): OFCs are genetically complex structural birth defects caused by genetic factors, environmental exposures, and their interactions. OFCs are the most common craniofacial anomalies in humans, affecting approximately 1 in 700 newborns, and are one of the most common structural birth defects worldwide. On average a child with an OFC initially faces feeding difficulties, undergoes 6 surgeries, spends 30 days in hospital, receives 5 years of orthodontic treatment, and participates in ongoing speech therapy, leading to an estimated total lifetime treatment cost of about $200,000. Further, individuals born with an OFC have higher infant mortality, higher mortality rates at all other stages of life, increased incidence of mental health problems, and higher risk for other disorders (notably including breast, brain, and colon cancers). Prior genome-wide linkage and association studies have now identified at least 18 genomic regions likely to contribute to the risk for nonsyndromic OFCs. Despite this substantial progress, the functional/pathogenic variants at OFC-associated regions are mostly still unknown. Because previous OFC genomic studies (genome-wide linkage, genome-wide association studies (GWAS), and targeted sequencing) are based on relatively sparse genotyping data, they cannot distinguish between causal variants and variants in linkage disequilibrium with unobserved causal variants. 
Moreover, it is unknown whether the association or linkage signals are due to single common variants, haplotypes of multiple common variants, clusters of multiple rare variants, or some combination. Finally, we cannot yet attribute specific genetic risk to individual cases and case families. Therefore, the goal of the current study is to identify specific OFC risk variants in Latin American families by performing whole genome sequencing of parent-case trios.
Study
phs001420
Kids First Pediatric Research Study in Familial Predisposition to Hematopoietic Malignancies (SJFAMILY-HM)
The Gabriella Miller Kids First Pediatric Research Program (Kids First) is a trans-NIH effort initiated in response to the 2014 Gabriella Miller Kids First Research Act and supported by the NIH Common Fund. This program focuses on gene discovery in pediatric cancers and structural birth defects and the development of the Gabriella Miller Kids First Pediatric Data Resource (Kids First Data Resource). All of the WGS and phenotypic data from this study are accessible through dbGaP and kidsfirstdrc.org, where other Kids First datasets can also be accessed. Acute lymphoblastic leukemia (ALL) is the most common childhood tumor and a leading cause of cancer death in children, adolescents and young adults. Hodgkin and non-Hodgkin lymphoma are also important hematologic malignancies (HM) that occur in children. Each is a genetic disease with growing evidence for germline predisposition in both familial and sporadic cases; however, the inherited genetic basis of ALL and lymphoma is poorly understood. Such knowledge is essential to gain mechanistic insight into the basis of tumor formation, and to guide genetic counseling and genetic management. Here we have assembled a large collection of familial HM kindreds and extended recurrence cohorts of ALL and HL, which will be used to identify the genetic basis of familial HM, to examine the frequency of germline variants in sporadic ALL and HL, and to integrate inherited and somatic genomic data. These studies have high potential to provide fundamental new insights into the inherited genetic basis of HM, to provide important information to guide clinical management, and to provide an invaluable public resource of genomic data.
Study
phs001738
National Institutes of Health H3Africa African Collaborative Center for Microbiome and Genomics Research (ACCME)
The African Collaborative Center for Microbiome and Genomics Research is a multi-institutional collaborative research project. The objectives of the project are to collaborate and implement high impact integrative epidemiology and genomics research into discovery of biomarkers associated with cervical carcinogenesis. Specifically, ACCME links and leverages existing funded research and program activities at the collaborating institutions to study the interaction between vaginal microbiome, host genetic factors and molecular variants of Human Papilloma Virus (HPV) to determine correlates of viral persistence in the causal pathway of cervical cancer, a major cause of preventable mortality in Nigeria and other parts of sub-Saharan Africa. This research offers several opportunities to advance understanding of cervical carcinogenesis, viral oncogenesis, new biomarker discovery and risk stratification by genotype in a cohort of African women. Sponsorship: This project is sponsored by the National Institutes of Health (NIH).
Study
phs001945
CIDR: The Role of Rare Coding Variation in Prostate Cancer in Men of African Ancestry - RESPOND Project 2
In RESPOND Project 2, we seek to identify rare genetic factors that are associated with prostate cancer (PCa) risk and aggressiveness in men of African ancestry (AA). We will conduct exome sequencing of 15,000 prostate cancer cases and 5,000 controls from the RESPOND cohort and the African Ancestry Prostate Cancer Consortium (AAPC) with cases selected based on risk categories: high-risk (stage T3/T4 or Gleason 8+ or PSA>20 ng/ml), intermediate-risk (stage T2b/T2c or Gleason 7 or PSA 10-20 ng/ml) and low-risk disease (stage T1/T2a and Gleason ≤ 6 and PSA<10 ng/ml). We expect the findings from this Project to significantly advance knowledge of susceptibility to aggressive PCa and racial/ethnic disparities in PCa risk, and to guide the development of future preventive, early detection and prognostic measures for AA men. The first phase of the study will include exome sequence data for approximately 7,500 cases and 5,000 controls from AAPC. The second phase of the study will include exome data for 5,000 cases from RESPOND and AAPC.
ACKNOWLEDGMENTS and CONTRIBUTING SITES Multiethnic Cohort (MEC): The MEC and the genotyping in this study were supported by National Institutes of Health (NIH) grants CA63464, CA54281, CA1326792, CA148085, and HG004726. Cancer incidence data for the MEC and Los Angeles Study of Aggressive Prostate Cancer (LAAPC) studies have been collected by the Los Angeles Cancer Surveillance Program of the University of Southern California with federal funds from the National Cancer Institute (NCI)/NIH/Department of Health and Human Services (DHHS) under Contract No. N01-PC-35139, and the California Department of Health Services as part of the state-wide cancer reporting program mandated by California Health and Safety Code Section 103885, and grant 1U58DP000807-3 from the Centers for Disease Control and Prevention (CDC). Ghana Prostate Study (GPS): The Ghana Prostate Study was funded by the Intramural Program of the NCI/NIH/DHHS under Contract No.
HHSN261200800001E. Men of African Descent and Carcinoma of the Prostate (MADCaP): We thank all MADCaP study participants. This work is a product of the MADCaP network. This work was supported by NCI/NIH grant U01CA184374 to Timothy Rebbeck and National Institute of General Medical Sciences (NIGMS) MIRA grant R35GM133727 to Joseph Lachance. Additional funding includes a seed grant from the Integrated Cancer Research Center at Georgia Institute of Technology. UGANDA: The UGANDA study was supported by NIH grant R01CA165862. Southern Community Cohort (SCCS) is funded by NIH grant CA092447. SCCS sample preparation was conducted at the Epidemiology Biospecimen Core Lab that is supported in part by the Vanderbilt Ingram Cancer Center (CA68485). Data on SCCS cancer cases were provided by Alabama Statewide Cancer Registry, Kentucky Cancer Registry, Office of Cancer Surveillance at Tennessee Department of Health, Florida Cancer Data System, Central Cancer Registry at North Carolina Division of Public Health, Georgia Comprehensive Cancer Registry, Louisiana Tumor Registry, Mississippi Cancer Registry, South Carolina Central Cancer Registry, Virginia Cancer Registry at Virginia Department of Health, and Cancer Registry at Arkansas Department of Health. The Arkansas Central Cancer Registry is fully funded by a grant from National Program of Cancer Registries (NPCR)/Centers for Disease Control and Prevention (CDC). Data on SCCS cancer cases from Mississippi were collected by the Mississippi Cancer Registry which participates in the NPCR/CDC. The contents of this publication are solely the responsibility of the authors and do not necessarily represent the official views of the CDC or the Mississippi Cancer Registry. Karuprostate: The Karuprostate study was supported by the French National Health Directorate and by the Association pour la Recherche sur les Tumeurs de la Prostate. Séverine Ferdinand, Marc Romana. 
The North Carolina - Louisiana Prostate Cancer Project (PCaP) is carried out as a collaborative study supported by the Department of Defense contract DAMD 17-03-2-0052. The authors thank the staff, advisory committees and research subjects participating in the PCaP study for their important contributions. We would also like to acknowledge the UNC BioSpecimen Facility and the LSUHSC Pathology Lab for our DNA extractions, blood processing, storage and sample disbursement (https://genome.unc.edu/bsp). Gene-Environment Interaction in Prostate Study (GECAP) was supported by NIH grant ES011126. King County Prostate Cancer Study (KCPCS) was supported by NIH grants CA056678, CA082664, and CA092579, with additional support from the Fred Hutchinson Cancer Research Center. We thank the participants in these studies, and Ms. Suzanne Kolb for help with study management. The Los Angeles Study of Aggressive Prostate Cancer (LAAPC) was funded by grant 99-00524V-10258 from the Cancer Research Fund, under Interagency Agreement #97-12013 (University of California contract #98-00924V) with the Department of Health Services Cancer Research Program. Cancer incidence data for the MEC and LAAPC studies have been collected by the Los Angeles Cancer Surveillance Program of the University of Southern California with Federal funds from the NCI/NIH/DHHS under Contract No. N01-PC-35139, and the California Department of Health Services as part of the state-wide cancer reporting program mandated by California Health and Safety Code Section 103885, and grant 1U58DP000807-3 from the CDC. Prostate Cancer Studies at MD Anderson (MDA) was supported by grants CA68578, ES007784, DAMD W81XWH-07-1-0645, and CA140388.
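The risk categories used to select cases in RESPOND Project 2 (stated at the start of this description) can be read as a small decision rule. The following is only a sketch of that reading, not the study's actual selection code: the function name, the stage strings, the order of the checks, and the treatment of PSA values of exactly 10 or 20 ng/ml as intermediate are all assumptions.

```python
from typing import Optional


def risk_category(stage: str, gleason: int, psa_ng_ml: float) -> Optional[str]:
    """Assign a risk category from stage, Gleason score, and PSA (ng/ml).

    Hypothetical helper: high and intermediate risk need any one criterion,
    low risk needs all three, and anything else is left unclassified.
    """
    # High risk: stage T3/T4 or Gleason 8+ or PSA > 20 ng/ml.
    if stage in {"T3", "T4"} or gleason >= 8 or psa_ng_ml > 20:
        return "high"
    # Intermediate risk: stage T2b/T2c or Gleason 7 or PSA 10-20 ng/ml.
    if stage in {"T2b", "T2c"} or gleason == 7 or 10 <= psa_ng_ml <= 20:
        return "intermediate"
    # Low risk: stage T1/T2a and Gleason <= 6 and PSA < 10 ng/ml.
    if stage in {"T1", "T2a"} and gleason <= 6 and psa_ng_ml < 10:
        return "low"
    # Assumption: combinations the text does not cover stay unclassified.
    return None


print(risk_category("T1", 6, 25.0))
print(risk_category("T2b", 6, 5.0))
print(risk_category("T2a", 6, 4.2))
```

Checking the high-risk criteria first means a case meeting any single high-risk criterion is never downgraded by its other values, which is one plausible reading of the "or" conditions in the text.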
Study
phs002637
The Genetic Architecture of Smoking and Smoking Cessation
This study includes samples from two projects: Collaborative Genetic Study of Nicotine Dependence (COGEND; PI: Laura Bierut) and University of Wisconsin Transdisciplinary Tobacco Use Research Center (UW-TTURC; PI: Timothy Baker). Data are available for an additional 1420 COGEND subjects through the Study of Addiction: Genetics and Environment (SAGE), dbGaP study accession phs000092. The majority of these subjects are independent from the current study, but there is a small amount of overlap between the two samples (n=29 subjects) for quality control purposes. It should be noted that the case definition in the SAGE study is DSM-IV alcohol dependence. The case definition in the current study is nicotine dependence by a current score of 4 or greater on the Fagerström Test for Nicotine Dependence (FTND). The overall goal of this project is to identify and characterize genetic variants that contribute to the development of nicotine dependence, related smoking behaviors, and smoking cessation. The COGEND sample includes unrelated cases and controls for a genetic association study of nicotine dependence. Cases are defined by a commonly used definition of nicotine dependence, a current score of 4 or more (maximum score of 10) on the Fagerström Test for Nicotine Dependence (FTND). Control status is defined as an individual who smoked at least 100 cigarettes during their lifetime, yet never became dependent (lifetime FTND=0). By selecting controls who smoked, those genetic effects that are specific to nicotine dependence can be examined. The UW-TTURC sample includes nicotine dependent smokers from three smoking cessation studies. Subjects had to smoke at least 10 cigarettes per day (confirmed smoking by an alveolar carbon monoxide (CO) level greater than 9) and report being motivated to quit smoking. 
Participants were excluded based on evidence of psychosis history, clinically significant depression symptoms, other severe mental illness, or contraindications to smoking cessation medications. COGEND: COGEND was initiated in 2001 as a three-part program project grant funded through the National Cancer Institute (NCI; PI: Laura Bierut). The three projects included a study of the familial transmission of nicotine dependence, a genetic study of nicotine dependence, and a study of the relationship of nicotine dependence with nicotine metabolism. The primary goal is to detect, localize, and characterize genes that predispose or protect an individual with respect to heavy tobacco consumption, nicotine dependence, and related phenotypes and to integrate these findings with the family transmission and nicotine metabolism findings. The primary design is a community based case-control family study. Nicotine dependent cases and non-dependent, smoking controls were identified and recruited from Detroit and St. Louis. In addition, one sibling for each case and control subject was recruited in a subset of the sample. More than 54,000 subjects aged 25-44 years were screened by telephone; more than 3,100 subjects were personally interviewed; and more than 2,900 subjects donated blood samples for genetic studies. UW-TTURC: The UW-TTURC was initiated in 2001 as a study of nicotine dependence and smoking cessation treatment. The second round of UW-TTURC was initiated in 2005 as a study of efficacy of smoking cessation and long term outcomes. Nicotine dependent smokers seeking cessation treatment were identified and recruited from Madison and Milwaukee, WI. Over 9,000 adult smokers were screened by telephone; 2,575 individuals were enrolled and randomized to treatment conditions that involved use of different smoking cessation medications. 
Participants from the UW-TTURC smoking cessation clinical trials had the option of participating in a genetic substudy, and approximately 2,000 donated blood samples for genetic studies. The goal of the genetic studies of smokers seeking cessation treatment is to detect, localize, and characterize genes that predispose or protect an individual with respect to heavy tobacco consumption, nicotine dependence, and related phenotypes including cessation, withdrawal, and relapse. Both studies (COGEND and UW-TTURC) include measures of basic socio-demographic variables, including age, sex, race/ethnicity, family income, and educational attainment. Information on nicotine dependence, as assessed by the Fagerström Test for Nicotine Dependence (FTND), is available for all subjects. In addition, participants completed the Nicotine Dependence Syndrome Scale (NDSS; Shiffman et al., 2004) and the Wisconsin Inventory of Smoking Dependence Motives (WISDM-68; Piper et al., 2004). Coding for both individual variables and indices has been standardized across studies. All subjects were assessed in person by trained research assistants. This study is part of the Gene Environment Association Studies initiative (GENEVA, http://www.genevastudy.org) funded by the trans-NIH Genes, Environment, and Health Initiative (GEI). The overarching goal is to identify novel genetic factors that contribute to the genetic architecture of smoking through large-scale genome-wide association studies of two well-characterized cohorts. Genotyping was performed at the Johns Hopkins University Center for Inherited Disease Research (CIDR). Data cleaning and harmonization were done at the GEI-funded GENEVA Coordinating Center at the University of Washington.
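The COGEND case and control definitions given earlier in this description can be expressed as a short rule. This is a minimal sketch under stated assumptions rather than the study's own code: the function and argument names are hypothetical, and subjects who meet neither definition are simply returned as unclassified.

```python
from typing import Optional

FTND_MAX = 10  # maximum FTND score, as stated in the description


def cogend_status(current_ftnd: int, lifetime_ftnd: int,
                  lifetime_cigarettes: int) -> Optional[str]:
    """Classify a subject as 'case', 'control', or None (neither).

    Case: current FTND score of 4 or more.
    Control: smoked at least 100 cigarettes in their lifetime, yet never
    became dependent (lifetime FTND = 0).
    """
    for score in (current_ftnd, lifetime_ftnd):
        if not 0 <= score <= FTND_MAX:
            raise ValueError("FTND scores range from 0 to 10")
    if current_ftnd >= 4:
        return "case"
    if lifetime_cigarettes >= 100 and lifetime_ftnd == 0:
        return "control"
    # Assumption: everyone else (e.g. FTND 1-3) falls outside both groups.
    return None


print(cogend_status(6, 7, 50000))
print(cogend_status(0, 0, 250))
print(cogend_status(2, 3, 5000))
```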
Study
phs000404
DAC for YCC Sarcoma
Access can be granted by contacting Hyo Song Kim (hyosong77@yuhs.ac).
Dac
EGAC50000000046
Mapping_regulatory_variation_in_sensory_neurons_using_IPS_lines_from_the_HIPSCI_project
Sensory neurons are nerve cells that are activated by sensory input such as heat or light and convey information to the brain. Although a key cell type in complex organisms, human sensory neurons are challenging to study because they are impossible to obtain from living donors. We have collaborated with the Neucentis Pharmaceutical Research Unit to differentiate sensory-neuron-like cells from human induced pluripotent stem cells derived as part of the Human Induced Pluripotent Stem Cells Initiative. We will sequence RNA from 100 IPS lines derived from healthy individuals and perform RNA-seq on the differentiated cells to identify noncoding variants that alter gene expression in human sensory neurons.
Study
EGAS00001001149
Coronary Artery Risk Development in Young Adults (CARDIA) Study - Cohort
CARDIA is a study examining the etiology and natural history of cardiovascular disease beginning in young adulthood. In 1985-1986, a cohort of 5115 healthy black and white men and women aged 18-30 years was selected to have approximately the same number of people in subgroups of age (18-24 and 25-30), sex, race, and education (high school or less and more than high school) within each of four US Field Centers. These same participants were asked to participate in follow-up examinations during 1987-1988 (Year 2), 1990-1991 (Year 5), 1992-1993 (Year 7), 1995-1996 (Year 10), 2000-2001 (Year 15), 2005-2006 (Year 20), and 2010-2011 (Year 25); the proportions of the surviving cohort that have returned for the seven follow-up examinations were 90%, 86%, 81%, 79%, 74%, 72%, and 72%, respectively. In addition to the follow-up examinations, participants are contacted regularly for the ascertainment of information on out-patient procedures and hospitalizations experienced between contacts. Within the past five years, 95% of the original surviving cohort has been contacted. While the specifics of each examination have differed somewhat, data have been collected on a variety of factors believed to be related to heart disease. These include conditions with clear links to heart disease such as blood pressure, cholesterol and other lipids. Data have also been collected on physical measurements such as weight and skinfold fat as well as lifestyle factors such as substance use (tobacco and alcohol), dietary and exercise patterns, behavioral and psychological variables, medical and family history, and other chemistries (e.g., insulin and glucose). In addition, subclinical atherosclerosis was measured via echocardiography during Years 5, 10, and 25, computed tomography during Years 15 and 20, and carotid ultrasound during Year 20. The CARDIA Cohort is utilized in the following dbGaP sub-studies.
To view genotypes, other molecular data, and derived variables collected in these sub-studies, please click on the following sub-studies below or in the "Sub-studies" box located on the right hand side of this top-level study page phs000285 CARDIA Cohort. phs000236 PAGE_CALiCo_CARDIA phs000309 GENEVA_CARDIA phs000399 GO-ESP HeartGO_CARDIA phs000613 CARDIA_CARe
Study
phs000285
Gene Environment Association Studies (GENEVA): Genetics of Early Onset Stroke (GEOS) Study
The Genetics of Early Onset Stroke (GEOS) Study is a population-based case-control study designed to identify genes associated with early-onset ischemic stroke and to characterize interactions of identified stroke genes and/or SNPs with environmental risk factors such as smoking and oral contraceptive use. The GEOS study consists of 921 ischemic stroke cases with age of first stroke 16-50 years and a similar number of controls, identified from the Baltimore-Washington area. Cases and controls were recruited in 3 different time periods: Stroke Prevention in Young Women-1 (SPYW-1) conducted from 1992-1996, Stroke Prevention in Young Women-2 (SPYW-2) conducted from 2001-2003, and Stroke Prevention in Young Men (SPYM) conducted from 2003-2007. The overall GEOS sample includes 477 cases who self-reported their race as "white" and 396 cases who self-reported their race as "African American." Traditional stroke risk factors and other study variables, including age, ethnicity, and history of hypertension, diabetes, myocardial infarction (MI), current smoking status, and current oral contraceptive use (both defined as use within one month prior to event for cases and at a comparable reference time for controls), were also collected during standardized interview and were included as covariates in our analyses. This study is part of the Gene Environment Association Studies initiative (GENEVA, http://www.genevastudy.org) funded by the trans-NIH Genes, Environment, and Health Initiative (GEI). The overarching goal is to identify novel genetic factors that contribute to early-onset ischemic stroke through large-scale genome-wide association studies of cases and controls of European and African descent from the Baltimore-Washington area. Genotyping was performed at the Johns Hopkins University Center for Inherited Disease Research (CIDR). Data cleaning and harmonization were done at the GEI-funded GENEVA Coordinating Center at the University of Washington.
Study
phs000292
Genome of the Netherlands
The Genome of the Netherlands (GoNL) Project characterizes DNA sequence variation, common and rare, for SNVs, short insertions and deletions (indels) and larger deletions in 769 individuals of Dutch ancestry selected from five biobanks under the auspices of the Dutch hub of the Biobanking and Biomolecular Research Infrastructure (BBMRI-NL). The samples come from a representative sample of 250 trio-families from all provinces in the Netherlands. The parent-offspring trios include adult individuals ranging in age from 19 to 87 years (mean=53 years; SD=16 years) from birth cohorts 1910-1994. Sequencing was done on blood-derived DNA from uncultured cells and accomplished coverage was 14-15x. Samples were contributed by LifeLines (http://lifelines.nl/lifelines-research/general), The Leiden Longevity Study (http://www.healthy-ageing.nl; http://www.langleven.net), The Netherlands Twin Registry (NTR: http://www.tweelingenregister.org), The Rotterdam studies (http://www.erasmus-epidemiology.nl/rotterdamstudy) and the Genetic Research in Isolated Populations program (http://www.epib.nl/research/geneticepi/research.html#gip). The sequencing was carried out in collaboration with the Beijing Institute for Genomics (BGI). The analysis was done by a consortium led by UMCG, LUMC, Erasmus MC, VU university and UMCU, see http://www.nlgenome.nl. Funding for the project was provided by the Netherlands Organization for Scientific Research under award number 184021007, dated July 9, 2009 and made available as a Rainbow Project of the Biobanking and Biomolecular Research Infrastructure Netherlands (BBMRI-NL).
Study
EGAS00001000644
PCPT and SELECT Cohorts: Core Infrastructure Support for Cancer Research
SELECT study: SELECT was a phase III, double-blind, placebo-controlled 4-arm study of selenium, vitamin E, selenium and vitamin E together, and placebo designed to assess the effect of these supplements on the incidence of prostate cancer. Funded by the National Cancer Institute and conducted by SWOG, the study opened in August 2001 and quickly exceeded its accrual goal of 35,533 men. In the fall of 2008, the trial's Data Safety and Monitoring Committee recommended that participants discontinue taking study supplements due to convincing evidence (planned interim futility analysis) that neither vitamin E nor selenium supplements were associated with the prevention of prostate cancer. A subsequent analysis in 2011 showed a significant 17% increase in prostate cancer incidence in participants who were on the vitamin E only arm. Approximately 17,000 participants from the original trial recently completed an additional four years of centralized follow-up. This follow-up ended on May 31, 2014 and SELECT is now closed. PCPT study: The Prostate Cancer Prevention Trial (PCPT) was a SWOG Cancer Research Network-coordinated study designed to test whether the drug finasteride (Proscar®) would prevent prostate cancer in men ages 55 and older. This study was closed on June 24, 2003 because the study objective had been reached. The results of the study, which did find a preventive benefit for the drug, were published as "The Influence of Finasteride on the Development of Prostate Cancer," in the New England Journal of Medicine on July 17, 2003.
Study
phs003382
Clinical Sequencing Exploratory Research Consortium: Incorporation of Genomic Sequencing into Pediatric Cancer Care
This project includes data derived from subjects enrolled in the BASIC3 (Baylor College of Medicine Advancing Sequencing in Childhood Cancer Care) study. BASIC3 is a National Human Genome Research Institute (NHGRI) and National Cancer Institute-funded Clinical Sequencing Exploratory Research (CSER) consortium project that focused on prospective implementation of clinical whole exome sequencing in the pediatric oncology clinic. The primary study objectives were to integrate information from CLIA-certified germline and tumor exome sequencing into the care of newly diagnosed solid tumor patients at Texas Children's Cancer Center, and to perform parallel evaluation of the impact of tumor and germline exomes on families and physicians. Blood and frozen tumor (if available) samples were collected from children undergoing surgery or biopsy of newly diagnosed solid tumors and subjected to exome sequencing in a CLIA-certified laboratory. Germline and tumor (if applicable) exome sequencing reports were generated and submitted into the electronic health record and returned to each patient/family by their primary oncologist. In addition to the clinical exome sequencing, specific (optional) consent was requested for research sequencing studies. If this consent was obtained, research studies of some of the children and parents participating in the BASIC3 study (including tumor transcriptome and whole genome sequencing of blood and/or tumor) were performed. The Clinical Sequencing Exploratory Research Consortium Cohort is utilized in the following dbGaP sub-studies. To view molecular data and derived variables collected in these sub-studies, please click on the sub-studies below or in the "Sub-study" section of this top-level study page phs001383 Clinical Sequencing Exploratory Research Consortium Cohort. phs001026 BASIC3 phs001878 GMKF BASIC3
Study
phs001683
Discovery, Biology, and Risk of Inherited Variants in Breast Cancer (DRIVE) - Genome-Wide Association Study Meta-Analysis
Discovery, Biology, and Risk of Inherited Variants in Breast Cancer (DRIVE) was one of five projects funded in 2010 as part of the NCI's Genetic Associations and Mechanisms in Oncology (GAME-ON) initiative (http://epi.grants.cancer.gov/gameon/). GAME-ON's overall goal was to foster an intra-disciplinary and collaborative approach to the translation of promising research leads deriving from the initial wave of cancer GWAS. Specific goals included replication of previous GWAS findings; identification of new susceptibility loci through meta-analyses of existing GWAS data and fine mapping of identified loci to better pinpoint causal variants; and identification of germline variants that are associated with risk of multiple cancers. The other four funded GAME-ON projects were: the ColoRectal Transdisciplinary Study (CORECT), Elucidating Loci Involved in Prostate Cancer Susceptibility (ELLIPSE), Follow-up of Ovarian Cancer Genetic Association and Interaction Study (FOCI), and Transdisciplinary Research in Cancer of the Lung (TRICL). To identify additional cancer risk loci, improve the precision of fine-mapping, and facilitate cross-cancer analyses, DRIVE investigators performed a meta-analysis of eleven genome-wide association studies of breast cancer: The Australian Breast Cancer Family Study (ABCFS), the British Breast Cancer Study (BBCS), the Breast and Prostate Cancer Cohort Consortium (BPC3), the Breast Cancer Family Registries (BCFR), the Dutch Familial Bilateral Breast Cancer Study (DFBBCS), German Consortium for Hereditary Breast and Ovarian Cancer (GC-HBOC), the Helsinki breast cancer family Study (HEBCS), the Mammary Carcinoma Risk factor Investigation (MARIE), the Singapore and Sweden Breast Cancer Study (SASBAC), the Triple Negative Breast Cancer Study (TNBC), and the UK2 GWAS. These studies comprised a total of 16,062 cases and 46,157 controls.
Imputation to the 1000 Genomes Project Phase 1 v3 ALL reference panel was performed by study, and summary statistics from each study were combined using fixed-effect meta-analysis.
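As an illustration of what a fixed-effect combination of per-study summary statistics involves, the following minimal sketch applies standard inverse-variance weighting to a single variant. It is not the DRIVE pipeline, whose software and settings are not described here; the function name and the example numbers are hypothetical.

```python
import math


def fixed_effect_meta(estimates, std_errors):
    """Inverse-variance fixed-effect meta-analysis for one variant.

    estimates  : per-study effect estimates (e.g. log odds ratios)
    std_errors : per-study standard errors of those estimates
    Returns (pooled_estimate, pooled_standard_error, z_score).
    """
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("need one standard error per estimate")
    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / (se ** 2) for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se, pooled / pooled_se


# Hypothetical log odds ratios from three studies for the same variant.
pooled, pooled_se, z = fixed_effect_meta([0.10, 0.14, 0.08], [0.05, 0.07, 0.04])
print(f"pooled log OR = {pooled:.4f}, SE = {pooled_se:.4f}, z = {z:.2f}")
```

Studies with smaller standard errors dominate the pooled estimate, which is why the largest contributing GWAS carry the most weight under this model.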
Study
phs001263
Viral Respiratory Pathogens Genetics
The overall purpose of this study is to investigate the host genetic factors in response to influenza virus infection, with the focus on influenza vaccination in the first substudy "Adult Influenza Vaccine Genetics" and with the focus on influenza natural infection and other acute respiratory infections (ARIs) in the second substudy "Acute Viral Respiratory Infection Genetics". In the first substudy, healthy adults were enrolled in 2008 (male cohort) and 2010 (female cohort) and immunized with seasonal influenza vaccine. In the second substudy, healthy adults were invited to enroll to be followed for acute respiratory illness through two consecutive influenza seasons 2009-2010 and 2010-2011. Peripheral blood genomic DNA samples were collected from all the subjects, and time-series RNA and serum samples were obtained pre- and post- immunization/infection. Genotyping was carried out on peripheral blood genomic DNA samples using Illumina HumanOmniExpress-12 v1 arrays. Peripheral blood RNA samples obtained at each visit were analyzed using Illumina Human HT-12 (for all the samples) and HiSeq 2000 (for 130 samples in the "Acute Viral Respiratory Infection Genetics" study). Serum specimens were tested using hemagglutination-inhibition (HAI) antibody assay for Influenza H1N1, H3N2, and Influenza B strains. A detailed description of each substudy is provided under their own pages below and via the grouping tool in the right-hand box: phs000635 Adult Influenza Vaccine Genetics phs001031 Acute Viral Respiratory Infection Genetics
Study
phs001030
Autism Sequencing Consortium (ASC)
The ARRA Autism Sequencing Collaboration was created in 2010, bringing together expert large-scale sequencing centers (at the Baylor College of Medicine, PI Richard Gibbs, and the Broad Institute of MIT and Harvard, PI Mark J. Daly) and a collaborative network of research labs focused on the genetics of autism (brought together by the Autism Genome Project and the Autism Consortium). These groups worked together to utilize dramatic new advances in DNA sequencing technology to reveal the genetic architecture of autism through comprehensive examination of the exonic sequence of all genes. The Autism Sequencing Consortium (ASC) was founded by Joseph D. Buxbaum and colleagues as an international group of scientists who share autism spectrum disorder (ASD) samples and genetic data. The PIs are Drs. Joseph D. Buxbaum (Icahn School of Medicine at Mount Sinai), Mark J. Daly (Broad Institute of MIT and Harvard), Bernie Devlin (University of Pittsburgh School of Medicine), Kathryn Roeder (Carnegie Mellon University), Matthew State and Stephan Sanders (University of California, San Francisco). The rationale for the ASC is described in Buxbaum et al. 2012, and this paper should be cited when referencing the data set. All shared data and analyses are hosted at a single site, which enables joint analysis of large-scale data from many groups. The ASC was first supported by a cooperative agreement grant to four lead sites funded by the National Institute of Mental Health (U01MH100233, U01MH100209, U01MH100229, U01MH100239), with additional support from the National Human Genome Research Institute. The NIMH recently renewed their support with a second grant (U01MH111661, U01MH111660, U01MH111658 and U01MH111662) to expand the project from 29,000 genomes to more than 50,000 exomes over the next 5 years. NHGRI provides ongoing sequencing support for the ASC through the Broad Center for Common Disease Genomics (UM1HG008895, Mark Daly, PI).
Study
phs000298
SEER Remote Access Pilot Test Data (2018)
This is a limited use Surveillance, Epidemiology, and End Results (SEER) file for use in a pilot test of remote data access. The SEER Program of the National Cancer Institute (NCI) is an authoritative source of information on cancer incidence and survival in the United States. SEER currently collects and publishes cancer incidence and survival data from population-based cancer registries covering approximately 34.6 percent of the U.S. population. The SEER Program registries routinely collect data on patient demographics, primary tumor site, tumor morphology and stage at diagnosis, first course of treatment, and follow-up for vital status. The mortality data reported by SEER are provided by the National Center for Health Statistics. The population data used in calculating cancer rates is obtained periodically from the Census Bureau. The SEER research data files include SEER incidence and population data associated by age, sex, race, year of diagnosis, and geographic areas.
Study
phs002012
National Heart, Lung, and Blood Institute (NHLBI) Bench to Bassinet Program: The Pediatric Cardiac Genetics Consortium (PCGC) Study
Multi-center, prospective observational cohort study of individuals with congenital heart defects (CHD). Phenotypic data and source DNA derived from 10,000 probands, parents, and families of interest are being collected to investigate relationships between genetic factors and phenotypic and clinical outcomes in patients with CHD. Phenotype data will be stored at dbGaP, while molecular and sequence data will be stored at BioData Catalyst. The PCGC Cohort is utilized in the following dbGaP substudies. Please click on the substudies below or in the "Substudies" section of this top-level study page phs001194 PCGC Cohort. phs000571 PCGC: whole exome sequences, whole genome sequences, targeted sequences, MIP sequences, and SNP array data phs001843 PCGC-CMG Collaboration: whole genome sequences. The Gabriella Miller Kids First Pediatric Research Program (Kids First) subset of the PCGC project (phs001194) is now accessible through a separate dbGaP study accession: phs001138. To access this dataset, please submit a Data Access Request (DAR) for phs001138. Approval of this DAR will be expedited for approved users of phs001194. To learn about other Kids First datasets visit https://kidsfirstdrc.org/. NHLBI's TOPMed program has provided additional Whole Genome Sequencing for PCGC participants; those data are accessible through a separate dbGaP study accession: phs001735. Access to this data set should be requested through a Data Access Request (DAR) for phs001735.
Study
phs001194
Privacy Notice for the Account User
Privacy Notice for EGA User Account
This Privacy Notice explains what personal data is collected by the specific service you are requesting, for what purposes, how it is processed, and how we keep it secure.
Note that this service collects personal data directly provided by the user, and also collects personal data from users that is provided by other organisations.
1. Who controls your personal data and how to contact us?
The European Genome-Phenome Archive (EGA) offers a service for permanent archiving and sharing of all types of personally identifiable genetic and phenotypic data resulting from biomedical research projects. It is jointly managed by the European Molecular Biology Laboratory – European Bioinformatics Institute (EMBL-EBI) and Fundació Centre de Regulació Genòmica - Centre for Genomic Regulation (CRG).
EMBL-EBI and CRG are joint Data Controllers for the processing of your personal data. They and their Data Protection Officers may be contacted for data protection queries and for exercising your rights under Section 8.
You may contact EMBL-EBI, represented by Dr. Thomas Keane, by:
email at: tk2@ebi.ac.uk or
post at EMBL-EBI, Wellcome Genome Campus, CB10 1SD Hinxton, Cambridgeshire, UK.
EMBL’s Data Protection Officer may be contacted by:
telephone at +49 6221 387-8590,
email at dpo@embl.org, or
post at EMBL Heidelberg, Data protection officer, Meyerhofstraße 1, 69117 Heidelberg, Germany.
You may contact CRG, whose EGA team is represented by Dr. Jordi Rambla de Argila, by:
email at jordi.rambla@crg.eu, or
post at Fundació Centre de Regulació Genòmica - Centre for Genomic Regulation (CRG), C/ Dr. Aiguader, 88, PRBB Building, 08003 Barcelona, Spain.
CRG's Data Protection Officer may be contacted by:
email at dpo@crg.eu, or
post at Fundació Centre de Regulació Genòmica - Centre for Genomic Regulation (CRG), C/ Dr. Aiguader, 88, PRBB Building, 08003 Barcelona, Spain.
2. What is the lawful basis for processing personal data?
We process your personal data on the grounds of important public interest.
For monitoring your activities on the website, we process your personal data on the grounds of important public interest. Such legal basis is found in Article 5(1)(a) of EMBL Internal Policy No 68 on General Data Protection (hereinafter IP 68), which is equivalent to Article 6 (1)(e) of the EU General Data Protection Regulation (hereinafter GDPR) and upon which personal data are processed for the achievement of the aims laid down in 1973 agreement establishing EMBL, such as the promotion of the cooperation in the fundamental research, in the development of advanced instrumentation and in advanced teaching in molecular biology and dissemination of information.
3. What personal data is collected from users of the service? How do we use this personal data?
We collect the following personal data from you:
Name
Email address
Title/Position
Organisation
Organisational affiliation
Username and password (to authenticate access to the system)
IP Address (for anonymous usage statistics)
We process your personal data:
to provide you with the authenticated access to the EGA service (opening and managing submission and distribution account),
to publicly publish some aggregate data to facilitate scientific research (e.g. number of accounts, geographic distribution),
to better understand the needs of the users and to guide future improvements of the service,
to create anonymous usage statistics.
If you do not provide us with your personal data we will not be able to open the user account and offer you our services or we will only provide you a subset of functionalities available within the service.
4. Who will have access to your personal data?
The personal data will be disclosed to:
Authorised staff in the data controllers' institutions acting on the data controllers' behalf and instructions (for all user account data),
Requested dataset Data access committee – DAC will have access to the name, email, organization, affiliation, title/position of the distribution user account, using it for their own purpose of granting access to their datasets.
5. Will your personal data be transferred to third countries (i.e. countries not part of EU/EEA) and/or international organisations?
In the process of granting access, distribution user account data are disclosed to the DAC of the requested dataset(s), which might be a recipient in a country outside of the European Economic Area. Insofar as the second joint controller may be subject to GDPR, data transfer to and from the first joint controller (EMBL-EBI) is necessary for important reasons of public interest embedded in the aims of the EMBL and justified in Article 9(4) of IP 68 (equivalent to Article 49(1)(d) of GDPR) read in conjunction with EMBL's 1973 establishing agreement and Article 179(2) of the Treaty on the Functioning of the European Union.
6. How long do we keep your personal data?
All data are stored for as long as you have the account open. Thereafter the storage is prolonged for as long as our service is live, even if you stop using our services. This prolongation is necessary for further scientific research, to ensure legal compliance and security, and to facilitate internal and external audits if they arise.
By contrast, the log files for the data categories related to anonymous usage statistics (raw web service logs) are processed only for 30 days and thereafter erased.
7. The joint Data Controllers provide these rights regarding your personal data
You have the right to:
Not be subject to decisions based solely on an automated processing of data (i.e. without human intervention) without you having your views taken into consideration.
Request at reasonable intervals and without excessive delay or expense, information about the personal data processed about you. Under your request we will inform you in writing about, for example, the origin of the personal data or the preservation period.
Request information to understand data processing activities when the results of these activities are applied to you.
It must be clarified that rights under points 4 and 5 are only available whenever you need support whilst using our website. For other processing based on the grounds of important public interest you cannot exercise your rights to object, rectify or erase your personal data according to the Article 13(2)(a)(b) of IP 68 (equivalent to Article 17(3)(b)(d) and Article 21(6) of the GDPR).
8. Supervisory authority
If you wish to complain against the processing of your personal data, you may do so by post at:
EMBL Heidelberg, Data Protection Committee, Meyerhofstraße 1, 69117 Heidelberg, Germany, or
Autoritat Catalana de Protecció de Dades (Catalan Data Protection Authority), C/Rosselló 214, Esc A, 1r 1a, Barcelona 08008, Spain.
Published at: February 6, 2019
Documentation
data-protection/privacy-notice/ega-user-account
Reference epigenomes generated as part of the International Human Epigenomics Consortium (IHEC)
The Centre for Epigenome Mapping Technologies (CEMT) is an epigenome sequencing platform funded by the Canadian Institutes of Health Research (CIHR) and Genome BC as part of the Canadian Epigenetics, Environment and Health Research Consortium (CEEHRC). CEMT aims to produce 100 human reference epigenomes, each comprising the following datasets: whole genome; whole genome bisulfite and oxidative bisulfite; transcriptome; microRNA; and chromatin immunoprecipitation (ChIP)-seq for the core chromatin marks (H3K4me1, H3K4me3, H3K9me3, H3K27me3, H3K36me3, and H3K27ac). CEEHRC is a full member of the International Human Epigenomics Consortium (IHEC), which aims to coordinate the production of 1,000 reference maps of human epigenomes for key cellular states relevant to health and diseases. As such, all datasets submitted by the CEMT platform conform to IHEC assay standards and metadata guidelines.
Study
EGAS00001000552
PD-L1 blockade in combination with carboplatin as immune induction in metastatic lobular breast cancer: the GELATO-trial
Invasive lobular breast cancer (ILC) is the second most common histological breast cancer subtype but published data on trials specific for ILC are so far lacking. Translational research revealed that a subset of ILCs may be immune-related and more sensitive to DNA-damaging agents such as platinum. In murine ILC models, synergy between immune checkpoint blockade and platinum has been observed. Here, we tested this concept in the phase II, GELATO-trial (NCT03147040), in which patients with metastatic ILC were treated with weekly carboplatin (AUC 1.5) as immune induction treatment for 12 weeks and atezolizumab (PD-L1 blockade; every three weeks) from the third week onwards until disease progression. Four out of 23 evaluable patients had a partial response (17%, 95%CI 5-39%) and two patients had stable disease for at least 24 weeks, resulting in a clinical benefit rate of 26% (95%CI 10-48%). Out of these six patients, four patients had triple-negative ILC (TN-ILC). In serial biopsies of metastatic lesions, we observed higher CD8 T-cell infiltration, expression of immune checkpoints, and exhausted T cells upon carboplatin/PD-L1 blockade. This is the first report of a clinical trial specifically for ILC and we demonstrate promising anti-tumor activity of atezolizumab with carboplatin as immune induction, in particular for TN-ILC. While activity of carboplatin/PD-L1 blockade in classical ER+ ILC was limited, our translational data yield important insights for the design of highly needed clinical trials in ILC.
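The response rates above are binomial proportions from a small sample (4 of 23 and 6 of 23), so the confidence intervals are wide. The description does not state how the intervals were computed; the sketch below assumes an exact (Clopper-Pearson) interval, a common choice for small trials, and uses only the Python standard library. The function names are illustrative.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(f, lo=0.0, hi=1.0, iters=100):
    """Bisection root finder for a function that increases on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for k successes in n trials."""
    half = alpha / 2
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _solve_increasing(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - half)
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else _solve_increasing(
        lambda p: half - binom_cdf(k, n, p))
    return lower, upper


for label, k in (("partial response", 4), ("clinical benefit", 6)):
    lo, hi = clopper_pearson(k, 23)
    print(f"{label}: {k}/23 = {k / 23:.0%}, 95% CI {lo:.1%} to {hi:.1%}")
```

If the trial used a different interval method (for example Wilson), the bounds would differ slightly from those produced here.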
Study
EGAS00001006902
Data Quality Control
High-throughput sequencing techniques have become the leading method to study, decode and discover the genomic origins of biological phenomena. EGA provides secure archival of such identifiable genomics data with the purpose of data-upcycling, i.e. re-using these data for research. High-quality data standards are essential to ensure the quality and credibility of the research. Moreover, a quality check report can inform researchers in advance about the data they intend to request access to, saving time and effort.
The EGA has developed a File Quality Control Report (QC Report) to provide generic quality control reports for FASTQ, SAM/BAM/CRAM, and VCF files deposited at EGA. This QC Report allows users to get information regarding the files submitted within a specific dataset. Data requesters can obtain information such as the quality of reads, mapped reads, number of variants, and other features before starting the request process, which saves time and effort.
Accessing file quality control reports
In each dataset page, the user can explore the files that it contains by clicking the "files" tab.
The Quality Control report of a file has two sections. The first contains general information about the file, such as the inferred assembly, total reads, and the dataset or study it comes from. The second contains plots that summarise key information about the file, for example the base coverage distribution, base quality or mapped reads.
The description of each plot is accessible by clicking the "i" button at the top-right corner of each plot box.
Technical Description
For analysing the fastq, SAM/BAM/CRAM and VCF files, the EGA applies a set of tools widely used in the bioinformatics community.
FASTQ: FastQC, recognised as the gold standard tool by the community.
Per base sequence quality, per sequence quality scores, per base sequence content, per sequence GC content, sequence duplication levels, etc.
SAM/BAM/CRAM: samtools, also the gold standard, generates results plots useful to get an overall idea of the quality of the file.
Base coverage distribution, base quality, % of mapped reads, % of both mates mapped, singletons, duplicates, etc.
VCF: vcftools and bcftools, combined with a custom script to infer the genome assembly.
Site frequency distribution, Ts/Tv, base changes, indel distribution, etc.
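To make one of the VCF metrics above concrete, the following sketch computes a transition/transversion (Ts/Tv) ratio from a list of reference and alternate alleles. It is a plain-Python illustration of the definition only, not the EGA pipeline, which relies on vcftools and bcftools; the function names and the toy input are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    pair = {ref, alt}
    return ref != alt and (pair <= PURINES or pair <= PYRIMIDINES)


def ts_tv_ratio(variants):
    """Ts/Tv over (ref, alt) pairs; indels and non-ACGT alleles are skipped.

    Returns None when there are no transversions to divide by.
    """
    ts = tv = 0
    for ref, alt in variants:
        ref, alt = ref.upper(), alt.upper()
        # Keep only single-nucleotide substitutions between distinct bases.
        if len(ref) != 1 or len(alt) != 1 or ref == alt:
            continue
        if ref not in "ACGT" or alt not in "ACGT":
            continue
        if is_transition(ref, alt):
            ts += 1
        else:
            tv += 1
    return ts / tv if tv else None


# Toy input: two transitions, one transversion, one indel (ignored).
calls = [("A", "G"), ("C", "T"), ("A", "C"), ("AT", "A")]
print(ts_tv_ratio(calls))
```

A ratio far from the range expected for the organism and assay can hint at a high false-positive rate in the variant calls, which is why it is a useful summary to inspect before requesting a dataset.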
Documentation
access/request-data/quality-control-reports
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Northern Manhattan Study (NOMAS)
Cohort Description The Northern Manhattan Study (NOMAS) is a study of the population of Washington Heights in Northern Manhattan. The ongoing study, which began in 1990, has enrolled over 4,400 people, some of whom have suffered a stroke or related neurological syndromes. As the cohort aged, the specific aims grew to include not only vascular determinants of stroke but also cognitive decline, mild cognitive impairment (MCI) and dementia. The overall goal of NOMAS is to investigate stroke risk factors in different race-ethnic groups. NOMAS is also committed to developing better stroke prevention programs to improve the health of the community. The Hispanic population in Northern Manhattan is largely Dominican, along with Puerto Rican, Cuban, and Central and South American components. Data Being Submitted Wave 1 questionnaire data includes 397 variables for up to 887 NOMAS participants in C4R Wave 2 questionnaire data includes 448 variables for up to 815 NOMAS participants in C4R Derived data includes 43 variables for up to 995 NOMAS participants in C4R Phenotype data includes 113 variables for up to 995 NOMAS participants in C4R
Study
phs003028
The African American Breast Cancer Epidemiology and Risk (AMBER) Consortium Study
The AMBER Consortium Study was formed to pool interview data, questionnaire data, and biological samples from epidemiological studies of breast cancer in African-American women to discover the potential causes of early-onset and aggressive breast cancer in African-American women. AMBER is funded through a Program Project grant from the National Cancer Institute. Genetic data submitted to dbGaP come from participants in the Carolina Breast Cancer Study, Women's Circle of Health Study, and Black Women's Health Study. The P01 consists of four scientific projects; the aims include follow-up on previous GWAS findings for breast cancer susceptibility in AA women as well as investigation of SNPs in candidate genes in biologically plausible pathways. These SNPs were genotyped using DNA from 3130 African-American women with breast cancer and 3700 controls. Descriptions of the original studies that provided the data and samples for this collaborative study are given below. The Carolina Breast Cancer Study (CBCS): a North Carolina population-based case-control study of breast cancer, conducted in three phases. The current study phase, phase 3 (years 2008-2014), includes women residents in 44 counties. CBCS phases 1 and 2 were conducted in 24 counties. Breast cancer cases are identified using Rapid Case Ascertainment in cooperation with the NC Central Cancer Registry. Controls were identified for phases 1 and 2 only (1993-1996 and 1996-2001), using Division of Motor Vehicles lists for women under age 65 and Health Care Financing Administration lists for women 65 and older. Randomized recruitment was used to oversample AA women and women under age 50. In-depth interviews are conducted by study nurses in participants' homes to obtain information on potential risk factors for breast cancer. DNA samples have been obtained from most participants. Overall response rates for Phases 1 and 2 were 74% for AA cases and 54% for AA controls. 
Phase 3, conducted in 44 counties from 2008-2014, includes cases only. The response rate for AA cases in Phase 3 was 70.5%. The Women's Circle of Health Study (WCHS): a multi-site case-control study in New York City (NYC) and New Jersey (NJ) aimed at evaluating risk factors for early and aggressive breast cancer in women of AA and EA ancestry. Recruitment in NYC took place between January 2002 and December 2008 and involved hospital-based ascertainment of cases, while controls were identified through random digit dialing (RDD). Recruitment at the NJ site started in March 2006 and is ongoing. Phase I of the study ended in April 2012 and covered seven counties in NJ. WCHS2 includes two additional counties. Cases in NJ were identified from 2006 to 2012 by the NJ State Cancer Registry using rapid case ascertainment. Controls were initially recruited through RDD (2006 to 2010) and later through community-based efforts (2009-2012). In-person interviews ascertained data on established and suspected risk factors for breast cancer. DNA samples were obtained. Among eligible AA women, 75% in NY and 54% in NJ completed an interview and provided a biologic specimen. Black Women's Health Study (BWHS): an ongoing prospective cohort study of health and illness among U.S. black women, with a focus on cancer. The study began in 1995 when 59,000 AA women 21-69 years of age from across the United States completed a 14-page postal health questionnaire. The median age at entry was 38, and participants were residents of 17 states in mainland U.S.: Northeast, 28%; South, 30%; Midwest, 23%; West, 19%. The baseline questionnaire elicited information on a wide range of variables, including demographic factors, use of medical care, family history of breast cancer, reproductive and medical history, cigarette and alcohol use, weight, height, waist and hip circumference, medication use, diet, and exercise.
Biennial follow-up questionnaires ascertain new cases of breast cancer and other illnesses and update covariate information. Medical record and cancer registry data are sought for all participants who report a diagnosis of breast cancer. As of 2014, approximately 80% of the baseline cohort have completed follow-up. DNA samples were obtained from about 50% of participants. BWHS data for the AMBER consortium were prepared as a nested case-control study, with controls frequency-matched to cases on year of birth, geographic region, and most recent questionnaire completed prior to the end of the at-risk period.
Study
phs000669
Population Architecture using Genomics and Epidemiology (PAGE): Causal Variants Across the Life Course (CALiCo): Coronary Artery Risk Development in Young Adults (CARDIA)
CALiCo CARDIA The Coronary Artery Risk Development in Young Adults (CARDIA) Study is a study examining how heart disease develops in adults. It began in 1986 with a group of 5115 black and white men and women aged 18-30 years. The participants were selected so that there would be approximately the same number of people in subgroups of race, gender, education (high school or less and more than high school) and age (18-24 and 25-30) in Birmingham, AL; Chicago, IL; Minneapolis, MN; and Oakland, CA. These same participants were asked to participate in follow-up examinations during 1987-1988 (Year 2), 1990-1991 (Year 5), 1992-1993 (Year 7), 1995-1996 (Year 10), 2000-2001 (Year 15), and 2005-2006 (Year 20). A majority of the group has been examined at each of the follow-up examinations (90%, 86%, 81%, 79%, 74%, and 72%, respectively). While the specifics of each examination have differed somewhat, data have been collected on a variety of factors believed to be related to heart disease. These include conditions with clear links to heart disease such as blood pressure, cholesterol and other lipids. Data have also been collected on physical measurements such as weight and skinfold fat as well as lifestyle factors such as substance use (tobacco and alcohol), dietary and exercise patterns, behavioral and psychological variables, medical and family history, and other chemistries (e.g., insulin and glucose). In addition, subclinical atherosclerosis was measured via echocardiography during Years 5 and 10, computed tomography during Years 15 and 20, and carotid ultrasound during Year 20. A detailed description of the study and results from the first examination are summarized in Cutter et al (Controlled Clinical Trials, Volume 12, Number 1 [supplement], pages 1S-77S, 1991).
Study
phs000236
National Sleep Research Resource (NSRR): Cleveland Family Study (CFS)
The Cleveland Family Study (CFS) is a family-based study of sleep apnea, consisting of 2,284 individuals (46% African American) from 361 families studied on up to 4 occasions over a period of 16 years. The study began in 1990 with the initial aims of quantifying the familial aggregation of sleep apnea. National Institutes of Health (NIH) renewals provided expansion of the original cohort, including increased minority recruitment, and longitudinal follow-up, with the last exam occurring in February 2006. The CFS was designed to provide fundamental epidemiological data on risk factors for sleep disordered breathing (SDB). The sample was selected by identifying affected probands who had laboratory diagnosed obstructive sleep apnea. All first-degree relatives, spouses and available second-degree relatives of affected probands were studied. In addition, during the first 5 study years, neighborhood control families were identified through a neighborhood proband, and his/her spouses and first-degree relatives. Each exam, occurring at approximately 4-year intervals, included new enrollment as well as follow up exams for previously enrolled subjects. For the first three visits, data, including an overnight sleep study, were collected in participants' homes while the last visit occurred in a general clinical research center (GCRC). Phenotypic characterization of the entire cohort included overnight sleep apnea studies, blood pressure, spirometry, anthropometry and questionnaires. 
Currently, data of 710 individuals are available for use through BioData Catalyst (with genotype data available through dbGaP). The National Sleep Research Resource (NSRR) is an NIH-supported sleep data repository that offers free access to large collections of de-identified physiological signals and related clinical data from a large range of cohort studies, clinical trials and other data sources from children and adults, including healthy individuals from the community and individuals with known sleep or other health disorders. The goals of NSRR are to facilitate rigorous research that requires access to large or more diverse data sets, including raw physiological signals, to promote a better understanding of risk factors for sleep and circadian disorders and/or the impact of sleep disturbances on health-related outcomes. Data from over 15 data sources and more than 40,000 individual sleep studies, many linked to dozens if not hundreds of clinical data elements, are available (as of Feb. 2022). Query tools are available to identify variables of interest, and their meta-data and provenance.
Study
phs002715
Single-cell RNA sequencing of 16 NPM1 mutated AML patients
NPM1 mutated AML is the largest class in the WHO classification; however, several subtypes within NPM1 have already been reported. Here, we sequence 16 NPM1 mutated AML samples with single-cell RNA-sequencing and multi-color flow cytometry from three major NPM1 subtypes we previously identified, totaling more than half a million cells. We show that the differences among the three NPM1 subtypes are mainly driven by cell composition, and these differences are also reflected at the cell surface protein level. With the re-analysis of CITE-seq data from another cohort and single-cell variant calling on our data, we also report two distinct stem cell phenotypes: one with leukemic (CD34dimCD38-CD45RA+) and another with pre-leukemic (CD34+CD38-CD45RA-) origin. We believe that our data and findings provide novel insights regarding the etiology of NPM1 mutated AML.
Study
EGAS50000000332
National Heart, Lung, and Blood Institute (NHLBI) Bench to Bassinet Program: The Gabriella Miller Kids First Pediatric Research Program of the Pediatric Cardiac Genetics Consortium (PCGC)
Multi-center, prospective observational cohort study of individuals with congenital heart defects (CHD). Phenotypic data and source DNA derived from 300 probands and their parents, and family of interest collected to investigate relationships between genetic factors and phenotypic and clinical outcomes in patients with CHD. This Kids First project represents a subset of the PCGC data. Other PCGC data can be accessed through phs001194 and other Kids First data can be accessed through kidsfirstdrc.org.
Study
phs001138
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Mediators of Atherosclerosis in South Asians Living in America Study (MASALA)
Cohort Description The Mediators of Atherosclerosis in South Asians Living in America (MASALA) study is a prospective cohort of South Asians that aims to identify risk factors for heart disease in a large, growing Asian American subgroup. MASALA enrolled 906 South Asians in 2010-2013 and then added a new wave of 258 South Asian participants from 2017-2018, for a full cohort size of 1,164. Data Being Submitted Wave 1 questionnaire data includes 3967 variables for up to 431 MASALA participants in C4R. Wave 2 questionnaire data includes 448 variables for up to 313 MASALA participants in C4R. Dried Blood Spot/Serosurvey data includes 7 variables for up to 248 MASALA participants in C4R. Derived data includes 43 variables for up to 528 MASALA participants in C4R. Phenotype data includes 113 variables for up to 429 MASALA participants in C4R.
Study
phs002980
TMM whole genome analysis of 4566 Japanese individuals
Tohoku University Tohoku Medical Megabank Organization (ToMMo) and Iwate Tohoku Medical Megabank Organization (IMM) were founded to establish an advanced medical system to foster the reconstruction from the Great East Japan Earthquake. These organizations are developing a biobank that includes medical and genome information for supporting health and welfare in the Tohoku area. In the first stage, part of our mission was to sequence 4,000 individuals to construct a Japanese whole-genome reference panel.
Study
JGAS000239
Assessment of cannabidiol and Δ9-tetrahydrocannabinol in mouse models of medulloblastoma
Phytocannabinoids Δ9-tetrahydrocannabinol (THC) and cannabidiol (CBD) have been demonstrated to exhibit anti-cancer activity in preclinical models of brain cancer leading to new clinical trials for adults with glioblastoma. We describe here the first report that has investigated a role for THC and CBD in paediatric brain cancer. Cannabinoids had cytotoxic activity against medulloblastoma and ependymoma cells in vitro, functioning in part through the inhibition of cell cycle progression and the induction of autophagy. Despite these effects in vitro, when tested in orthotopic mouse models of medulloblastoma or ependymoma, no impact on animal survival was observed. Furthermore, cannabinoids neither enhanced nor impaired conventional chemotherapy in a medulloblastoma mouse model. These data show that while THC and CBD do have some effects on medulloblastoma and ependymoma cells, are well tolerated and have minimal adverse effects, they do not appear to elicit any survival benefit in preclinical models of paediatric brain cancer.
Study
EGAS00001004963
Disorders/Differences of Sex Development (DSD) Study Performed at UCLA in Collaboration with the DSD-Translational Research Network (DSD-TRN), with the Support of the Gabriella Miller Kids First Pediatric Research Program
The Gabriella Miller Kids First Pediatric Research Program (Kids First) is a trans-NIH effort initiated in response to the 2014 Gabriella Miller Kids First Research Act and supported by the NIH Common Fund. This program focuses on gene discovery in pediatric cancers and structural birth defects and the development of the Gabriella Miller Kids First Pediatric Data Resource (Kids First Data Resource). Both childhood cancers and structural birth defects are critical and costly conditions associated with substantial morbidity and mortality. Elucidating the underlying genetic etiology of these diseases has the potential to profoundly improve preventative measures, diagnostics, and therapeutic interventions. Whole Genome Sequence (WGS) and phenotypic data from this study are accessible through dbGaP and kidsfirstdrc.org, where other Kids First datasets can also be accessed. Disorders/Differences of Sex Development (DSD) are congenital conditions in which development of chromosomal, gonadal, or anatomic sex is atypical. DSD are chronic medical conditions collectively affecting ~1% of the population, frequently requiring life-long care by multiple specialists, and carrying a significant public health burden. Some are associated with life-threatening events, such as adrenal crises in Congenital Adrenal Hyperplasia (CAH). DSD are also associated with increased infertility, cancer, gender dysphoria risks, psychosocial distress and pervasive challenges to health-related quality of life (HRQoL) for patients and families. DSD are broadly classified into three categories: sex chromosome DSD, 46,XY DSD and 46,XX DSD, and further classified according to the type of gonad found in the patient (ovary, testis, ovotestis). We were able to increase significantly the diagnostic success for DSD using Whole Exome Sequencing (WES), with the identification of disease-causing and likely pathogenic variants in a third of a cohort of 46,XY patients. 
We have therefore proposed a shift in the diagnostic approach to DSD to use next-gen sequencing as a first-line clinical test, which could lead to faster and more accurate diagnosis, and orient further clinical management, limiting unnecessary, costly, and often invasive endocrine testing and imaging. However, many cases remain unexplained (over half of the XY cases, a significant minority of XX cases, including most ovotesticular DSD, and most syndromic cases). In addition, the very large phenotypic variability in cases with known variants in the same gene is unexplained. We here propose to use Whole-Genome Sequencing (WGS), which dramatically improves upon exome sequencing, covering both coding and non-coding parts of the genome more uniformly, as an approach not only to improve diagnostic yield, but also to identify novel genes and regulatory elements involved in DSD.
Study
phs001178
Sporadic ALS Australia Systems Genomics Consortium (SALSA-SGC)
In 2016 we established the Sporadic ALS Australia Systems Genomics Consortium (SALSA-SGC), funded by the Ice Bucket Challenge Grant administered by the Motor Neurone Disease Research Institute of Australia. The goals of the SALSA-SGC are to collect biological samples from clinics across Australia with matched in-depth clinical and self-report phenotypes and to generate multiple levels of genetic and genomic data. In this first data generation exercise of the SALSA-SGC, the majority of the samples were collected from clinics across Australia prior to the formal establishment of SALSA-SGC. Briefly, the cohort includes the University of Sydney’s Australian Motor Neuron Disease DNA Bank (MND Bank) cohort (recruited April 2000 to June 2011), with study protocol approved by the Sydney South West Area Health Service Human Research Ethics Committee (HREC). Cases were recruited from around Australia via state-based MND associations with diagnosis verified by a neurologist. The remainder of the cases were recruited from clinics across Australia between 2015 and 2017 under HREC approvals from Royal Brisbane and Women’s Hospital, Macquarie University Multidisciplinary Motor Neurone Disease Clinic, Calvary Health Care Bethlehem in Melbourne, Fiona Stanley Hospital in Perth, and from 2016 under HREC approvals at each site for the sporadic ALS Australia Systems Genomics Consortium (SALSA-SGC). The ALS cases were diagnosed with definite or probable ALS according to the revised El Escorial criteria. Some controls were recruited as either partners or friends of patients, healthy individuals free of neuromuscular diseases. We are providing GWAS and MWAS data in this dataset. Individual-level GWAS data were generated using Illumina Infinium CoreExome-24 version 1.1 chips for N = 846 cases and N = 665 controls. Individual-level MWAS data were generated using the Illumina Human methylation 450K array for N = 782 cases and N = 613 controls.
There are 1315 individuals for whom GWAS and MWAS data have been generated and are available. Further information on these data sets can be found in: Paper 1: Restuadi, R, Garton, FC, Benyamin, B, Lin, T, et al. Amyotrophic Lateral Sclerosis Genetic Correlation with Cognitive Performance, educational attainment and schizophrenia: evidence from polygenic risk score analysis. (submitted) Paper 2: Nabais, MF, Lin, T, Benyamin, B et al. Significant out-of-sample classification from methylation profile scoring for amyotrophic lateral sclerosis. 2020. NPJ genomic medicine. 5(10). Files provided in this submission include:
GWAS: This folder contains QCed genotypes for the Australian ALS case-control cohort. It contains PLINK files for the genotyping data (not imputed yet). The individuals selected here have good consistency on phenotype data, have ethics approval registered as part of sporadic ALS studies, and are unrelated by GRM cut-off 0.05. No ancestry QC has been applied yet.
MWAS: This folder contains the IDATs and post-QC normalized DNAm (beta) values for the Australian ALS case-control cohort.
• 2019_AUS_ALS_PCTG_DNAm.tar.gz - IDATs for 1315 individuals analyzed in the MWAS study
• normalized_beta_values - Binary files (created with the OSCA software) containing information on the individuals, probes and the DNAm (beta) values obtained after QC
• phenotype_file - contains all the covariates analysed in the MWAS, including: case-control status (coded 0 = Control and 1 = ALS), predicted age, predicted cell-type proportions, predicted smoking scores, slide and chip position, and sex
Important Notes: The DNAm data were normalized together with samples that were not part of this ALS case/control study and thus, the normalization procedure may not be 100% reproducible using only the IDAT files uploaded here. Summary data has been made publicly available and can be accessed directly:
Data collection and sample processing were performed at several clinics across Australia.
Genotyping and DNA methylation arrays were performed by the Human Studies Unit, at the Institute for Molecular Bioscience (University of Queensland). Quality control of the genotypic, phenotypic and DNA methylation data was done by the Program of Complex Traits Genomics, at the Institute for Molecular Bioscience (University of Queensland).
Study
phs002068
Discovery, Biology, and Risk of Inherited Variants in Breast Cancer (DRIVE) - OncoArray Genotypes
Discovery, Biology, and Risk of Inherited Variants in Breast Cancer (DRIVE) was one of five projects funded in 2010 as part of the NCI's Genetic Associations and Mechanisms in Oncology (GAME-ON) initiative (http://epi.grants.cancer.gov/gameon/). GAME-ON's overall goal was to foster an intra-disciplinary and collaborative approach to the translation of promising research leads deriving from the initial wave of cancer GWAS. Specific goals included replication of previous GWAS findings; identification of new susceptibility loci through meta-analyses of existing GWAS data and fine mapping of identified loci to better pinpoint causal variants; and identification of germline variants that are associated with risk of multiple cancers. The other four funded GAME-ON projects were: the ColoRectal Transdisciplinary Study (CORECT), Elucidating Loci Involved in Prostate Cancer Susceptibility (ELLIPSE), Follow-up of Ovarian Cancer Genetic Association and Interaction Study (FOCI), and Transdisciplinary Research in Cancer of the Lung (TRICL). To identify additional cancer risk loci, improve the precision of fine-mapping, and facilitate cross-cancer analyses, the GAME-ON projects and other consortia formed the OncoArray network (http://epi.grants.cancer.gov/oncoarray/), which developed and genotyped a new custom genotyping array (the "OncoArray") in large numbers of cancer cases and controls (over 400,000 samples). The OncoArray is a custom array manufactured by Illumina. The array includes a backbone of approximately 260,000 SNPs that provide genome-wide coverage of most common variants, together with markers of interest for each of the five GAME-ON cancers identified through genome-wide association studies (GWAS), fine-mapping of known susceptibility regions, sequencing studies, and other approaches.
The array also includes loci of interest identified through studies of other cancer types, and other loci of interest to multiple cancer types (including loci associated with cancer-related phenotypes, drug metabolism and radiation response). Additionally, SNPs relating to quantitative phenotypes such as body mass index (BMI), height, and breast density that correlate with common cancer risks are included. The DRIVE data included under this dbGaP submission include OncoArray data from 60,015 breast cancer cases and controls genotyped at the Center for Inherited Disease Research (CIDR), University of Cambridge, National Cancer Institute, University of Copenhagen, University of Southern California and Mayo Clinic. Details on an additional approx. 80,000 breast cancer cases and controls genotyped at other centers can be found at http://bcac.ccge.medschl.cam.ac.uk/bcacdata/oncoarray/.
Study
phs001265
Genome-wide association analysis on five isolated populations - Erasmus Rucphen Family (ERF) Study
ERF is embedded in the Genetic Research in Isolated Population (GRIP) program. Genealogy of GRIP is available from its founding (1600-1650). For ERF, we selected 22 related (first and second degree, non-consanguineous) couples living in GRIP between 1850-1900, and having at least six children baptized in the community church. All living descendants of those couples were invited to participate in ERF1. About 4,000 relatives included in the study form a very complex pedigree, which consists of more than 23,000 members. The participants aged between 18-20 years constitute a unique single family spanning over 23 generations. Around 3,000 individuals have been extensively phenotyped for several psychiatric and somatic traits. All phenotyped individuals are genotyped on genome-wide genotyping arrays.
Study
EGAS00001001134
Development of a fully human glioblastoma-in-brain-spheroid model for accelerated translational research
In glioblastoma (GBM) the intricate interplay between tumor cells and the surrounding microenvironment plays a crucial role in tumor progression, invasion, and therapeutic resistance. So far, studying these interactions in a controlled and representative model system has been challenging. Here, we report the development of hGliCS, a human glioma-cortical spheroid model that allows the elucidation of the biology of GBM cells and their interactions with a human-specific brain-like microenvironment and neurons. GBM cells efficiently invade the cortical spheroids, forming a well-connected network of communicating cells. The heterogeneous cellular states of the GBM cells within this model closely resembled findings previously observed in glioblastoma patients and in mouse xenografts. We observed a transition from predominantly mesenchymal-like cells to heterotopic states with a high diversity. In contrast to the substantial changes observed in the tumor cell population, the impact of the GBM cells on the neurons was minimal. We further demonstrate the suitability of hGliCS to test compounds targeting tumor-specific neurobiological features.
Study
EGAS50000000757
University of Utah Pelvic Organ Prolapse Disorder Study
The overall purpose of this University of Utah Pelvic Organ Prolapse Disorder Study was to identify and localize predisposition genes contributing to pelvic organ prolapse (POP). POP cases recruited for this study were identified by one of three methods: high-risk POP pedigree cases, POP sister pairs, and surgically-treated POP cases reporting a family history of POP. High-risk POP pedigree cases were identified using the Utah Population Database (UPDB), a genealogy database of residents in Utah that has been linked to diagnostic ICD9 and CPT codes in medical records at the University of Utah and Intermountain Healthcare. We identified families with a significant excess number of POP cases compared to matched population rates and targeted these individuals for recruitment as well as any other POP cases in the family. POP sister pair cases were identified at the University of Utah Urogynecology clinic among women who had undergone POP surgery and also self-reported one or more sisters who were also surgically treated for POP. POP affection status of all sisters was confirmed either by physical examination or by chart review. Surgically-treated POP cases reporting a family history of POP were identified at the University of Utah Urogynecology clinic by self-report of a family history of POP. Efforts were made to recruit other affected family members and confirm affection status. To obtain DNA, subjects provided either a blood specimen or saliva. Medical records were reviewed by a urogynecologist and diagnostic information for pelvic organ prolapse and stress and overactive bladder was obtained. Collected DNA was genotyped and analyzed. To maintain confidentiality of the familial data, genetic data from only one subject per family has been submitted to dbGaP. Use of the University of Utah Pelvic Organ Prolapse Disorder Study data is limited to investigators studying pelvic floor disorders.
These pelvic floor disorders include pelvic organ prolapse, urinary and anal incontinence, and other conditions related to weakening or injury to the muscles and connective tissue in the pelvis as a result of pelvic surgery, pregnancy, or vaginal delivery of a child. These data will be used only for research purposes related to pelvic floor disorders. They will not be used to determine the individual identity of any person or their relationship to another person or for research on non-disease traits.
Study
phs001439
University Clinic Golnik DAC
Data Access Committee (DAC) of the University Clinic Golnik, which decides on requests for data submitted on behalf of research groups of the Clinic.
Dac
EGAC50000000820
The NCAA-DoD Concussion Assessment, Research, and Education (CARE) Consortium
Study Description: Established in 2014 with funding from the U.S. Department of Defense (DoD) and the National Collegiate Athletic Association (NCAA), the Concussion Assessment, Research and Education (CARE) Consortium seeks to inform science, clinical care and public policy related to concussion and repetitive head impact exposure (HIE) in U.S. Military Service Academy (MSA) cadets and collegiate student-athletes. The primary aims of the initial study were to: (1) establish a multisite research consortium to characterize the effects of concussion and repetitive head impact exposure on brain health, (2) characterize the clinical sequelae and natural history of concussion, and (3) characterize the effect of concussion and repetitive head impacts on brain structure and brain function. The 30 member institutions of the CARE Consortium (26 civilian universities/colleges and 4 military service academies) agreed to invite all varsity athletes to undergo multimodal clinical assessments at preseason baseline, and at 5 additional timepoints after concussion diagnosis. At the service academies, non-varsity athlete cadets/midshipmen were also studied.
Assessment Categories At Each Time Point
Baseline: • Demographics • Personal and Family History • Neurocognitive Assessment • Neurological Status • Postural Stability • Symptoms
ACUTE POST-INJURY FOLLOW-UP (CARE 1.0) (24 hrs; 48 hrs; Asymptomatic; Unrestricted Return to Play (RTP); 6 mos): • Neurocognitive Assessment • Neurological Status • Postural Stability • Symptoms
CUMULATIVE & PERSISTENT EFFECTS (CARE 2.0) (In-Person Exit Visit; Post-grad Online): • Demographics • Personal and Family History • Neurocognitive Assessment • Neurological Status • Postural Stability • Symptoms • PROs • MROs (at academies only)
LONG-TERM EFFECTS (CARE/SALTOS Integrated (CSI)) (Annual Online Assessment (at MSAs only); In-Person Research Visits): • Demographics • Symptoms • Psychological health • PROs • MROs (at academies only)
A subset of athletes at 6 of the CARE institutions underwent additional assessments including multimodal MRI, proteomic and genomic biomarker characterization, and head impact quantification (helmet-based sensors). ARC/pARC Neurocognitive and Behavioral Testing was done at Baseline, <48hrs Post-Injury, Cleared for Return to Play Progression (Asymptomatic), Unrestricted Return to Play, 7 days following Return to Play, and at >60 days after final game of junior year until end of collegiate career. Blood biomarker collection was done at Baseline, <48hrs Post-Injury, Cleared for Return to Play Progression (Asymptomatic), 7 days following Return to Play, and at >60 days after final game of junior year until end of collegiate career. Multi-modal MRI studies were done at Baseline, <48hrs Post-Injury, Cleared for Return to Play Progression (Asymptomatic), 7 days following Return to Play, and at >60 days after final game of junior year until end of collegiate career. However, please note that only athletes from one study performance site completed a multi-modal MRI study at baseline.
Study
phs002175
Demographically Diverse Substance Use Disorder Cohorts of Dr. Stanley H. Weiss
The Demographically Diverse Substance Use Disorder Cohorts of Dr. Stanley H. Weiss, which constitute the Epidemiology of the Weiss Cohort Projects, consist of a series of inter-connected projects, building upon a set of cohort projects of various groups, mainly drug users from medication-assisted treatment programs, that Dr. Stanley H. Weiss first developed in the 1980’s plus several newer initiatives, each with an array of collaborators.
Beginning in the 1980’s, Dr. Stanley H. Weiss started several long-term studies of persons who inject drugs (PWID) across the United States, ultimately enrolling over 10,000 participants through the early 1990’s, with an average age at that time in their 30’s. About a quarter were enrolled from sites in New Jersey (NJ). These studies included the first testing of PWID for the human immunodeficiency virus (HIV) and the human T-cell lymphotropic viruses (HTLV I and HTLV II). Cumulative past support (initiation through ~ 1999) for these cohort studies included ~ $20 million from intramural resources from the National Cancer Institute (NCI) and the National Institute on Drug Abuse (NIDA), plus multiple grants and in-kind support from the New Jersey Department of Health (NJDOH) totaling ~ $1 million.
The Weiss Cohort Projects include the first large AIDS-era cohorts to include women at high risk for HIV. A high percentage of subjects in these studies are black or Latino. Thus, this is an ethnically diverse US cohort, with a high proportion of women included. These subjects are at high risk of parenteral and sexual infection from both drug use and sexual practices. Samples from other studies conducted by Dr. Weiss, in which detailed interviews were conducted, are included as controls (persons documented by us not to have a history of opioid drug use). As one of our groups of subjects have many persons of Haitian ancestry, we specifically included some Haitians who had never used opioids as controls. Our documentation includes such ancestry.
These cohorts demonstrated high rates of HIV and HTLV-II infection in PWIDs, including one study initiated in 1981 with confirmation in the later cohorts. In the first two decades of these studies, among numerous publications was the first study showing a very high rate of hepatitis C infection among PWIDs. An example of how the studies’ long-time horizon proved essential was that it first became possible to test whether a person had ever been infected with hepatitis C virus (HCV), as well as how much HCV was in each person’s blood, many years after the specimens were collected. This allowed HCV amounts in blood to be compared for subjects who had died of liver disease early in the study versus those who survived. Then a sequence of published papers culminated in demonstrating, using a nested case-control design, that a high baseline HCV titer was predictive of early progression to death from end-stage liver failure. Outcomes related to HCV (end stage liver disease and hepatocellular carcinoma) remain under study.
In the original cohort studies, the mean age at enrollment was ~ 33 years old, so that those still alive in 2022 are mainly now ~ 60 - 75 years old. Many participants have already died. The tincture of time has led to subjects reaching ages when many more are dying from a wide array of outcomes, including from many chronic diseases (including cancer) as well as from infectious agents (especially HIV, HCV) or drug overdose.
Renewed collaboration with local drug treatment programs has led to new field-based studies, including examination of some currently evolving problems among drug users. Dr. Weiss joined the National Institute on Drug Abuse (NIDA) Genetics Consortium (NGC) in 2017, and through the NIDA project officer has had access to NGC contract resources (see below).
NIH Certificate of Confidentiality, CC-DA-16-214 (attached) protects these studies. Past arrangements related to data on our subjects lead to restrictions on the use of data emanating from our study, such as potential commercialization and restrictions on who may access and use these data. NIDA Genetics Consortium (NGC) resources further support these endeavors and will be used as part of the NGC analyses studying the genetics of substance use.
Study participants signed informed consent for the information collected from them to be used with no time limit and for biologic specimens collected from them to be used without restriction in future research. Serum samples were collected from participants, and from many also plasma, white blood cells and/or urine samples. About 100,000 vials were stored. All specimens have been continuously preserved at sufficiently cold temperatures to prevent deterioration, and many subjects’ separated white blood cells were processed and frozen in such a way as to maintain viability. Detailed data from the participants have been accumulated over time, and in general, linkage has been retained in each sub-study in accordance with the consent forms and protocols. For some participants, specimens were collected at multiple times (that is, sequential specimens). Multiple specimens from a single person exist in this database, and efforts at de-duplication remain ongoing. Dr. Weiss should be contacted if an investigator requires unique individuals since:
• Multiple phases of enrollment occurred, and as our prospective follow-up continues, Dr. Weiss may identify new instances of multiple enrollment.
• Some persons are related to each other.
• In general, in this dataset for dbGaP, only a single specimen/record from a given person is included.
Advances in laboratory testing techniques now permit innovative new uses for our linked research biospecimen repository. The ongoing focus of an interdisciplinary research program based on these cohorts relates subjects’ diseases, behaviors, medical history, and outcomes with biological and exposure markers. Participants’ use of various substances was ascertained on study enrollments, many serially over time. Quantitative frequency of use data, also sometimes sequential over time, were ascertained. Active ascertainment of outcomes is being conducted, including matching to mortality and cancer databases. Investigators interested in collaborations on specific outcomes (which is not part of this dbGaP dataset) or in the use of our stored specimens are encouraged to contact the principal investigator, Dr. Weiss.
The processing of the genomic data was done in conjunction with NIDA, and in accordance with some longstanding data cleaning steps used by NIDA in the NIDA Genetics Consortium (NGC), a group to which we shall be contributing these data for collaborative analyses. Since there is the potential for these steps to introduce certain types of potential biases, we summarize these here.
Under contract from NIDA, cryopreserved sera or plasma (-80 C) or cells (in liquid nitrogen) were used, with most having been stored for 30 to 40 years in our biorepository. In the case of serum or plasma, in which only (largely) cell-free DNA fragments were available, DNA was extracted and restored prior to amplification. Industry standard DNA amplification techniques were done on all samples prior to genotyping in accord with established protocols of the NIDA Genetics Consortium.
Our genotype data were run and processed on the Illumina Infinium OmniExpress_v_1.3 array. This array has 714,238 SNPs, and was designed many years ago. There were 628 SNPs on the array that do not correspond to any chromosome position, and these were removed. Genotype data were submitted by NIDA’s contracted genotyping laboratory in six batches over time to NIDA’s contracted dbGaP data management group, which conducted quality control (QC) analyses. QC analysis included an assessment of batch effects for five of the six batches. (One of the batches, with only 12 samples, was too small for QC analysis of batch effects.)
Standard NIDA Genetics Consortium cleaning was performed. Samples with a call rate < 0.85 were removed. Only one sample per person was retained. When more than one specimen was genotyped from one subject, only the sample with the higher call rate was retained (provided, of course, that that call rate was ≥ 0.85). We have retained some people we know are related, including some found to have been related through genotyping; the pedigree file describes those relationships.
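The sample-level rule described above (drop samples with call rate below 0.85, then keep the highest-call-rate sample per person) can be illustrated with a minimal Python sketch. This is not the NGC pipeline itself; the record layout and the names `Sample` and `select_samples` are hypothetical and introduced only for illustration.

```python
from typing import Dict, Iterable, List, NamedTuple


class Sample(NamedTuple):
    sample_id: str    # hypothetical specimen identifier
    person_id: str    # hypothetical subject identifier
    call_rate: float  # fraction of SNPs successfully called


def select_samples(samples: Iterable[Sample],
                   min_call_rate: float = 0.85) -> List[Sample]:
    """Keep one sample per person: the one with the highest call rate,
    after discarding samples whose call rate is below the threshold."""
    best: Dict[str, Sample] = {}
    for s in samples:
        if s.call_rate < min_call_rate:
            continue  # removed: call rate < 0.85
        current = best.get(s.person_id)
        if current is None or s.call_rate > current.call_rate:
            best[s.person_id] = s
    # Sort by sample ID only to make the result order deterministic.
    return sorted(best.values(), key=lambda s: s.sample_id)


if __name__ == "__main__":
    demo = [
        Sample("S1", "P1", 0.90),
        Sample("S2", "P1", 0.97),  # same person as S1, higher call rate
        Sample("S3", "P2", 0.80),  # below threshold
        Sample("S4", "P3", 0.85),  # exactly at threshold, retained
    ]
    for s in select_samples(demo):
        print(s.sample_id, s.person_id, s.call_rate)
```

Note that this sketch treats related individuals as distinct persons, consistent with the statement that known relatives were retained.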
In summary, key cleaning steps include:
1. Using PLINK to check gender discrepancy.
2. Using PREST-PLUS and KING (Kinship-based Inference for GWAS) to check relatedness.
3. Using PEDCHECK and PLINK to check/zero-out Mendelian error.
4. Using PLINK to perform sample QC, SNP QC, along with KING to perform chromosome X and chromosome Y QC.
5. SNP-QC: Batch-effect: 5 Batches were compared (one batch, with few samples, was not). These five batches were compared to each other in all ten possible pairs, one batch vs. another batch, examining SNP allele frequency discrepancies by population (from GRAF), Fisher Exact Allelic test, with the criterion of p<5e-8 for removal.
6. SNP-QC: discordant SNPs in QC duplicates. Compared 25 QC duplicated samples with call rate > 0.95, removed SNPs with 3+ discordance.
7. There were 1,056 SNPs that were monomorphic; these have been retained so they can be included in analyses in which our dbGaP data are combined with those from other cohorts (in the latter of which those SNPs may not be monomorphic).
The final cleaned dataset submitted has 8,898 samples and 606,793 SNPs.
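The batch-effect check in step 5 can be illustrated with a short sketch: for each SNP, every pair of the five batches is compared with a two-sided Fisher exact test on allele counts, and the SNP is flagged if any pair gives p < 5e-8. This is an illustration under stated assumptions, not the QC group's actual code. The input layout, the function names, and the toy counts are hypothetical. The test is implemented directly from the hypergeometric distribution with the standard library, and the per-population stratification described in step 5 is omitted (it would amount to running the same comparison within each GRAF-assigned population).

```python
from itertools import combinations
from math import comb

P_THRESHOLD = 5e-8  # removal criterion stated in step 5


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]:
    the sum of probabilities of all tables with the same margins that are
    no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def flag_batch_effect_snps(allele_counts, threshold=P_THRESHOLD):
    """allele_counts maps snp -> {batch: (allele_A_count, allele_B_count)}.
    Returns the set of SNPs for which any pair of batches differs at p < threshold."""
    flagged = set()
    for snp, by_batch in allele_counts.items():
        for b1, b2 in combinations(sorted(by_batch), 2):
            (a, b), (c, d) = by_batch[b1], by_batch[b2]
            if fisher_exact_two_sided(a, b, c, d) < threshold:
                flagged.add(snp)
                break
    return flagged


# Toy data (hypothetical): five batches, two SNPs.
batches = ["B1", "B2", "B3", "B4", "B5"]
counts = {
    "snp_ok": {b: (100, 100) for b in batches},
    "snp_bad": {**{b: (100, 100) for b in batches[:4]}, "B5": (200, 0)},
}
n_pairs = len(list(combinations(batches, 2)))
flagged = flag_batch_effect_snps(counts)
print(n_pairs, sorted(flagged))
```

Five batches give ten unordered pairs, matching the "all ten possible pairs" in step 5. In the toy data only the SNP whose allele counts differ sharply in one batch is flagged.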
Study
phs002140
A Case-Controlled Study for Genotype-Phenotype Associations in Multiple Sclerosis (MS)
This is a multi-centre, case-controlled study to develop a dataset containing 1000 MS cases and 1000 matched controls and to associate DNA sequence (allelic) variations with MS phenotypes. Study subjects were enrolled through a prospective effort initiated in 2003. Three MS clinical centres were involved in subject recruitment and biological specimen collection using identical inclusion/exclusion criteria, two in Europe (Vrije Universiteit Medical Center, Amsterdam; and University Hospital Basel) and one in the US (University of California San Francisco). This study recruited subjects of northern-European ancestry with a diagnosis of MS (McDonald et al., 2001), with dissemination in time and space. Patients with Clinically Isolated Syndromes (CIS) were also included if they fulfilled 3 of the 4 Barkhof criteria for dissemination in space as per application of the McDonald criteria (McDonald et al., 2001). While recruitment predominantly included subjects with a relapsing onset of MS, individuals with all clinical subtypes of the disease participated, including clinically isolated syndrome (CIS), relapsing remitting MS (RRMS), secondary progressive MS (SPMS), primary progressive MS (PPMS), and progressive relapsing MS (PRMS). The control group consisted of unrelated individuals, primarily spouses/partners, friends, and other volunteers. Control subjects were of northern-European ancestry and matched as a group, proportionally with cases according to age (±5 years) and gender. A familial history or current diagnosis of MS as well as a relation to another case or control subject were considered exclusionary for this group. Protocols were approved by the Committees on Human Research at all Institutions and informed consent was obtained from all participants prior to participation in the study. 
Primary Study Objective: To identify DNA sequence variations (genotype) and flanking sequences that are associated with clinical factors (phenotype) which differ between study subjects with and without MS.
Secondary Study Objectives:
To develop a clinical dataset including quantitative measures of 1000 well-characterized cases with MS, and 1000 ethnically matched controls.
To identify other genotype-phenotype associations in MS study subjects such as magnetic resonance imaging (MRI) measures of disease burden and/or severity.
To identify or confirm candidate surrogate markers of neurodegeneration using a variety of techniques including biochemical assays, blood transcriptome analysis, plasma proteomics and MRI*.
Genotyping: Genotyping of the complete dataset was performed at the Illumina facilities using the Sentrix® HumanHap550 BeadChip.
*MRI results are not available on dbGaP.
Study
phs000171
NHLBI TOPMed: Severe Asthma Research Program (SARP)
The overall goal of the Severe Asthma Research Program (SARP) is to identify and characterize subjects with severe asthma to understand pathophysiologic mechanisms in severe asthma. Subjects with mild and moderate asthma were recruited for comparison, but the program was enriched for subjects with severe asthma from multiple centers. Subjects were comprehensively phenotyped for asthma-related traits including lung function, atopy, questionnaires on medical and family history, exhaled nitric oxide and health care utilization including exacerbations and symptoms. Asthma is a heterogeneous disease. Cluster analysis in SARP has shown multiple subphenotypes and endotypes.
Study
phs001446
Disease specific alterations in the olfactory mucosa of patients with Alzheimer’s disease
Olfactory dysfunction manifests early in several neurodegenerative disorders. Olfaction is orchestrated by olfactory mucosal cells located in the upper nasal cavity. However, it is unclear how this tissue reflects key neurodegenerative features in Alzheimer’s disease. Here we report that Alzheimer’s disease olfactory mucosal cells secrete toxic amyloid-beta. We detail cell-type-specific gene expression patterns, unveiling 147 differentially expressed disease-associated genes compared to the cognitively healthy controls, and 5 distinct populations in globose basal cell-, myofibroblast-, and fibroblast/stromal-like cells in vitro. Overall, coordinated alteration of RNA and protein metabolism, inflammatory processes and signal transduction were observed in multiple cell populations, suggesting a key role in pathophysiology. Our results demonstrate the potential of olfactory cell cultures in modelling Alzheimer’s disease. Moreover, for the first time we provide single cell data on olfactory mucosa in Alzheimer’s disease for investigating molecular and cellular mechanisms associated with the disease.
Study
EGAS00001006019
The Human Pancreas Analysis Program (HPAP)
The past decade has seen a dramatic improvement in our ability to phenotype and molecularly profile human tissues with unprecedented resolution at the genomic, epigenomic, protein, and functional levels. The Human Pancreas Analysis Program (HPAP), part of the Human Islet Research Network and supported by the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) through multiple NIH grants, is performing deep phenotyping of the human endocrine pancreas to better understand the cellular and molecular events that precede and lead to beta-cell loss and/or dysfunction in type 1 diabetes (T1D) and type 2 diabetes (T2D) as well as to accumulate, analyze, and distribute high-value data sets to the diabetes research community through the HPAP PANC-DB database (additional details are provided at that site). To this end, HPAP employs state-of-the-art technologies to perform comprehensive analyses of pancreas biology as it pertains to organ donors with T1D, autoantibody-positive donors without diabetes, donors with T2D, and control donors. Pancreas procurement and analyses take advantage of the expertise and extensive network of Organ Procurement Organizations (OPOs) and autoantibody screening centers established through the JDRF/The Leona M. and Harry B. Helmsley Charitable Trust (HCT)-funded nPOD program. In contrast to nPOD, the major product provided by HPAP is not archived biomaterial subject to broad distribution, but rather the delivery of extensive and high-quality molecular data sets to the diabetes research community in order to facilitate further discovery. Together, nPOD and HPAP are complementary programs that assist the diabetes community and afford the maximal opportunity for advancing knowledge about the pathogenesis of T1D and T2D.
Study
phs002465
Privacy Notice for Data Access Committee Account
Privacy Notice for EGA Data Access Committee Account
This Privacy Notice explains what personal data is collected by the specific service you are requesting, for what purposes, how it is processed, and how we keep it secure.
Note that this service collects personal data directly provided by the user, and also collects personal data from users that is provided by other organisations.
1. Who controls your personal data and how to contact us?
The European Genome-Phenome Archive (EGA) offers a service for permanent archiving and sharing of all types of personally identifiable genetic and phenotypic data resulting from biomedical research projects. It is jointly managed by the European Molecular Biology Laboratory – European Bioinformatics Institute (EMBL-EBI) and Fundació Centre de Regulació Genòmica - Centre for Genomic Regulation (CRG).
EMBL-EBI and CRG are joint Data Controllers for the processing of your personal data. They and their Data Protection Officers may be contacted for data protection queries and for exercising your rights under Section 8.
You may contact EMBL-EBI, represented by Mallory Freeberg, by:
email at mfreeberg@ebi.ac.uk, or
post at EMBL-EBI, Wellcome Genome Campus, CB10 1SD Hinxton, Cambridgeshire, UK.
EMBL’s Data Protection Officer may be contacted by:
email at dpo@embl.org, or
post at EMBL Heidelberg, Data protection officer, Meyerhofstraße 1, 69117 Heidelberg, Germany.
You may contact CRG, whose EGA team is represented by Dr. Jordi Rambla de Argila, by:
email at jordi.rambla@crg.eu, or
post at Fundació Centre de Regulació Genòmica - Centre for Genomic Regulation (CRG), Dr. Aiguader 88, PRBB Building, 08003 Barcelona, Spain.
CRG’s Data Protection Officer may be contacted by:
email at dpo@crg.eu, or
post at Fundació Centre de Regulació Genòmica - Centre for Genomic Regulation (CRG), C/ Dr. Aiguader, 88, PRBB Building, 08003 Barcelona, Spain.
2. What is the lawful basis for processing personal data?
We process your personal data on the grounds of important public interest.
For monitoring your activities on the website, we process your personal data on the grounds of important public interest. Such legal basis is found in Article 5(1)(a) of EMBL Internal Policy No 68 on General Data Protection (hereinafter IP 68), which is equivalent to Article 6(1)(e) of the EU General Data Protection Regulation (hereinafter GDPR) and upon which personal data are processed for the achievement of the aims laid down in the 1973 agreement establishing EMBL, such as the promotion of cooperation in fundamental research, in the development of advanced instrumentation and in advanced teaching in molecular biology, and the dissemination of information.
3. What personal data is collected from users of the service? How do we use this personal data?
We collect the following personal data from you:
Name
Email address
Title/Position
Organisation
Organisational affiliation
Business address
Telephone number
IP addresses
Date and time of a visit to the service website
Operating system
Amount of data transmitted
Browser
Username
Password
The data controller will use your personal data for the following purposes:
To provide DAC user account and authenticated access to the service,
To publicly publish some information to facilitate scientific research,
To better understand the needs of the data subjects and guide future improvements of the service,
To create anonymous usage statistics (from number of DACs, datasets per DAC).
4. Who will have access to your personal data?
The personal data will be disclosed to:
Authorised staff in the data controller’s institutions acting on the data controller’s behalf and instructions (for all user account data),
The general public via Internet will get access to your name, email address, business address, telephone number and organisation you belong to.
5. Will your personal data be transferred to third countries (i.e. countries not part of EU/EEA) and/or international organisations?
Data categories ‘name’, ‘email address’, ‘telephone number’, ‘business address’ and ‘organisation’ of the DAC user are published on the Internet. They thus become accessible to recipients in countries outside the European Economic Area. Insofar as the second joint controller may be subject to GDPR, data transfer to and from the first joint controller (EMBL-EBI) is necessary for important reasons of public interest embedded in the aims of EMBL and justified in Article 9(4) of IP 68 (equivalent to Article 49(1)(d) of GDPR) read in conjunction with EMBL’s 1973 establishing agreement and Article 179(2) of the Treaty on the Functioning of the European Union.
6. How long do we keep your personal data?
Any personal data directly obtained from you will be retained as long as the service is live. Such duration serves the purpose of enabling scientific research and ensures legal compliance and facilitates internal and external audits if they arise.
By contrast, the log files for the data categories related to anonymous usage statistics (raw web service logs) are processed only for 30 days and thereafter erased.
7. The joint Data Controllers provide these rights regarding your personal data
You have the right to:
Not be subject to decisions based solely on an automated processing of data (i.e. without human intervention) without you having your views taken into consideration.
Request at reasonable intervals and without excessive delay or expense, information about the personal data processed about you. Upon your request we will inform you in writing about, for example, the origin of the personal data or the preservation period.
Request information to understand data processing activities when the results of these activities are applied to you.
It must be clarified that rights under points 4 and 5 are only available whenever you need support whilst using our website. For other processing based on the grounds of important public interest you cannot exercise your rights to object, rectify or erase your personal data according to the Article 13(2)(a)(b) of IP 68 (equivalent to Article 17(3)(b)(d) and Article 21(6) of the GDPR).
8. Supervisory authority
If you wish to complain against the processing of your personal data, you may do so by post at:
EMBL Heidelberg, Data Protection Committee, Meyerhofstraße 1, 69117 Heidelberg, Germany, or Autoritat Catalana de Protecció de Dades (Catalan Data Protection Authority), C/Rosselló 214, Esc A, 1r 1a, Barcelona 08008, Spain.
Published at: February 6, 2019
Documentation
data-protection/privacy-notice/ega-dac
NHLBI TOPMed: My Life Our Future (MLOF) Research Repository of Patients with Hemophilia A (Factor VIII Deficiency) or Hemophilia B (Factor IX Deficiency)
Hemophilia A and B are X-linked bleeding disorders resulting from a deficiency in coagulation factor VIII (FVIII) or factor IX (FIX), respectively. Hemophilia affects approximately 1/5000 male births worldwide, and results in premature death and disability due to bleeding if coagulation factor replacement therapy is not used effectively. Hemophilia is clinically categorized by coagulation factor activity levels and ranges in severity from mild (6% to 30%) to moderate (1-5%) to severe (<1%). Many female "carriers" of hemophilia also have decreased factor activity and morbidity from bleeding. Hemophilia A and B are almost always caused by identifiable mutations in the F8 and F9 genes, respectively, and these mutations are found throughout the structural genes. Although the hemophilias are monogenic disorders, there are wide variations in disease severity and therapeutic outcomes which are not readily explained by the disease causing mutations alone. The My Life Our Future (MLOF) project (www.mylifeourfuture.org) is a national resource developed by a partnership of BloodworksNW (BWNW, formerly the Puget Sound Blood Center), the American Thrombosis and Hemostasis Network (ATHN), the National Hemophilia Foundation (NHF) and Bioverativ, to provide free F8 and F9 gene variant analysis to patients with hemophilia A or B, and to establish a research repository of DNA sequence, DNA, RNA, buffy coat, serum and plasma. The sequence analysis and serum samples are linked to a phenotypic database hosted by ATHN, with samples submitted and clinical data entered at ~100 hemophilia treatment centers (HTCs) nationwide. (See ATHN Research Report Brief in the resource center at www.athn.org). MLOF has become the largest hemophilia genetic project worldwide. 
The roles of the MLOF partners are: BWNW, to serve as the central laboratory for the project and house the research repository; ATHN, to support and provide the administrative link with HTCs, to facilitate the collection of accurate phenotypic data, to conduct research review and approval for use of the repository and with BWNW to provide samples and data for research projects; NHF, to provide consumer education and facilitate consumer input into the project; and Bioverativ, to provide financial support and scientific input. The project is governed by a Steering Committee consisting of one representative from each organization. Subject samples chosen from the MLOF parent study for TOPMed and WGS were drawn from those who gave (or parents gave) informed consent for the Research Repository and included patients of all severities and type, but with an emphasis on those with severe hemophilia and others at increased risk of neutralizing antibody (inhibitor) formation and who had samples in the Research Repository (plasma, serum, RNA) for potential additional -omic studies. Also included were samples from subjects where a likely causative variant for hemophilia was not found in the F8 or F9 coding region, intron-exon boundaries or immediate upstream and downstream regions. Since hemophilia is an X-linked disorder, the majority of subjects are male. Racial distribution is similar to the overall population distribution.
Study
phs001515
Genetic Susceptibility and Biomarkers of Platinum-Related Toxicities
The Platinum Study is an R01-funded, multicenter study of testicular cancer survivors that characterizes both long-term cisplatin-induced peripheral neuropathy (CisIPN) and cisplatin-associated ototoxicity, i.e., permanent, bilateral hearing loss. We collected data abstracted from medical records (e.g., cumulative cisplatin dose), conducted audiometric examinations, and administered self-report questionnaires, which included CisIPN symptoms.
Study
phs001621
Case Report: Rare IKZF1 gene fusions identified in neonate with congenital KMT2A-rearranged Acute Lymphoblastic Leukemia
Here we describe a rare case of congenital KMT2Ar ALL presenting with co-occurring IKZF1 gene fusions and a predictably aggressive disease trajectory. We report, for the first time, the novel IKZF1::TUT1 and KDM2A::IKZF1 gene fusions. Rearrangements involving KMT2A are commonly retained in relapsed infant ALL; however, in this case the KMT2A::AFF1 gene fusion did not appear to be the lesion driving leukemic relapse. Instead, our data suggest that relapse was driven by IKZF1::TUT1. This gene fusion remained in all samples investigated, including the on-blinatumomab therapy sample taken immediately prior to relapse. Conversely, the KMT2A::AFF1 gene fusion was only detected in the diagnosis and refractory post-induction samples, highlighting a key role for IKZF1::TUT1 in disease pathogenesis. Intriguingly, both IKZF1 gene fusions are predicted to be out-of-frame; however, our data demonstrate the IKZF1 gene is still expressed. This is not unprecedented, and it has previously been observed that out-of-frame fusions can cause transcriptional activation/repression of genes involved in the fusions, leading to increases or decreases of their expression and the associated functional outcomes.
Study
EGAS00001006947
High-Risk Breast Cancer GWAS
The GWAS includes high-risk women from the following epidemiological studies of breast cancer, comprising a total of 3,719 cases and 3,642 controls (cases/controls: MEC, 0/200; ABCFR, 326/418; FCCC, 56/3; BCFR-UT, 66/32; CNIO-BC, 87/92; GESBC, 65/0; LIFE, 164/0; MARIE, 41/105; MAYO, 208/210; MNYR, 293/409; MSKCC, 310/0; NC-BCFR, 234/233; OFBCR, 553/560; POSH, 377/0; HBOC, 47/47; BBCS, 612/1333; UPENN, 280/0). This study was funded by grant CA165038 to Christopher Haiman (University of Southern California) and John Hopper (University of Melbourne) from the National Cancer Institute, National Institutes of Health.
The contributing studies:
Multiethnic Cohort (MEC). This study was supported by grant UM1 CA164973 from the National Cancer Institute, National Institutes of Health.
Ontario Familial Breast Cancer Registry, the Ontario site of the Breast Cancer Family Registry Cohort (OFBCR). This study was supported by grant UM1 CA164920 from the National Cancer Institute.
Utah Breast Cancer Family Registry (BCFR-UT). This study was supported by grant UM1 CA164920 from the National Cancer Institute.
New York site of the Breast Cancer Family Registry (MNYR). This study was supported by grant UM1 CA164920 from the National Cancer Institute.
Northern California site of the Breast Cancer Family Registry (NC-BCFR). This study was supported by grant UM1 CA164920 from the National Cancer Institute.
Australian Breast Cancer Family Registry (ABCFR). This study was supported by grant UM1 CA164920 from the National Cancer Institute.
Breast Cancer Study (CNIO-BC). This study has been partially funded by The Spanish Network on Rare Diseases (CIBERER) and the Spanish National Genotyping Center (CEGEN).
Genetic Epidemiologic Study of Breast Cancer (GESBC). The GESBC was supported by the Deutsche Krebshilfe e. V. [70492] and German Cancer Research Center (DKFZ).
Mammary Carcinoma Risk Factor Study (MARIE). This study was supported by the Deutsche Krebshilfe e.V. [70-2892-BR I, 106332, 108253, 108419], the Hamburg Cancer Society, the German Cancer Research Center (DKFZ) and the Federal Ministry of Education and Research (BMBF) Germany [01KH0402].
Prospective study of Outcomes in Sporadic versus Hereditary breast cancer (POSH). Funding for the POSH study was provided by Cancer Research UK (grant refs A7572, A11699, C22524), the Breast Cancer Campaign (grant number: 2013MayPR044) and from 2003-2006 by a grant from The Wessex Cancer Trust.
Hereditary Breast and ovarian Cancer: Genetic and Molecular Studies (HBOC). This study was supported by National Cancer Institute grant CA58860 and The Lon V Smith Foundation: LVSF-44528.
Mayo Clinic inherited breast and ovarian cancer study (MAYO). This study was supported by the Breast Cancer Research Foundation, NIH grants CA192393, CA176785, and an NIH CA116201 Specialized Program of Research Excellence (SPORE) in Breast Cancer.
British Breast Cancer Study (BBCS); Mammographic oestrogens and growth factor study (MOG). The BBCS and the MOG study are funded by Cancer Research UK and Breakthrough Breast Cancer and acknowledge NHS funding to the NIHR Biomedical Research Centre, and the National Cancer Research Network (NCRN).
Genotyping of non-BRCA1/2 mutation carriers (UPENN). The study is supported by the Basser Research Center at the University of Pennsylvania, Rooney Family Foundation, NIH grants CA176785 and CA192393, the Breast Cancer Research Foundation, the Susan G. Komen Foundation for the Cure and Macdonald Family Foundation.
Clinical Significance of Germline BRCA Mutations (MSKCC). The study is supported by the Robert and Kate Niehaus Clinical Cancer Genetics Research Initiative, The Breast Cancer Research Foundation, and the Cancer Center Support Grant from the National Institutes of Health, National Cancer Institute 5P30 CA08748-40.
Women's Learning the Influence of Family and Environment (LIFE). This study was supported by grants CA17054 and CA74847 from the National Cancer Institute, National Institutes of Health, No. 4PB-0092 from the California Breast Cancer Research Program of the University of California.
Philadelphia site of the Breast Cancer Family Registry at Fox Chase Cancer Center (FCCC). This study is supported by NIH grant CA164920.
Study
phs000929
Sequencing_probands_and_families_with_severe_insulin_resistance_syndromes
This is an ongoing project and a continuation of the sequencing we have been doing over the last few years. We have some additional families and probands with syndromes of insulin resistance not previously sequenced within UK10K or other core-funded projects. We would like to complete the sequencing in all of the good-quality families and probands we have; this would require another ~50 samples to be sequenced by WES. This cohort has already proven to be a rich source of interesting findings, with papers in Science and Nature Genetics.
Study
EGAS00001000488
CSER: Evaluating Utility and Improving Implementation of Genomic Sequencing for Pediatric Cancer Patients in the Diverse Population and Healthcare Settings of Texas: The KidsCanSeq Study
Through this Clinical Sequencing Evidence-Generating Research (CSER) with Enhanced Diversity project we are completing a clinical study (The Texas KidsCanSeq Study) comparing the results of targeted cancer panel sequencing versus genome-scale testing in pediatric cancer patients across diverse clinical settings. We will compare the targeted pediatric cancer panel to germline whole exome sequencing (WES) of unselected childhood cancer patients. We will also compare an RNA/DNA targeted pediatric cancer panel versus WES, transcriptome sequencing and copy number array of FFPE tumor samples for the subset of patients with high-risk tumors. Exome sequencing is performed from a blood or saliva sample of the pediatric age (0-18) patient. We are building on our success completing the CSER program BASIC3 exome sequencing trial (which included 60% Hispanic and African-American patients from a single large academic center) in this large multi-institutional study of an even more diverse patient population from six pediatric oncology healthcare settings across Texas. Data from the Texas KidsCanSeq study will be submitted through the CSER Data Coordinating Center for access in NIH-supported databases.
Study
phs002378
Translational Research Investigating Underlying disparities in acute Myocardial infarction Patients' Health status (TRIUMPH)
The Translational Research Investigating Underlying disparities in acute Myocardial infarction Patients' Health status (TRIUMPH) cohort is a cohort of 4340 patients prospectively enrolled after hospitalization for acute myocardial infarction (AMI) between June 1, 2005, and December 31, 2008 from 24 U.S. hospitals. Consenting patients had detailed chart abstractions of their medical history and processes of inpatient care, supplemented with a detailed baseline interview. Detailed genetic and metabolic data were obtained at hospital discharge in 2979 (69%) and 3013 patients (69%), respectively. Centralized follow-up interviews sought to quantify patients' post-discharge care and outcomes, with a focus on their health status (symptoms, function, and quality of life).
Study
phs001518
A novel orthotopic patient-derived xenograft model of radiation-induced glioma following medulloblastoma
Radiation-induced glioma (RIG) is a highly aggressive brain cancer arising as a consequence of radiation therapy. We report a case of RIG that arose in the brain stem following treatment for paediatric medulloblastoma, and the development and characterisation of a matched orthotopic patient-derived xenograft (PDX) model (TK-RIG915). Patient and PDX tumours were analysed using DNA methylation profiling, whole genome sequencing (WGS) and RNA sequencing. While initially thought to be a diffuse intrinsic pontine glioma (DIPG) based on disease location, results from methylation profiling and WGS were not consistent with this diagnosis. Furthermore, clustering analyses based on RNA expression suggested the tumours were distinct from primary DIPG. Additional gene expression analysis demonstrated concordance with a published RIG expression profile. Multiple genetic alterations that enhance PI3K/AKT and Ras/Raf/MEK/ERK signalling were discovered in TK-RIG915 including an activating mutation in PIK3CA, upregulation of PDGFRA and AKT2, inactivating mutations in NF1, and a gain-of-function mutation in PTPN11. Additionally, deletion of CDKN2A/B, increased IDH1 expression, and decreased ARID1A expression were observed. Detection of phospho-S6, -4EBP1 and -ERK via immunohistochemistry confirmed PI3K pathway and ERK activation. Here, we report one of the first PDX models for RIG, which recapitulates the patient disease and is molecularly distinct from primary brain stem glioma. Genetic interrogation of this model has enabled the identification of potential therapeutic vulnerabilities in this currently incurable disease.
Study
EGAS00001004709
Study of Addiction: Genetics and Environment (SAGE)
This study is part of the Gene Environment Association Studies initiative (GENEVA) funded by the National Human Genome Research Institute. The overarching goal is to identify novel genetic factors that contribute to addiction through a large-scale genome-wide association study of DSM-IV alcohol dependent (and frequently illicit drug dependent) cases and non-dependent, unrelated control subjects of European and African American descent. The focus of this proposal is a case-control design of unrelated individuals for a genetic association study of addiction. Cases are defined as individuals with DSM-IV alcohol dependence (lifetime) and potentially other illicit drug dependence. In addition to the categorical diagnosis, we have data on ordinal measurements of number of DSM-IV symptoms for alcohol, nicotine, marijuana, cocaine, opiates and other drugs so that we will be able to construct quantitative measurements of addiction severity over a wide range of substances. Controls are defined as individuals who have been exposed to alcohol (and possibly to other drugs), but have never met lifetime diagnosis for alcohol dependence or dependence on other illicit substances. Analyses that include refinement of the phenotype and incorporation of important demographic and environmental factors into association studies will be pursued. Cases and controls were selected from three large, complementary datasets: the Collaborative Study on the Genetics of Alcoholism (COGA), the Family Study of Cocaine Dependence (FSCD), and the Collaborative Genetic Study of Nicotine Dependence (COGEND). COGA: COGA was initiated in 1989 and is a large-scale family study that has had as its primary aim the identification of genes that contribute to alcoholism susceptibility and related characteristics. COGA is funded through the National Institute on Alcohol Abuse and Alcoholism (NIAAA). Subjects were recruited from 7 sites across the U.S.
Alcohol dependent probands were recruited from treatment facilities and assessed by personal interview. After securing permission, other family members were also assessed. A set of comparison families was drawn from the same communities as the families recruited through the alcohol dependent probands. Assessment involved a comprehensive personal interview developed for this project, the Semi-Structured Assessment for the Genetics of Alcoholism (SSAGA), which gathers detailed information on alcoholism related symptoms along with other drugs and psychiatric symptoms. Families with three or more first-degree relatives who were alcohol dependent were invited for more extensive testing, including neurophysiology evaluations (ERPs and EEGs) and a battery of neuropsychological assessments. Blood was obtained for genetic studies. Institutional Review Boards at all sites approved the protocols, including sharing in the NIAAA national repository. COGA has four Co-Principal Investigators: Bernice Porjesz, Victor Hesselbrock, Howard Edenberg, and Laura Bierut. COGA includes nine different centers where data collection, analysis, and storage take place. The nine sites and Principal Investigators and Co-investigators are: University of Connecticut (Victor Hesselbrock); Indiana University (Howard Edenberg, John Nurnberger, Jr., Tatiana Foroud); University of Iowa (Samuel Kuperman); SUNY Downstate (Bernice Porjesz); Washington University in St. Louis (Laura Bierut, Alison Goate, John Rice); University of California at San Diego (Marc Schuckit); Howard University (Robert Taylor); Rutgers University (Jay Tischfield); Southwest Foundation (Laura Almasy). Q. Max Guo serves as the NIAAA Staff Collaborator. This national collaborative study is supported by the NIH Grant U10AA008401 from the National Institute on Alcohol Abuse and Alcoholism (NIAAA) and the National Institute on Drug Abuse (NIDA).
Family Study of Cocaine Dependence (FSCD): This project was initiated in 2000 as a case-control family study of cocaine dependence funded through the National Institute on Drug Abuse (NIDA; PI: Laura Bierut). The primary goal was to increase the understanding of the familial and non-familial antecedents and consequences of cocaine dependence. Cocaine dependent individuals were systematically recruited from chemical dependency treatment units (both public and private; residential and outpatient) in the greater St. Louis metropolitan area. Community based control subjects were identified through a Missouri Driver's License Registry (maintained at Washington University for research purposes) and matched by age, race, gender, and residential zip code. As a supplement to this project, blood samples were collected for future genetic analysis and were included in the NIDA Genetics Consortium. Phenotypic data, DNA, and cell lines are in the NIDA Center for Genetics Studies. Collaborative Genetic Study of Nicotine Dependence (COGEND): COGEND was initiated in 2001 as a three-part program project grant funded through the National Cancer Institute (NCI; PI: Laura Bierut). The three projects included a study of the familial transmission of nicotine dependence, a genetic study of nicotine dependence, and a study of the relationship of nicotine dependence with nicotine metabolism. The primary goal is to detect, localize, and characterize genes that predispose or protect an individual with respect to heavy tobacco consumption, nicotine dependence, and related phenotypes and to integrate these findings with the family transmission and nicotine metabolism findings. The primary design is a community based case-control family study. All subjects were recruited from Detroit and St. Louis. Nicotine dependent cases and non-dependent smoking controls were identified and recruited. In addition, one sibling for each case and control subject was recruited in a subset of the sample. 
Over 56,000 subjects aged 25-44 years were screened by telephone, over 3,100 subjects were personally interviewed, and over 2,900 donated blood samples for genetic studies. All three studies (COGA, COGEND, FSCD) include measures of basic socio-demographic variables, including age, sex, race/ethnicity, family income, educational attainment, religious participation, and family structure. Other important covariates and/or potential moderators of genetic effects include comorbid addictions and age at initiation of use for cigarettes, alcohol and drugs. The assessments also include measures of various life stressors, such as physical and sexual abuse, which have been implicated in gene-environment interactions for several disorders. Coding for both individual variables and indices has been standardized across studies. All subjects were assessed in person by trained research assistants. Funding support for genotyping, which was performed at the Johns Hopkins University Center for Inherited Disease Research (CIDR), was provided by the NIH GEI (U01HG004438), the National Institute on Alcohol Abuse and Alcoholism, the National Institute on Drug Abuse, and the NIH contract "High throughput genotyping for studying the genetic contributions to human disease" (HHSN268200782096C). Note for Publications Related to Study: The Study of Addiction: Genetics and Environment (SAGE) has not yet generated publications. Below is a listing of publications related to the three studies from which the SAGE sample was selected. COGA has over 228 publications listed at www.niaaagenetics.org. This study is part of the Gene Environment Association Studies initiative (GENEVA, http://www.genevastudy.org) funded by the trans-NIH Genes, Environment, and Health Initiative (GEI). 
The overarching goal is to identify novel genetic factors that contribute to addiction through a large-scale genome-wide association study of DSM-IV alcohol dependent (and frequently illicit drug dependent) cases and non-dependent, unrelated control subjects of European and African American descent. Genotyping was performed at the Johns Hopkins University Center for Inherited Disease Research (CIDR). Data cleaning and harmonization were done at the GEI-funded GENEVA Coordinating Center at the University of Washington.
Study
phs000092
Systems Analysis of Single-Cell Heterogeneity Underlying Glioma Drug Resistance
This study characterizes the transcriptional regulatory mechanisms that drive responses of two patient-derived glioblastoma multiforme (GBM) stem-like cells (PD-GSCs) that have distinct phenotypes - one sensitive and another resistant to the drug pitavastatin. This study investigates the differing mechanisms driving drug response in the two PD-GSCs at the single-cell level, and provides an approach that can be used to infer transcriptional regulatory network models, which can then be used to identify mechanisms driving cell-state transitions, e.g., proneural-to-mesenchymal transitions (PMT), which have been observed experimentally and clinically in GBM. The use of these network models enabled the identification of transcription factor and gene targets that perturbed drug-induced PMT through 1) simulations of network dynamics, and 2) characterization of gene program activities over the time-course response to drug treatment. These results enabled the identification of multiple siRNAs and the drug vinflunine as secondary components that potentiate the efficacy of pitavastatin. This work demonstrates an approach to uncover the transcriptional regulatory network (TRN) topology of PD-GSCs, and use it to rationally predict combinatorial treatments that block treatment escape and acquired resistance to drugs in GBM.
Study
phs003501
Circulating RNAs in Acute Heart Failure (CRUCIAL)
The purpose of this American Heart Association-funded and NIH-funded study is to examine circulating RNAs in the acute congestive heart failure (CHF) setting, how they change with decongestive therapy, and their function in vitro and in vivo. The investigators are testing the hypothesis that extracellular RNA (ex-RNA) levels change significantly during decongestive therapy and can be used as a marker of those individuals who respond to CHF therapy (in terms of cardiac structure or outcome). Additionally, the translational research design allows the investigators to assay the effects of these RNAs on tissue phenotypes in vitro.
Study
phs003403
Childhood Cancer Data Initiative (CCDI): CCDI Pediatric In Vivo Testing Program - Leukemia
The goal of this study is to molecularly characterize a large panel of pediatric acute lymphoblastic leukemia (ALL) patient-derived xenografts (PDXs) previously established in immune-deficient mice. These PDXs are utilized as part of the NCI-funded Pediatric Preclinical In vivo Testing (PIVOT) program to identify novel agents and combinations. Biospecimen data include next-generation sequencing (RNAseq, whole exome sequencing, DNA copy number variation), whole-genome analysis of cytogenetic abnormalities, and DNA fingerprint for quality control.
Study
phs003164
Refractory Cancer (RC) Program
The Refractory Cancer (RC) Program will investigate the underlying genomic hallmarks involved in the observed inferior response to treatment of certain tumor types. Comprehensive genomic characterization will be performed utilizing the NCI CCG Genome Characterization Pipeline. Subsequent genomic data will be hosted at the NCI Genomic Data Commons (GDC) (https://portal.gdc.cancer.gov/).
Study
phs002097
OncoArray: Prostate Cancer
Original description of the study: From ELLIPSE (linked to the PRACTICAL consortium), we contributed ~78,000 SNPs to the OncoArray. A large fraction of the content was derived from the GWAS meta-analyses in European ancestry populations (overall and aggressive disease; ~27K SNPs). We also selected just over 10,000 SNPs from the meta-analyses in the non-European populations, with a majority of these SNPs coming from the analysis of overall prostate cancer in African ancestry populations as well as from the multiethnic meta-analysis. A substantial fraction of SNPs (~28,000) were also selected for fine-mapping of 53 loci not included in the common fine-mapping regions (tagging at r2>0.9 across ±500kb regions). We also selected a few thousand SNPs related with PSA levels and/or disease survival as well as SNPs from candidate lists provided by study collaborators, as well as from meta-analyses of exome SNP chip data from the Multiethnic Cohort and UK studies. The Contributing Studies: Aarhus: Hospital-based, Retrospective, Observational. Source of cases: Patients treated for prostate adenocarcinoma at Department of Urology, Aarhus University Hospital, Skejby (Aarhus, Denmark). Source of controls: Age-matched males treated for myocardial infarction or undergoing coronary angioplasty, but with no prostate cancer diagnosis based on information retrieved from the Danish Cancer Register and the Danish Cause of Death Register. AHS: Nested case-control study within prospective cohort. Source of cases: linkage to cancer registries in study states. Source of controls: matched controls from cohort ATBC: Prospective, nested case-control. Source of cases: Finnish male smokers aged 50-69 years at baseline. Source of controls: Finnish male smokers aged 50-69 years at baseline BioVu: Cases identified in a biobank linked to electronic health records. 
Source of cases: A total of 214 cases were identified in the VUMC de-identified electronic health records database (the Synthetic Derivative) and shipped to USC for genotyping in April 2014. The following criteria were used to identify cases: Age 18 or greater; male; African Americans (Black) only. Note that African ancestry is not self-identified, it is administratively or third-party assigned (which has been shown to be highly correlated with genetic ancestry for African Americans in BioVU; see references). Source of controls: Controls were identified in the de-identified electronic health record. Unfortunately, they were not age matched to the cases, and therefore cannot be used for this study. Canary PASS: Prospective, Multi-site, Observational Active Surveillance Study. Source of cases: clinic based from Beth Israel Deaconness Medical Center, Eastern Virginia Medical School, University of California at San Francisco, University of Texas Health Sciences Center San Antonio, University of Washington, VA Puget Sound. Source of controls: N/A CCI: Case series, Hospital-based. Source of cases: Cases identified through clinics at the Cross Cancer Institute. Source of controls: N/A CerePP French Prostate Cancer Case-Control Study (ProGene): Case-Control, Prospective, Observational, Hospital-based. Source of cases: Patients, treated in French departments of Urology, who had histologically confirmed prostate cancer. Source of controls: Controls were recruited as participating in a systematic health screening program and found unaffected (normal digital rectal examination and total PSA < 4 ng/ml, or negative biopsy if PSA > 4 ng/ml). COH: hospital-based cases and controls from outside. Source of cases: Consented prostate cancer cases at City of Hope. Source of controls: Consented unaffected males that were part of other studies where they consented to have their DNA used for other research studies. COSM: Population-based cohort. Source of cases: General population. 
Source of controls: General population CPCS1: Case-control - Denmark. Source of cases: Hospital referrals. Source of controls: Copenhagen General Population Study CPCS2: Source of cases: Hospital referrals. Source of controls: Copenhagen General Population Study CPDR: Retrospective cohort. Source of cases: Walter Reed National Military Medical Center. Source of controls: Walter Reed National Military Medical Center ACS_CPS-II: Nested case-control derived from a prospective cohort study. Source of cases: Identified through self-report on follow-up questionnaires and verified through medical records or cancer registries, identified through cancer registries or the National Death Index (with prostate cancer as the primary cause of death). Source of controls: Cohort participants who were cancer-free at the time of diagnosis of the matched case, also matched on age (±6 mo) and date of biospecimen donation (±6 mo). EPIC: Case-control - Germany, Greece, Italy, Netherlands, Spain, Sweden, UK. Source of cases: Identified through record linkage with population-based cancer registries in Italy, the Netherlands, Spain, Sweden and UK. In Germany and Greece, follow-up is active and achieved through checks of insurance records and cancer and pathology registries as well as via self-reported questionnaires; self-reported incident cancers are verified through medical records. Source of controls: Cohort participants without a diagnosis of cancer EPICAP: Case-control, Population-based, ages less than 75 years at diagnosis, Hérault, France. Source of cases: Prostate cancer cases in all public hospitals and private urology clinics of département of Hérault in France. Cases validation by the Hérault Cancer Registry. Source of controls: Population-based controls, frequency age matched (5-year groups). Quotas by socio-economic status (SES) in order to obtain a distribution by SES among controls identical to the SES distribution among general population men, conditionally to age. 
ERSPC: Population-based randomized trial. Source of cases: Men with PrCa from screening arm ERSPC Rotterdam. Source of controls: Men without PrCa from screening arm ERSPC Rotterdam ESTHER: Case-control, Prospective, Observational, Population-based. Source of cases: Prostate cancer cases in all hospitals in the state of Saarland, from 2001-2003. Source of controls: Random sample of participants from routine health check-up in Saarland, in 2000-2002 FHCRC: Population-based, case-control, ages 35-74 years at diagnosis, King County, WA, USA. Source of cases: Identified through the Seattle-Puget Sound SEER cancer registry. Source of controls: Randomly selected, age-frequency matched residents from the same county as cases Gene-PARE: Hospital-based. Source of cases: Patients that received radiotherapy for treatment of prostate cancer. Source of controls: n/a Hamburg-Zagreb: Hospital-based, Prospective. Source of cases: Prostate cancer cases seen at the Department of Oncology, University Hospital Center Zagreb, Croatia. Source of controls: Population-based (Croatia), healthy men, older than 50, with no medical record of cancer, and no family history of cancer (1st & 2nd degree relatives) HPFS: Nested case-control. Source of cases: Participants of the HPFS cohort. Source of controls: Participants of the HPFS cohort IMPACT: Observational. Source of cases: Carriers and non-carriers (with a known mutation in the family) of the BRCA1 and BRCA2 genes, aged between 40 and 69, who are undergoing prostate screening with annual PSA testing. This cohort has been diagnosed with prostate cancer during the study. Source of controls: Carriers and non-carriers (with a known mutation in the family) of the BRCA1 and BRCA2 genes, aged between 40 and 69, who are undergoing prostate screening with annual PSA testing. This cohort has not been diagnosed with prostate cancer during the study. IPO-Porto: Hospital-based. Source of cases: Early onset and/or familial prostate cancer. 
Source of controls: Blood donors Karuprostate: Case-control, Retrospective, Population-based. Source of cases: From FWI (Guadeloupe): 237 consecutive incident patients with histologically confirmed prostate cancer attending public and private urology clinics; From Democratic Republic of Congo: 148 consecutive incident patients with histologically confirmed prostate cancer attending the University Clinic of Kinshasa. Source of controls: From FWI (Guadeloupe): 277 controls recruited from men participating in a free systematic health screening program open to the general population; From Democratic Republic of Congo: 134 controls recruited from subjects attending the University Clinic of Kinshasa KULEUVEN: Hospital-based, Prospective, Observational. Source of cases: Prostate cancer cases recruited at the University Hospital Leuven. Source of controls: Healthy males with no history of prostate cancer recruited at the University Hospitals, Leuven. LAAPC: Subjects were participants in a population-based case-control study of aggressive prostate cancer conducted in Los Angeles County. Cases were identified through the Los Angeles County Cancer Surveillance Program rapid case ascertainment system. Eligible cases included African American, Hispanic, and non-Hispanic White men diagnosed with a first primary prostate cancer between January 1, 1999 and December 31, 2003. Eligible cases also had (a) prostatectomy with documented tumor extension outside the prostate, (b) metastatic prostate cancer in sites other than prostate, (c) needle biopsy of the prostate with Gleason grade ≥8, or (d) needle biopsy with Gleason grade 7 and tumor in more than two thirds of the biopsy cores. Eligible controls were men never diagnosed with prostate cancer, living in the same neighborhood as a case, and were frequency matched to cases on age (± 5 y) and race/ethnicity. 
Controls were identified by a neighborhood walk algorithm, which proceeds through an obligatory sequence of adjacent houses or residential units beginning at a specific residence that has a specific geographic relationship to the residence where the case lived at diagnosis. Malaysia: Case-control. Source of cases: Patients attended the outpatient urology or uro-onco clinic at University Malaya Medical Center. Source of controls: Population-based, age matched (5-year groups), ascertained through electoral register, Subang Jaya, Selangor, Malaysia MCC-Spain: Case-control. Source of cases: Identified through the urology departments of the participating hospitals. Source of controls: Population-based, frequency age and region matched, ascertained through the rosters of the primary health care centers MCCS: Nested case-control, Melbourne, Victoria. Source of cases: Identified by linkage to the Victorian Cancer Registry. Source of controls: Cohort participants without a diagnosis of cancer MD Anderson: Participants in this study were identified from epidemiological prostate cancer studies conducted at the University of Texas MD Anderson Cancer Center in the Houston Metropolitan area. Cases were accrued in the Houston Medical Center and were not restricted with respect to Gleason score, stage or PSA. Controls were identified via random-digit-dialing or among hospital visitors and they were frequency matched to cases on age and race. Lifestyle, demographic, and family history data were collected using a standardized questionnaire. MDACC_AS: A prospective cohort study. Source of cases: Men with clinically organ-confined prostate cancer meeting eligibility criteria for a prospective cohort study of active surveillance at MD Anderson Cancer Center. Source of controls: N/A MEC: The Multiethnic Cohort (MEC) is comprised of over 215,000 men and women recruited from Hawaii and the Los Angeles area between 1993 and 1996. 
Between 1995 and 2006, over 65,000 blood samples were collected from participants for genetic analyses. To identify incident cancer cases, the MEC was cross-linked with the population-based Surveillance, Epidemiology and End Results (SEER) registries in California and Hawaii, and unaffected cohort participants with blood samples were selected as controls MIAMI (WFPCS): Prostate cancer cases and controls were recruited from the Departments of Urology and Internal Medicine of the Wake Forest University School of Medicine using sequential patient populations as described previously (PMID:15342424). All study subjects received a detailed description of the study protocol and signed their informed consent, as approved by the medical center's Institutional Review Board. The general eligibility criteria were (i) able to comprehend informed consent and (ii) without previously diagnosed cancer. The exclusion criteria were (i) clinical diagnosis of autoimmune diseases; (ii) chronic inflammatory conditions; and (iii) infections within the past 6 weeks. Blood samples were collected from all subjects. MOFFITT: Hospital-based. Source of cases: clinic based from Moffitt Cancer Center. Source of controls: Moffitt Cancer Center affiliated Lifetime cancer screening center NMHS: Case-control, clinic based, Nashville TN. Source of cases: All urology clinics in Nashville, TN. Source of controls: Men without prostate cancer at prostate biopsy. PCaP: The North Carolina-Louisiana Prostate Cancer Project (PCaP) is a multidisciplinary population-based case-only study designed to address racial differences in prostate cancer through a comprehensive evaluation of social, individual and tumor level influences on prostate cancer aggressiveness. PCaP enrolled approximately equal numbers of African Americans and Caucasian Americans with newly-diagnosed prostate cancer from North Carolina (42 counties) and Louisiana (30 parishes) identified through state tumor registries. 
African American PCaP subjects with DNA, who agreed to future use of specimens for research, participated in OncoArray analysis. PCMUS: Case-control - Sofia, Bulgaria. Source of cases: Patients of Clinic of Urology, Alexandrovska University Hospital, Sofia, Bulgaria, PrCa histopathologically confirmed. Source of controls: 72 patients with verified BPH and PSA<3,5; 78 healthy controls from the MMC Biobank, no history of PrCa PHS: Nested case-control. Source of cases: Participants of the PHS1 trial/cohort. Source of controls: Participants of the PHS1 trial/cohort PLCO: Nested case-control. Source of cases: Men with a confirmed diagnosis of prostate cancer from the PLCO Cancer Screening Trial. Source of controls: Controls were men enrolled in the PLCO Cancer Screening Trial without a diagnosis of cancer at the time of case ascertainment. Poland: Case-control. Source of cases: men with unselected prostate cancer, diagnosed in north-western Poland at the University Hospital in Szczecin. Source of controls: cancer-free men from the same population, taken from the healthy adult patients of family doctors in the Szczecin region PROCAP: Population-based, Retrospective, Observational. Source of cases: Cases were ascertained from the National Prostate Cancer Register of Sweden Follow-Up Study, a retrospective nationwide cohort study of patients with localized prostate cancer. Source of controls: Controls were selected among men referred for PSA testing in laboratories in Stockholm County, Sweden, between 2010 and 2012. PROGReSS: Hospital-based, Prospective, Observational. Source of cases: Prostate cancer cases from the Hospital Clínico Universitario de Santiago de Compostela, Galicia, Spain. Source of controls: Cancer-free men from the same population ProMPT: A study to collect samples and data from subjects with and without prostate cancer. Retrospective, Experimental. Source of cases: Subjects attending outpatient clinics in hospitals. 
Source of controls: Subjects attending outpatient clinics in hospitals ProtecT: Trial of treatment. Samples taken from subjects invited for PSA testing from the community at nine centers across United Kingdom. Source of cases: Subjects who have a proven diagnosis of prostate cancer following testing. Source of controls: Identified through invitation of subjects in the community. PROtEuS: Case-control, population-based. Source of cases: All new histologically-confirmed cases, aged less or equal to 75 years, diagnosed between 2005 and 2009, actively ascertained across Montreal French hospitals. Source of controls: Randomly selected from the Provincial electoral list of French-speaking men between 2005 and 2009, from the same area of residence as cases and frequency-matched on age. QLD: Case-control. Source of cases: A longitudinal cohort study (Prostate Cancer Supportive Care and Patient Outcomes Project: ProsCan) conducted in Queensland, through which men newly diagnosed with prostate cancer from 26 private practices and 10 public hospitals were directly referred to ProsCan at the time of diagnosis by their treating clinician (age range 43-88 years). All cases had histopathologically confirmed prostate cancer, following presentation with an abnormal serum PSA and/or lower urinary tract symptoms. Source of controls: Controls comprised healthy male blood donors with no personal history of prostate cancer, recruited through (i) the Australian Red Cross Blood Services in Brisbane (age range 19-76 years) and (ii) the Australian Electoral Commission (AEC) (age and post-code/ area matched to ProsCan, age range 54-90 years). RAPPER: Multi-centre, hospital based blood sample collection study in patients enrolled in clinical trials with prospective collection of radiotherapy toxicity data. Source of cases: Prostate cancer patients enrolled in radiotherapy trials: CHHiP, RT01, Dose Escalation, RADICALS, Pelvic IMRT, PIVOTAL. 
Source of controls: N/A SABOR: Prostate Cancer Screening Cohort. Source of cases: Men >45 yrs of age participating in annual PSA screening. Source of controls: Males participating in annual PSA prostate cancer risk evaluations (funded by NCI biomarkers discovery and validation grant), recruited through University of Texas Health Science Center at San Antonio and affiliated sites or through study advertisements, enrolment open to the community SCCS: Case-control in cohort, Southeastern USA. Prospective, Observational, Population-based. Source of cases: SCCS entry population. Source of controls: SCCS entry population SCPCS: Population-based, Retrospective, Observational. Source of cases: South Carolina Central Cancer Registry. Source of controls: Health Care Financing Administration beneficiary file SEARCH: Case-control - East Anglia, UK. Source of cases: Men < 70 years of age registered with prostate cancer at the population-based cancer registry, Eastern Cancer Registration and Information Centre, East Anglia, UK. Source of controls: Men attending general practice in East Anglia with no known prostate cancer diagnosis, frequency matched to cases by age and geographic region SNP_Prostate_Ghent: Hospital-based, Retrospective, Observational. Source of cases: Men treated with IMRT as primary or postoperative treatment for prostate cancer at the Ghent University Hospital between 2000 and 2010. Source of controls: Employees of the University hospital and members of social activity clubs, without a history of any cancer. SPAG: Hospital-based, Retrospective, Observational. Source of cases: Guernsey. Source of controls: Guernsey STHM2: Population-based, Retrospective, Observational. Source of cases: Cases were selected among men referred for PSA testing in laboratories in Stockholm County, Sweden, between 2010 and 2012. Source of controls: Controls were selected among men referred for PSA testing in laboratories in Stockholm County, Sweden, between 2010 and 2012. 
PCPT: Case-control from a randomized clinical trial. Source of cases: Randomized clinical trial. Source of controls: Randomized clinical trial SELECT: Case-cohort from a randomized clinical trial. Source of cases: Randomized clinical trial. Source of controls: Randomized clinical trial TAMPERE: Case-control - Finland, Retrospective, Observational, Population-based. Source of cases: Identified through linkage to the Finnish Cancer Registry and patient records; and the Finnish arm of the ERSPC study. Source of controls: Cohort participants without a diagnosis of cancer UGANDA: Uganda Prostate Cancer Study: Uganda is a case-control study of prostate cancer in Kampala Uganda that was initiated in 2011. Men with prostate cancer were enrolled from the Urology unit at Mulago Hospital and men without prostate cancer (i.e. controls) were enrolled from other clinics (i.e. surgery) at the hospital. UKGPCS: ICR, UK. Source of cases: Cases identified through clinics at the Royal Marsden hospital and nationwide NCRN hospitals. Source of controls: Ken Muir's control- 2000 ULM: Case-control - Germany. Source of cases: familial cases (n=162): identified through questionnaires for family history by collaborating urologists all over Germany; sporadic cases (n=308): prostatectomy series performed in the Clinic of Urology Ulm between 2012 and 2014. Source of controls: age-matched controls (n=188): age-matched men without prostate cancer and negative family history collected in hospitals of Ulm WUGS/WUPCS: Cases Series, USA. Source of cases: Identified through clinics at Washington University in St. Louis. Source of controls: Men diagnosed and managed with prostate cancer in University based clinic. Acknowledgement Statements: Aarhus: This study was supported by the Danish Strategic Research Council (now Innovation Fund Denmark) and the Danish Cancer Society. The Danish Cancer Biobank (DCB) is acknowledged for biological material. 
AHS: This work was supported by the Intramural Research Program of the NIH, National Cancer Institute, Division of Cancer Epidemiology and Genetics (Z01CP010119). ATBC: This research was supported in part by the Intramural Research Program of the NIH and the National Cancer Institute. Additionally, this research was supported by U.S. Public Health Service contracts N01-CN-45165, N01-RC-45035, N01-RC-37004, HHSN261201000006C, and HHSN261201500005C from the National Cancer Institute, Department of Health and Human Services. BioVu: The dataset(s) used for the analyses described were obtained from Vanderbilt University Medical Center's BioVU which is supported by institutional funding and by the National Center for Research Resources, Grant UL1 RR024975-01 (which is now at the National Center for Advancing Translational Sciences, Grant 2 UL1 TR000445-06). Canary PASS: PASS was supported by Canary Foundation and the National Cancer Institute's Early Detection Research Network (U01 CA086402). CCI: This work was awarded by Prostate Cancer Canada and is proudly funded by the Movember Foundation - Grant # D2013-36. The CCI group would like to thank David Murray, Razmik Mirzayans, and April Scott for their contribution to this work. CerePP French Prostate Cancer Case-Control Study (ProGene): None reported COH: SLN is partially supported by the Morris and Horowitz Families Endowed Professorship COSM: The Swedish Research Council, the Swedish Cancer Foundation CPCS1 & CPCS2: Department of Clinical Biochemistry, Herlev and Gentofte Hospital, Copenhagen University Hospital, Herlev Ringvej 75, DK-2730 Herlev, Denmark. CPCS1 would like to thank the participants and staff of the Copenhagen General Population Study for their important contributions. CPDR: Uniformed Services University for the Health Sciences HU0001-10-2-0002 (PI: David G. McLeod, MD) CPS-II: The American Cancer Society funds the creation, maintenance, and updating of the Cancer Prevention Study II cohort. 
CPS-II thanks the participants and Study Management Group for their invaluable contributions to this research. We would also like to acknowledge the contribution to this study from central cancer registries supported through the Centers for Disease Control and Prevention National Program of Cancer Registries, and cancer registries supported by the National Cancer Institute Surveillance Epidemiology and End Results program. EPIC: The coordination of EPIC is financially supported by the European Commission (DG-SANCO) and the International Agency for Research on Cancer. The national cohorts are supported by the Danish Cancer Society (Denmark); the Deutsche Krebshilfe, Deutsches Krebsforschungszentrum and Federal Ministry of Education and Research (Germany); the Hellenic Health Foundation, Greek Ministry of Health; Greek Ministry of Education (Greece); the Italian Association for Research on Cancer (AIRC) and National Research Council (Italy); the Dutch Ministry of Public Health, Welfare and Sports (VWS), Netherlands Cancer Registry (NKR), LK Research Funds, Dutch Prevention Funds, Dutch ZON (Zorg Onderzoek Nederland), World Cancer Research Fund (WCRF); the Statistics Netherlands (The Netherlands); the Health Research Fund (FIS), Regional Governments of Andalucía, Asturias, Basque Country, Murcia and Navarra, Spanish Ministry of Health ISCIII RETIC (RD06/0020), Red de Centros RCESP, C03/09 (Spain); the Swedish Cancer Society, Swedish Scientific Council and Regional Government of Skåne and Västerbotten, Fundacion Federico SA (Sweden); the Cancer Research UK, Medical Research Council (United Kingdom). EPICAP: The EPICAP study was supported by grants from Ligue Nationale Contre le Cancer, Ligue départementale du Val de Marne; Fondation de France; Agence Nationale de sécurité sanitaire de l'alimentation, de l'environnement et du travail (ANSES). 
The EPICAP study group would like to thank all urologists, Antoinette Anger and Hasina Randrianasolo (study monitors), Anne-Laure Astolfi, Coline Bernard, Oriane Noyer, Marie-Hélène De Campo, Sandrine Margaroline, Louise N'Diaye, and Sabine Perrier-Bonnet (Clinical Research nurses). ERSPC: This study was supported by the Dutch Cancer Society (KWF94-869, 98-1657, 2002-277, 2006-3518, 2010-4800), The Netherlands Organisation for Health Research and Development (ZonMW-002822820, 22000106, 50-50110-98-311, 62300035), The Dutch Cancer Research Foundation (SWOP), and an unconditional grant from Beckman-Coulter-Hybritech Inc. ESTHER: The ESTHER study was supported by a grant from the Baden Württemberg Ministry of Science, Research and Arts. The ESTHER group would like to thank Hartwig Ziegler, Sonja Wolf, Volker Hermann, Heiko Müller, Karina Dieffenbach, Katja Butterbach for valuable contributions to the study. FHCRC: The FHCRC studies were supported by grants R01-CA056678, R01-CA082664, and R01-CA092579 from the US National Cancer Institute, National Institutes of Health, with additional support from the Fred Hutchinson Cancer Research Center. FHCRC would like to thank all the men who participated in these studies. Gene-PARE: The Gene-PARE study was supported by grants 1R01CA134444 from the U.S. National Institutes of Health, PC074201 and W81XWH-15-1-0680 from the Prostate Cancer Research Program of the Department of Defense and RSGT-05-200-01-CCE from the American Cancer Society. Hamburg-Zagreb: None reported HPFS: The Health Professionals Follow-up Study was supported by grants UM1CA167552, CA133891, CA141298, and P01CA055075. 
HPFS are grateful to the participants and staff of the Physicians' Health Study and Health Professionals Follow-Up Study for their valuable contributions, as well as the following state cancer registries for their help: AL, AZ, AR, CA, CO, CT, DE, FL, GA, ID, IL, IN, IA, KY, LA, ME, MD, MA, MI, NE, NH, NJ, NY, NC, ND, OH, OK, OR, PA, RI, SC, TN, TX, VA, WA, and WY. IMPACT: The IMPACT study was funded by The Ronald and Rita McAulay Foundation, CR-UK Project grant (C5047/A1232), Cancer Australia, AICR Netherlands A10-0227, Cancer Australia and Cancer Council Tasmania, NIHR, EU Framework 6, Cancer Councils of Victoria and South Australia, and Philanthropic donation to Northshore University Health System. We acknowledge support from the National Institute for Health Research (NIHR) to the Biomedical Research Centre at The Institute of Cancer Research and Royal Marsden Foundation NHS Trust. IMPACT acknowledges the IMPACT study steering committee, collaborating centres, and participants. IPO-Porto: The IPO-Porto study was funded by Fundaçäo para a Ciência e a Tecnologia (FCT; UID/DTP/00776/2013 and PTDC/DTP-PIC/1308/2014) and by IPO-Porto Research Center (CI-IPOP-16-2012 and CI-IPOP-24-2015). MC and MPS are research fellows from Liga Portuguesa Contra o Cancro, Núcleo Regional do Norte. SM is a research fellow from FCT (SFRH/BD/71397/2010). IPO-Porto would like to express our gratitude to all patients and families who have participated in this study. Karuprostate: The Karuprostate study was supported by the the Frech National Health Directorate and by the Association pour la Recherche sur les Tumeurs de la ProstateKarusprostate thanks Séverine Ferdinand. KULEUVEN: F.C. and S.J. are holders of grants from FWO Vlaanderen (G.0684.12N and G.0830.13N), the Belgian federal government (National Cancer Plan KPC_29_023), and a Concerted Research Action of the KU Leuven (GOA/15/017). TVDB is holder of a doctoral fellowship of the FWO. 
LAAPC: This study was funded by grant R01CA84979 (to S.A. Ingles) from the National Cancer Institute, National Institutes of Health. Malaysia: The study was funded by the University Malaya High Impact Research Grant (HIR/MOHE/MED/35). Malaysia thanks all associates in the Urology Unit, University of Malaya, Cancer Research Initiatives Foundation (CARIF) and the Malaysian Men's Health Initiative (MMHI). MCCS: MCCS cohort recruitment was funded by VicHealth and Cancer Council Victoria. The MCCS was further supported by Australian NHMRC grants 209057, 251553, and 504711, and by infrastructure provided by Cancer Council Victoria. Cases and their vital status were ascertained through the Victorian Cancer Registry (VCR) and the Australian Institute of Health and Welfare (AIHW), including the National Death Index and the Australian Cancer Database. MCC-Spain: The study was partially funded by the Accion Transversal del Cancer, approved on the Spanish Ministry Council on the 11th October 2007, by the Instituto de Salud Carlos III-FEDER (PI08/1770, PI09/00773-Cantabria, PI11/01889-FEDER, PI12/00265, PI12/01270, and PI12/00715), by the Fundación Marqués de Valdecilla (API 10/09), by the Spanish Association Against Cancer (AECC) Scientific Foundation and by the Catalan Government DURSI grant 2009SGR1489. Samples: Biological samples were stored at the Parc de Salut MAR Biobank (MARBiobanc; Barcelona) which is supported by Instituto de Salud Carlos III FEDER (RD09/0076/00036). Also sample collection was supported by the Xarxa de Bancs de Tumors de Catalunya sponsored by Pla Director d'Oncologia de Catalunya (XBTC). MCC-Spain acknowledges the contribution from Esther Gracia-Lavedan in preparing the data. We thank all the subjects who participated in the study and all MCC-Spain collaborators. MD Anderson: Prostate Cancer Case-Control Studies at MD Anderson (MDA) supported by grants CA68578, ES007784, DAMD W81XWH-07-1-0645, and CA140388. 
MDACC_AS: None reported. MEC: Funding provided by NIH grant U19CA148537 and grant U01CA164973. MIAMI (WFPCS): ACS. MOFFITT: The Moffitt group was supported by the US National Cancer Institute (R01CA128813, PI: J.Y. Park). NMHS: Funding for the Nashville Men's Health Study (NMHS) was provided by the National Institutes of Health grant number R01CA121060. PCaP only data: The North Carolina - Louisiana Prostate Cancer Project (PCaP) is carried out as a collaborative study supported by the Department of Defense contract DAMD 17-03-2-0052. For HCaP-NC follow-up data: The Health Care Access and Prostate Cancer Treatment in North Carolina (HCaP-NC) study is carried out as a collaborative study supported by the American Cancer Society award RSGT-08-008-01-CPHPS. For studies using both PCaP and HCaP-NC follow-up data please use: The North Carolina - Louisiana Prostate Cancer Project (PCaP) and the Health Care Access and Prostate Cancer Treatment in North Carolina (HCaP-NC) study are carried out as collaborative studies supported by the Department of Defense contract DAMD 17-03-2-0052 and the American Cancer Society award RSGT-08-008-01-CPHPS, respectively. For any PCaP data, please include: The authors thank the staff, advisory committees and research subjects participating in the PCaP study for their important contributions. For studies using PCaP DNA/genotyping data, please include: We would like to acknowledge the UNC BioSpecimen Facility and LSUHSC Pathology Lab for our DNA extractions, blood processing, storage and sample disbursement (https://genome.unc.edu/bsp). For studies using PCaP tissue, please include: We would like to acknowledge the RPCI Department of Urology Tissue Microarray and Immunoanalysis Core for our tissue processing, storage and sample disbursement. 
For studies using HCaP-NC follow-up data, please use: The Health Care Access and Prostate Cancer Treatment in North Carolina (HCaP-NC) study is carried out as a collaborative study supported by the American Cancer Society award RSGT-08-008-01-CPHPS. The authors thank the staff, advisory committees and research subjects participating in the HCaP-NC study for their important contributions. For studies that use both PCaP and HCaP-NC, please use: The authors thank the staff, advisory committees and research subjects participating in the PCaP and HCaP-NC studies for their important contributions. PCMUS: The PCMUS study was supported by the Bulgarian National Science Fund, Ministry of Education and Science (contract DOO-119/2009; DUNK01/2-2009; DFNI-B01/28/2012) with additional support from the Science Fund of Medical University - Sofia (contract 51/2009; 8I/2009; 28/2010). PHS: The Physicians' Health Study was supported by grants CA34944, CA40360, CA097193, HL26490, and HL34595. PHS members are grateful to the participants and staff of the Physicians' Health Study and Health Professionals Follow-Up Study for their valuable contributions, as well as the following state cancer registries for their help: AL, AZ, AR, CA, CO, CT, DE, FL, GA, ID, IL, IN, IA, KY, LA, ME, MD, MA, MI, NE, NH, NJ, NY, NC, ND, OH, OK, OR, PA, RI, SC, TN, TX, VA, WA, and WY. PLCO: This PLCO study was supported by the Intramural Research Program of the Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH. PLCO thanks Drs. Christine Berg and Philip Prorok, Division of Cancer Prevention at the National Cancer Institute, the screening center investigators and staff of the PLCO Cancer Screening Trial for their contributions to the PLCO Cancer Screening Trial. We thank Mr. Thomas Riley, Mr. Craig Williams, Mr. Matthew Moore, and Ms. Shannon Merkle at Information Management Services, Inc., for their management of the data and Ms. Barbara O'Brien and staff at Westat, Inc. 
for their contributions to the PLCO Cancer Screening Trial. We also thank the PLCO study participants for their contributions to making this study possible. Poland: None reported. PROCAP: PROCAP was supported by the Swedish Cancer Foundation (08-708, 09-0677). PROCAP thanks and acknowledges all of the participants in the PROCAP study. We thank Carin Cavalli-Björkman and Ami Rönnberg Karlsson for their dedicated work in the collection of data. Michael Broms is acknowledged for his skilful work with the databases. KI Biobank is acknowledged for handling the samples and for DNA extraction. We acknowledge The NPCR steering group: Pär Stattin (chair), Anders Widmark, Stefan Karlsson, Magnus Törnblom, Jan Adolfsson, Anna Bill-Axelson, Ove Andrén, David Robinson, Bill Pettersson, Jonas Hugosson, Jan-Erik Damber, Ola Bratt, Göran Ahlgren, Lars Egevad, and Roy Ehrnström. PROGReSS: The PROGReSS study is funded by grants from the Spanish Ministry of Health (INT15/00070; INT16/00154; FIS PI10/00164, FIS PI13/02030; FIS PI16/00046); the Spanish Ministry of Economy and Competitiveness (PTA2014-10228-I), and Fondo Europeo de Desarrollo Regional (FEDER 2007-2013). ProMPT: Funded by CRUK, NIHR, MRC, Cambridge Biomedical Research Centre. ProtecT: Funded by NIHR. ProtecT and ProMPT would like to acknowledge the support of The University of Cambridge, Cancer Research UK. Cancer Research UK grants (C8197/A10123) and (C8197/A10865) supported the genotyping team. We would also like to acknowledge the support of the National Institute for Health Research which funds the Cambridge Bio-medical Research Centre, Cambridge, UK. We would also like to acknowledge the support of the National Cancer Research Prostate Cancer: Mechanisms of Progression and Treatment (PROMPT) collaborative (grant code G0500966/75466) which has funded tissue and urine collections in Cambridge. 
We are grateful to staff at the Wellcome Trust Clinical Research Facility, Addenbrooke's Clinical Research Centre, Cambridge, UK for their help in conducting the ProtecT study. We also acknowledge the support of the NIHR Cambridge Biomedical Research Centre, the DOH HTA (ProtecT grant), and the NCRI/MRC (ProMPT grant) for help with the bio-repository. The UK Department of Health funded the ProtecT study through the NIHR Health Technology Assessment Programme (projects 96/20/06, 96/20/99). The ProtecT trial and its linked ProMPT and CAP (Comparison Arm for ProtecT) studies are supported by Department of Health, England; Cancer Research UK grant number C522/A8649, Medical Research Council of England grant number G0500966, ID 75466, and The NCRI, UK. The epidemiological data for ProtecT were generated through funding from the Southwest National Health Service Research and Development. DNA extraction in ProtecT was supported by USA Dept of Defense award W81XWH-04-1-0280, Yorkshire Cancer Research and Cancer Research UK. The authors would like to acknowledge the contribution of all members of the ProtecT study research group. The views and opinions expressed therein are those of the authors and do not necessarily reflect those of the Department of Health of England. The bio-repository from ProtecT is supported by the NCRI (ProMPT) Prostate Cancer Collaborative and the Cambridge BMRC grant from NIHR. We thank the National Institute for Health Research, Hutchison Whampoa Limited, the Human Research Tissue Bank (Addenbrooke's Hospital), and Cancer Research UK. 
PROtEuS: PROtEuS was supported financially through grants from the Canadian Cancer Society (13149, 19500, 19864, 19865) and the Cancer Research Society, in partnership with the Ministère de l'enseignement supérieur, de la recherche, de la science et de la technologie du Québec, and the Fonds de la recherche du Québec - Santé. PROtEuS would like to thank its collaborators and research personnel, and the urologists involved in subject recruitment. We also wish to acknowledge the special contribution made by Ann Hsing and Anand Chokkalingam to the conception of the genetic component of PROtEuS. QLD: The QLD research is supported by The National Health and Medical Research Council (NHMRC) Australia Project Grants (390130, 1009458) and NHMRC Career Development Fellowship and Cancer Australia PdCCRS funding to J Batra. The QLD team would like to acknowledge and sincerely thank the urologists, pathologists, data managers and patient participants who have generously and altruistically supported the QLD cohort. RAPPER: RAPPER is funded by Cancer Research UK (C1094/A11728; C1094/A18504) and Experimental Cancer Medicine Centre funding (C1467/A7286). The RAPPER group thank Rebecca Elliott for project management. SABOR: The SABOR research is supported by NIH/NCI Early Detection Research Network, grant U01 CA0866402-12. Also supported by the Cancer Center Support Grant to the Cancer Therapy and Research Center from the National Cancer Institute (US) P30 CA054174. SCCS: SCCS is funded by NIH grant R01 CA092447, and SCCS sample preparation was conducted at the Epidemiology Biospecimen Core Lab that is supported in part by the Vanderbilt-Ingram Cancer Center (P30 CA68485). 
Data on SCCS cancer cases used in this publication were provided by the Alabama Statewide Cancer Registry; Kentucky Cancer Registry, Lexington, KY; Tennessee Department of Health, Office of Cancer Surveillance; Florida Cancer Data System; North Carolina Central Cancer Registry, North Carolina Division of Public Health; Georgia Comprehensive Cancer Registry; Louisiana Tumor Registry; Mississippi Cancer Registry; South Carolina Central Cancer Registry; Virginia Department of Health, Virginia Cancer Registry; Arkansas Department of Health, Cancer Registry, 4815 W. Markham, Little Rock, AR 72205. The Arkansas Central Cancer Registry is fully funded by a grant from National Program of Cancer Registries, Centers for Disease Control and Prevention (CDC). Data on SCCS cancer cases from Mississippi were collected by the Mississippi Cancer Registry which participates in the National Program of Cancer Registries (NPCR) of the Centers for Disease Control and Prevention (CDC). The contents of this publication are solely the responsibility of the authors and do not necessarily represent the official views of the CDC or the Mississippi Cancer Registry. SCPCS: SCPCS is funded by CDC grant S1135-19/19, and SCPCS sample preparation was conducted at the Epidemiology Biospecimen Core Lab that is supported in part by the Vanderbilt-Ingram Cancer Center (P30 CA68485). SEARCH: SEARCH is funded by a program grant from Cancer Research UK (C490/A10124) and supported by the UK National Institute for Health Research Biomedical Research Centre at the University of Cambridge. SNP_Prostate_Ghent: The study was supported by the National Cancer Plan, financed by the Federal Office of Health and Social Affairs, Belgium. 
SPAG: Wessex Medical Research, Hope for Guernsey, MUG, HSSD, MSG, Roger Allsopp. STHM2: STHM2 was supported by grants from The Strategic Research Programme on Cancer (StratCan), Karolinska Institutet; the Linné Centre for Breast and Prostate Cancer (CRISP, number 70867901), Karolinska Institutet; The Swedish Research Council (number K2010-70X-20430-04-3) and The Swedish Cancer Society (numbers 11-0287 and 11-0624); Stiftelsen Johanna Hagstrand och Sigfrid Linnérs minne; Swedish Council for Working Life and Social Research (FAS), number 2012-0073. STHM2 acknowledges the Karolinska University Laboratory, Aleris Medilab, Unilabs and the Regional Prostate Cancer Registry for performing analyses and help to retrieve data. Carin Cavalli-Björkman and Britt-Marie Hune for their enthusiastic work as research nurses. Astrid Björklund for skilful data management. We wish to thank the BBMRI.se biobank facility at Karolinska Institutet for biobank services. PCPT & SELECT are funded by Public Health Service grants U10CA37429 and 5UM1CA182883 from the National Cancer Institute. SWOG and SELECT thank the site investigators and staff and, most importantly, the participants who donated their time to this trial. TAMPERE: The Tampere (Finland) study was supported by the Academy of Finland (251074), The Finnish Cancer Organisations, Sigrid Juselius Foundation, and the Competitive Research Funding of the Tampere University Hospital (X51003). The PSA screening samples were collected by the Finnish part of ERSPC (European Study of Screening for Prostate Cancer). TAMPERE would like to thank Riina Liikanen, Liisa Maeaettaenen and Kirsi Talala for their work on samples and databases. 
UGANDA: None reported. UKGPCS: UKGPCS would also like to thank the following for funding support: The Institute of Cancer Research and The Everyman Campaign, The Prostate Cancer Research Foundation, Prostate Research Campaign UK (now Prostate Action), The Orchid Cancer Appeal, The National Cancer Research Network UK, The National Cancer Research Institute (NCRI) UK. We are grateful for support of NIHR funding to the NIHR Biomedical Research Centre at The Institute of Cancer Research and The Royal Marsden NHS Foundation Trust. UKGPCS would also like to acknowledge the NCRN nurses, data managers, and consultants for their work in the UKGPCS study. UKGPCS would like to thank all urologists and other persons involved in the planning, coordination, and data collection of the study. ULM: The Ulm group received funds from the German Cancer Aid (Deutsche Krebshilfe). WUGS/WUPCS: WUGS would like to thank the following for funding support: The Anthony DeNovi Fund, the Donald C. McGraw Foundation, and the St. Louis Men's Group Against Cancer.
Study
phs001391
Kids First: Genomic Studies of Orofacial Cleft Birth Defects
The Gabriella Miller Kids First Pediatric Research Program (Kids First) is a trans-NIH effort initiated in response to the 2014 Gabriella Miller Kids First Research Act and supported by the NIH Common Fund. This program focuses on gene discovery in pediatric cancers and structural birth defects and the development of the Gabriella Miller Kids First Pediatric Data Resource (Kids First Data Resource). Both childhood cancers and structural birth defects are critical and costly conditions associated with substantial morbidity and mortality. Elucidating the underlying genetic etiology of these diseases has the potential to profoundly improve preventative measures, diagnostics, and therapeutic interventions. WGS and phenotypic data from this study are accessible through dbGaP and kidsfirstdrc.org, where other Kids First datasets can also be accessed. The Kids First study of nonsyndromic orofacial cleft birth defects (OFCs) is a whole genome sequencing study of 415 White parent-case trios drawn from ongoing collaborations led by Dr. Mary L. Marazita of the University of Pittsburgh Center for Craniofacial and Dental Genetics, including collaborations with Dr. George Wehby of the University of Iowa, Dr. Jacqueline Hecht of the University of Texas, and Dr. Terri Beaty of Johns Hopkins University. Sequencing was done by the Washington University McDonnell Genome Institute. The case in each of the Kids First trios has cleft lip (CL, Figure A), cleft palate (CP, Figure B), or both (CL+CP, Figure C). OFCs are genetically complex structural birth defects caused by genetic factors, environmental exposures, and their interactions. OFCs are the most common craniofacial anomalies in humans, affecting approximately 1 in 700 newborns, and are one of the most common structural birth defects worldwide. 
On average a child with an OFC initially faces feeding difficulties, undergoes 6 surgeries, spends 30 days in hospital, receives 5 years of orthodontic treatment, and participates in ongoing speech therapy, leading to an estimated total lifetime treatment cost of about $200,000. Further, individuals born with an OFC have higher infant mortality, higher mortality rates at all other stages of life, increased incidence of mental health problems, and higher risk for other disorders (notably including breast, brain, and colon cancers). Prior genome-wide linkage and association studies have now identified at least 18 genomic regions likely to contribute to the risk for nonsyndromic OFCs. Despite this substantial progress, the functional/pathogenic variants at OFC-associated regions are mostly still unknown. Because previous OFC genomic studies (genome-wide linkage, genome-wide association studies (GWAS), and targeted sequencing) are based on relatively sparse genotyping data, they cannot distinguish between causal variants and variants in linkage disequilibrium with unobserved causal variants. Moreover, it is unknown whether the association or linkage signals are due to single common variants, haplotypes of multiple common variants, clusters of multiple rare variants, or some combination. Finally, we cannot yet attribute specific genetic risk to individual cases and case families. Therefore, the goal of the current study is to identify specific OFC risk variants in Whites by performing whole genome sequencing of parent-case trios.
Study
phs001168
Bleomycin Induced Pneumonitis cohort of Exceptional Responders Program
This DAC handles access to data from the Bleomycin Induced Pneumonitis cohort of the Exceptional Responders Program of the Garvan Institute of Medical Research (https://www.garvan.org.au/research/collaboration/exceptional-responders-program).
Dac
EGAC50000000791
DAC for lymphoma IFZ Essen
This DAC is in place to handle access to data on lymphomas and immune cells submitted by scientists from the Institute of Cell Biology (Cancer Research) at the University of Duisburg-Essen in Essen, Germany.
Dac
EGAC50000000521
Whole Exome Sequencing of a Lung Adenocarcinoma Patient Across Three Time Points
A 65-year-old female patient with a significant smoking history (40 pack-years) was first diagnosed with lung adenocarcinoma (LUAD) in the left lung in May 2020 (L1-T1). One year later, in May 2021, the patient progressed with a new LUAD in the right lung (L1-T2) and in May 2024, another LUAD emerged in the right lung (L1-T3). Whole Exome Sequencing (WES) analysis was conducted on three tumor biopsies obtained at different time points from the same patient to investigate clonal trajectories.
Study
EGAS50000000812