Phylogenetic Analyses of Melanoma Reveal Complex Patterns of Metastatic Dissemination
Subpopulations of cells in a primary melanoma can disseminate and establish metastases. Still, the precise ancestral relationship between primary tumors and their metastases is not well understood. Using whole-exome sequencing (for discovery) and targeted sequencing (for validation), we analyzed mutation patterns of primary melanomas and two or more metastases in each of 8 patients to determine their phylogenetic relationships, profiling a total of 31 tumors. The resulting data show that in 6 of 8 cases, genetically unique cell populations in the primary metastasized in parallel to distinct anatomic sites, rather than sequentially. These data also indicate that individual metastases were sometimes founded by multiple cell populations of the primary that were genetically distinct.
Study
phs000941
MOSAIC - Multi-Omics Spatial Atlas In Cancer
MOSAIC is a collaborative initiative founded by Owkin, Lausanne University Hospital (CHUV), Charité Universitätsmedizin Berlin, University Hospital Erlangen (UKER), Gustave Roussy Institute in Paris, and University of Pittsburgh. The goal of MOSAIC is to build the largest collection of spatial omics data in cancer. By integrating comprehensive high quality clinical annotations with advanced deep profiling techniques, MOSAIC aims to uncover novel cancer subtypes and identify key drug targets and biomarkers within them.
Study
EGAS50000000689
TMM whole genome analysis of 4566 Japanese individuals
Tohoku University Tohoku Medical Megabank Organization (ToMMo) and Iwate Tohoku Medical Megabank Organization (IMM) were founded to establish an advanced medical system to foster the reconstruction from the Great East Japan Earthquake. These organizations are developing a biobank that includes medical and genome information for supporting health and welfare in the Tohoku area. In the first stage, part of our mission was to sequence 4,000 individuals to construct a Japanese whole-genome reference panel.
Study
JGAS000239
NHLBI TOPMed: The Vanderbilt Atrial Fibrillation Registry (VU_AF)
The Vanderbilt Atrial Fibrillation (AF) Registry was founded in 2001. Patients with AF and family members are prospectively enrolled. At enrollment a detailed past medical history is obtained along with an AF symptom severity assessment. Blood samples are obtained for DNA extraction. Patients are followed longitudinally along with serial collection of AF symptom severity assessments.
Study
phs001032
NHLBI TOPMed - NHGRI CCDG: The Vanderbilt AF Ablation Registry
The Vanderbilt Atrial Fibrillation Ablation Registry (VAFAR) was founded in 2011. Patients with AF referred for AF ablation are prospectively enrolled. A detailed clinical history is recorded, along with imaging data (cardiac MRI or CT). Blood samples are obtained for DNA extraction at the time of ablation. Details of the ablation procedure are recorded. Patients are longitudinally followed to monitor for AF recurrence. VAFAR contributed 171 samples submitted to dbGaP for WGS: 115 were from male subjects, of which 113 were white/non-Hispanic and 2 were Hispanic; 56 were from females, of which all 56 were white/non-Hispanic.
Study
phs000997
Repertoire and clinical hierarchy of AR locus alterations in castration-resistant prostate cancer
Somatic alterations to the androgen receptor (AR) gene are pivotal drivers of treatment resistance in metastatic castration-resistant prostate cancer (mCRPC), but their prevalence, clinical impact, and etiology remain incompletely understood. Here, we analyzed 3399 plasma cell-free DNA samples and 1988 leukocyte DNA samples from 1995 metastatic prostate cancer patients with matched clinical data and extensive AR locus coverage.
Study
EGAS50000001101
PELICAN33 longitudinal clinical and autopsy phenomic assessment in lethal metastatic prostate cancer
Objective: To identify novel phenotypes and features associated with metastatic prostate cancer (mPC) outcome, and to identify biomarker and data requirements to be tested in future precision oncology trials.
Design, Setting, and Participants: We analyzed deep longitudinal clinical, neuroendocrine expression, and autopsy data of 33 men who died from mPC between 1995 and 2004 (PELICAN33), and related findings to mPC biomarkers reported in the literature.
Intervention: 33 men prospectively consented to participate in an integrated clinical-molecular rapid autopsy study of mPC.
Study
EGAS00001005399
Genetic Epidemiology Network of Arteriopathy (GENOA)
The Genetic Epidemiology Network of Arteriopathy (GENOA): GENOA is one of four research networks that form the NHLBI Family Blood Pressure Program (FBPP). From its inception in 1995, GENOA's long-term objective was to elucidate the genetics of hypertension and its arteriosclerotic target-organ damage, including both atherosclerotic (macrovascular) and arteriolosclerotic (microvascular) complications involving the heart, brain, kidneys, and peripheral arteries. Two GENOA cohorts were originally ascertained (1995-2000) through sibships in which at least 2 siblings had essential hypertension diagnosed prior to age 60 years. All siblings in the sibship were invited to participate, both normotensive and hypertensive. These include non-Hispanic White Americans from Rochester, MN (n =1583 at the 1st exam) and African Americans from Jackson, MS (N=1854 at the 1st exam). During the second exam (2000-2005), approximately 80% of participants were re-recruited. The GENOA data consists of biological samples (DNA, serum, urine) as well as demographic, anthropometric, environmental, clinical, biochemical, physiological, and genetic data for understanding the genetic predictors of diseases of the heart, brain, kidney, and peripheral arteries. Family Blood Pressure Program (FBPP): GENOA's parent program, the FBPP, is an unprecedented collaboration to identify genes influencing blood pressure (BP) levels, hypertension, and its target-organ damage. This program has conducted over 21,000 physical examinations, assembled a shared database of several hundred BP and hypertension-related phenotypic measurements, completed genome-wide linkage analyses for BP, hypertension, and hypertension associated risk factors and complications, and published over 130 manuscripts on program findings. 
The FBPP emerged from what was initially funded as four independent networks of investigators (HyperGEN, GenNet, SAPPHIRe and GENOA) competing to identify genetic determinants of hypertension in multiple ethnic groups. Realizing the greater likelihood of success through collaboration, the investigators began working together during the first funding cycle (1995-2000) and formalized this arrangement in the second cycle (2000-2005), creating a single confederation with program-wide and network-specific goals.
Study
phs000379
San Francisco Bay Area Latina Breast Cancer Study
The genome-wide association study (GWAS) includes participants enrolled into two different studies. The first study, the San Francisco Bay Area Cancer Study (SFBCS), is a population-based case-control study of breast cancer conducted in the San Francisco Bay Area and included women ages 35-79 years from three racial/ethnic groups: Non-Hispanic whites, African Americans, and Hispanics/Latinas. For the GWAS, only Hispanic/Latina women were included. Women diagnosed with invasive breast cancer between 1995 and 2002 were identified through the Greater Bay Area Cancer Registry. Controls were identified by random digit dialing and were frequency-matched to cases by age in 5-year increments and by race/ethnicity. Hispanic/Latina ethnicity was assessed by self-report. 175 Hispanic/Latina cases and 307 Hispanic/Latina controls from the SFBCS had given adequate consent and provided biospecimens that were used in the GWAS to be included in this data submission. The second study is the Northern California site of the Breast Cancer Family Registry (NC-BCFR). This population-based family study recruited breast cancer cases ages 18-64 years diagnosed from 1995-2009 that were identified through the Greater Bay Area Cancer Registry. Cases included all women at increased genetic susceptibility for breast cancer who met one or more of the following criteria: (a) being diagnosed with breast cancer at age <35 years; (b) having a personal history of ovarian cancer or childhood cancer; (c) being diagnosed with two different breast cancers (bilateral breast cancers), with the first one diagnosed at age <50 years; and (d) having one or more first-degree relatives with breast cancer, ovarian cancer or childhood cancer. Cases not meeting these criteria were randomly sampled and racial/ethnic minorities were oversampled. Controls were recruited by random digit dialing and were matched by 5-year age increments and by race/ethnicity.
For the current GWAS only Latina/Hispanic cases and controls were included. Latina/Hispanic ethnicity was assessed by self-report. 631 Hispanic/Latina cases and 61 Hispanic/Latina controls from the NC-BCFR had given adequate consent and provided biospecimens that were used in the GWAS to be included in this data submission.
Study
phs000912
Genetic Epidemiology of Refractive Error in the KORA (Kooperative Gesundheitsforschung in der Region Augsburg) Study
KORA ("Kooperative Gesundheitsforschung in der Region Augsburg" which translates as "Cooperative Health Research in the Region of Augsburg") is a population based study of adults randomly selected from 430,000 inhabitants living in Augsburg and 16 surrounding counties in Germany. The collection was done in 4 separate groups from 1984-2001 (S1-S4). One of the KORA groups, S3/F3, will be utilized for our GWAS because it is the only group with refractive error (RE) measurements. Consequent to informed consent, each of the surveys sampled subjects from ten strata according to age (range 25-74 years) and sex (equal ratio) with a minimum stratum size of > 400 subjects. In the KORA S3 study 4,856 subjects were studied between 1994 and 1995, and 3,006 individuals from S3 returned for follow up between 2003 and 2005 (S3/F3). For this refractive error study, we are including 1,981 subjects from S3/F3 (mean age 55.7, range 35-84). For each subject, eyeglass prescriptions were measured in addition to an evaluation by the Nikon Retinomax. Subjects with predisposing medical conditions, i.e., connective tissue disorders, and ocular conditions i.e., cataract and corneal opacities, that might predispose them to refractive error will not be included for genotyping. Whole genome association genotyping will be performed to determine common alleles that contribute to the variation of the quantitative trait of refractive error.
Study
phs000303
Genome-Wide Association Study of Preterm Birth
Preterm birth (PTB, born before 37 weeks of gestation) is a leading cause of neonatal mortality and post-natal morbidity. PTB affects one in nine of all live births in the U.S. Notably, the highest rate of PTB occurs among African Americans (one in six). PTB is a complex trait, likely determined by multiple environmental and genetic factors and their interactions. We demonstrated strong familial aggregation of preterm birth and low birthweight in US Blacks and Whites (Wang et al, NEJM, 1995) and conducted the largest candidate gene study of preterm birth at that time (Hao et al, HMG, 2004). We showed that a subset of mothers with certain metabolic gene variants are particularly vulnerable to the adverse effects of cigarette smoking on low birthweight and preterm births (Wang et al, JAMA, 2002). We also published a number of papers that examined the effect of maternal pre-pregnancy BMI, micronutrient status, stress and environmental toxins on the risk of preterm birth and related conditions. This project, supported by a grant from the NICHD (2R01HD41702, PI, Xiaobin Wang), aimed to conduct a genome-wide association study (GWAS) and apply advanced statistical methods to identify susceptibility loci of PTB in a predominantly urban low-income African American sample, a subset of the Boston Birth Cohort. PUBLIC HEALTH RELEVANCE: We anticipate that this study will lead to the identification of novel genetic loci of PTB and gene-environment interactions. Such findings not only will provide important insights into mechanisms leading to PTB, but also may help identify women at high risk of PTB, which in turn may lead to the development of early and targeted interventions that can prevent PTB or mitigate the severity and consequences of PTB.
Study
phs000332
Targeted Illumina sequencing of 3399 plasma cfDNA samples and 1988 leukocyte DNA samples from 1995 metastatic prostate cancer patients
Blood samples were collected from eight clinical trials and a British Columbia-based biobank. All plasma cfDNA and leukocyte DNA samples were processed with uniform methodology, and underwent targeted sequencing using a custom hybridization capture panel and Illumina instruments. Four different generations of custom hybridization capture panel were used, all providing coverage of the complete coding regions of 72 prostate cancer genes, and later panel generations also providing exhaustive coverage of the AR locus (including introns and flanking regulatory regions), as well as a genome-wide SNP grid for copy number analysis.
Dataset
EGAD50000001594
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Coronary Artery Risk Development in Young Adults Study (CARDIA)
Cohort Description: The Coronary Artery Risk Development in Young Adults (CARDIA) study is a study examining the development and determinants of clinical and subclinical cardiovascular disease and their risk factors. It began in 1985-1986 with a group of 5115 black and white men and women aged 18-30 years. The participants were selected so that there would be approximately the same number of people in subgroups of race, gender, education (high school or less and more than high school), and age (18-24 and 25-30 years) in each of 4 centers: Birmingham, AL; Chicago, IL; Minneapolis, MN; and Oakland, CA. These same participants were asked to participate in follow-up examinations during 1987-1988 (Year 2), 1990-1991 (Year 5), 1992-1993 (Year 7), 1995-1996 (Year 10), 2000-2001 (Year 15), 2005-2006 (Year 20), 2010-2011 (Year 25), 2015-2016 (Year 30), and 2021-2022 (Year 35).
Data Being Submitted:
Wave 1 questionnaire data includes 397 variables for up to 2434 CARDIA participants in C4R.
Wave 2 questionnaire data includes 448 variables for up to 1901 CARDIA participants in C4R.
Dried Blood Spot/Serosurvey data includes 7 variables for up to 1332 CARDIA participants in C4R.
Derived data includes 43 variables for up to 2723 CARDIA participants in C4R.
Phenotype data includes 113 variables for up to 2723 CARDIA participants in C4R.
Study
phs003045
California Teachers Study (CTS): Whole Genome Sequences From Under-Represented Populations
The California Teachers Study (CTS) is a prospective observational cohort of N=133,477 adult women who have been followed since 1995. Participants completed up to six surveys and have been linked with cancer, hospitalization, and mortality data. In 2013-2016, over 14,000 CTS participants donated blood samples for future research. For this project, DNA was extracted in 2023 from 1694 buffy coats and 56 blood clots from N=1750 CTS participants whose self-reported race/ethnicity was Asian (398), Asian or Pacific Islander (18), Black or African American (189), Native American (128), Native Hawaiian or Other Pacific Islander (66), White Hispanic (719), or more than one race (232). Whole genome sequencing (WGS) was performed at CIDR in 2023-2024. WGS data and accompanying phenotype data are available through dbGaP. In addition, all CTS data are available to the research community via the CTS Researcher Platform (www.calteachersstudy.org), a secure cloud-based environment that includes data, software, documentation, and collaboration workspaces.
Study
phs002918
NHLBI TOPMed - NHGRI CCDG: The BioMe Biobank at Mount Sinai
The IPM BioMe Biobank, founded in September 2007, is an ongoing, broadly-consented electronic health record (EHR)-linked clinical care biobank that enrolls participants non-selectively from the Mount Sinai Medical Center patient population. BioMe currently comprises >42,000 participants from diverse ancestries, characterized by a broad spectrum of longitudinal biomedical traits. Participants are enrolled through an opt-in process and consent to be followed throughout their clinical care (past, present, and future) in real-time, allowing us to integrate their genomic information with their EHRs for discovery research and clinical care implementation. BioMe participants consent for recall, based on their genotype and/or phenotype, permitting in-depth follow-up and functional studies for selected participants at any time. Phenotypic and genomic data are stored in a secure database and made available to investigators, contingent on approval by the BioMe Governing Board. BioMe uses a "data-broker" system to protect confidentiality. Ancestral diversity - BioMe participants represent a broad racial, ethnic and socioeconomic diversity with a distinct and population-specific disease burden. Specifically, BioMe participants are of African (AA), Hispanic/Latino (HL), European (EA) and other/mixed ancestry. BioMe participants are predominantly of African (AA, 24%), Hispanic/Latino (HL, 35%), European (EA, 32%), and other ancestry (OA, 10%). Participants who self-identify as Hispanic/Latino further report to be of Puerto Rican (39%), Dominican (23%), Central/South American (17%), Mexican (5%) or other Hispanic (16%) ancestry. More than 40% of European ancestry participants are genetically determined to be of Ashkenazi Jewish ancestry. With this broad ancestral diversity, BioMe is uniquely positioned to examine the impact of demographic and evolutionary forces that have shaped common disease risk. 
Phenotypes available in BioMe - BioMe has a high-quality and validated set of fully implemented clinical phenotype data that has been culled by a multi-disciplinary team of experienced investigators, clinicians, information technologists, data-managers, and programmers who apply advanced medical informatics and data mining tools to extract and harmonize EHRs. BioMe, as a cohort, offers a great versatility for designing nested case-control sample-sets, particularly for studying longitudinal traits and co-morbidity in disease burden. Biomedical and clinical outcomes: The BioMe Biobank is linked to Mount Sinai's system-wide Epic EHR, which captures a full spectrum of biomedical phenotypes, including clinical outcomes, covariate and exposure data from past, present and future health care encounters. As such, the BioMe Biobank has a longitudinal design as participants consent to make all of their EHR data from past (dating back as far as 2003), present and future inpatient or outpatient encounters available for research, without restriction. The median number of outpatient encounters is 21 per participant, reflecting predominant enrollment of participants with common chronic conditions from primary care facilities. Environmental data: The clinical and EHR information is complemented by detailed demographic and lifestyle information, including ancestry, residence history, country of origin, personal and familial medical history, education, socio-economic status, physical activity, smoking, dietary habits, alcohol intake, and body weight history, which is collected in a systematic manner by interview-based questionnaire at time of enrollment. The IPM BioMe Biobank contributed ~10,600 DNA samples for whole genome sequencing to the TOPMed program. Samples were selected for the Coronary Artery Disease (CAD) and the Chronic Obstructive Pulmonary Disease (COPD) working groups. 
Using a Case-Definition-Algorithm (CDA), we identified ~4,100 individuals with CAD (~50% women) and ~3,000 individuals as controls (65% women). In addition, we identified ~800 individuals with COPD (62% women) and 1800 individuals as controls (72% women). Another 600 BioMe participants with Atrial Fibrillation, all of African ancestry, were included.
Study
phs001644
ALS Compute
Amyotrophic Lateral Sclerosis (ALS) is a fatal neurodegenerative disorder characterized by the progressive loss of brain and spinal cord motor neurons. Half of ALS patients display cognitive symptoms of frontotemporal dementia (FTD); reciprocally, about 40% of FTD patients show motor neuron deficits, and approximately 15% develop overt ALS. The clinical overlap between ALS and FTD means that the two conditions are thought to represent a disease spectrum (ALS/FTD). In recent years, the identification of several genetic causes of ALS/FTD has contributed significantly to our understanding of disease pathogenesis. Unfortunately, one-third of the underlying genetic causes of familial cases and ~90% of sporadic cases of ALS/FTD remain unexplained. As such, there is a dire need to identify additional genetic factors contributing to ALS/FTD. Such studies require huge cohorts of harmonized whole genome sequencing (WGS) data sets from cases and controls. Currently, there are several major ongoing sequencing efforts for ALS patients. Numerous centers lead to inefficiency, especially in terms of overall costs. At a minimum, this includes the cost of high-performance computing, the storage of large data files, and the duplication of effort between groups. This lack of data harmonization between groups precludes sharing of genetic information and weakens collaborative efforts. The cost and logistics are also a barrier to attracting talented investigators to the ALS/FTD field. To overcome this unmet need, we have founded the ALS Compute project. We are centralizing the storage of ALS/FTD WGS data from every significant sequencing effort in the United States and beyond within a single Cloud environment. This approach will facilitate data harmonization and improve accessibility to the data. 
To accomplish this, we have made the data and the computational infrastructure available via the Terra platform hosted by NHGRI's Genomic Data Science Analysis, Visualization, and Informatics Lab-Space (AnVIL). This will allow researchers worldwide to access this wealth of data, develop new theories of the disease, and yield breakthroughs in our understanding of ALS/FTD.
Study
phs003184
Large Scale Genotyping of Psychiatric Disorders
Psychiatric disorders cause enormous human suffering and cost to society. The goal of this project is to elucidate the genetics of psychiatric disorders through a large consortium effort and genome-wide genotyping. The Psychiatric Genomics Consortium (PGC) (https://www.med.unc.edu/pgc) is an international consortium to pursue genome-wide analyses of common psychiatric disease. The PGC was founded in 2007 to unite investigators from around the world to conduct mega-analyses of individual genome-wide genotype data for schizophrenia (SCZ), bipolar disorder (BIP), autism (AUT), attention-deficit hyperactivity disorder (ADHD), and major depressive disorder (MDD). Subsequently, the PGC has expanded to include Tourette syndrome (TS), obsessive-compulsive disorder (OCD), eating disorders (ED), post-traumatic stress disorder (PTSD), and substance abuse (SUD). The PGC partnered with Illumina to develop the PsychChip, a custom GWAS array aimed at capturing common variation genome-wide for imputation, rare coding variation drawn from the exome chip and dense genotypes for loci with suggestive evidence for association across the initial PGC analyses of SCZ, MDD, BPD, ASD and ADHD. Samples (cases, controls, and families) were collected from Institutional Review Board-approved protocols that aim to study the genetics of psychiatric disorders. Genome-wide genotyping was performed at the Broad Institute (Cambridge, MA, USA) and the Mount Sinai School of Medicine Friedman Brain Institute (New York, NY, USA) using the PsychChip or Exome Chip.
Study
phs001413
Rare Genetic Steroid Disease Consortium (GSD) Apparent Mineralocorticoid Excess (AME) Syndrome Natural History Clinical Protocol
The objective of this project is to observe the natural history of apparent mineralocorticoid excess (AME). AME is a rare monogenic hypertensive disease first described by the Principal Investigator, Dr. Maria I. New, in 1977 [1], for which the first mutation in the causative HSD11B2 gene was reported by her group in 1995. [2] Little is known about the progression of this rare hypertensive disorder, which is defined by the presence of hypertension in conjunction with low renin, low to absent levels of the mineralocorticoid aldosterone, elevated urinary cortisol/cortisone metabolite ratio 5β-tetrahydrocortisol (THF) + allo-THF/ tetrahydrocortisone (THE) (i.e., THF+αTHF/THE) and the presence of mutations in the HSD11B2 gene on both alleles. The range of symptoms reflecting progressive end-organ damage is wide in the small number of affected individuals studied to date. It is important to develop accurate information on the longitudinal pattern of progression (natural history) of AME not only because it may improve the care of affected individuals, but also because it may serve as a platform for understanding the role of mineralocorticoid excess in other forms of low-renin hypertension, which constitute 40% of all forms of hypertension. In this study, we will recruit as widely as possible through the Rare Diseases Clinical Research Network (RDCRN) to identify individuals with the triad of low renin, low aldosterone, and elevated urinary THF+αTHF/THE and confirm their disorder as AME by molecular genetic testing of the HSD11B2 gene. We will gather clinical and biochemical data yearly for the duration of the grant. We will describe phenotype and genotype at diagnosis and follow the progression or regression of symptoms over time under standard treatment, with rigorous review of end organs potentially damaged by chronic hypertension. 
Finally, we will study the natural history of the carriers of HSD11B2 mutations, i.e., the heterozygote relatives of the probands, in order to ascertain evidence of existing or developing hypertension and resulting end-organ damage. This will be the first rigorous study of AME heterozygotes. The participant accrual for AME-affected patients and the carriers of HSD11B2 mutations will be at least two years, but as long as seven years.
Study
phs000603
Exome_trios_in_patients_with_gastroschisis
Gastroschisis (MIM 230750) is a herniation of the intestines through a defect of the abdominal wall lateral to the umbilicus (usually on the right side), and it is not covered by a membrane [Ledbetter, 2012]. Gastroschisis is a congenital anomaly with increasing incidence, easy prenatal diagnosis and extremely variable postnatal outcomes. On the basis of clinical manifestations, epidemiologic characteristics, and the presence and type of additional malformations, gastroschisis could be considered a heterogeneous condition with no gene/s discovered yet.
This congenital anomaly affects approximately 1-3 infants per 10,000 live births [Calzolari et al., 1995; Parker et al., 2010]. Current knowledge about causative mutations/variants: To date, no single gene has been linked to gastroschisis. Some publications have tried to link this malformation to variants in genes such as AEBP1 (adipocyte enhancer binding protein) [Feldkamp et al., 2012] or to the VEGF-NOS3 pathway [Lammer et al., 2008].
Previously, a Scribble mutant mouse model (circletail) was reported to exhibit gastroschisis; however, recent studies demonstrated that the Scribble knockout fetus exhibits exomphalos phenotype of gastroschisis [Carnagham et al., 2013].
This data is part of a pre-publication release. For information on the proper use of pre-publication data shared by the Wellcome Trust Sanger Institute (including details of any publication moratoria), please see http://www.sanger.ac.uk/datasharing/
Study
EGAS00001002664
Autism Sequencing Consortium (ASC)
The ARRA Autism Sequencing Collaboration was created in 2010, bringing together expert large-scale sequencing centers (at the Baylor College of Medicine, PI Richard Gibbs, and the Broad Institute of MIT and Harvard, PI Mark J. Daly) and a collaborative network of research labs focused on the genetics of autism (brought together by the Autism Genome Project and the Autism Consortium). These groups worked together to utilize dramatic new advances in DNA sequencing technology to reveal the genetic architecture of autism through comprehensive examination of the exonic sequence of all genes. The Autism Sequencing Consortium (ASC) was founded by Joseph D. Buxbaum and colleagues as an international group of scientists who share autism spectrum disorder (ASD) samples and genetic data. The PIs are Drs. Joseph D. Buxbaum (Icahn School of Medicine at Mount Sinai), Mark J. Daly (Broad Institute of MIT and Harvard), Bernie Devlin (University of Pittsburgh School of Medicine), Kathryn Roeder (Carnegie Mellon University), Matthew State and Stephan Sanders (University of California, San Francisco). The rationale for the ASC is described in Buxbaum et al. 2012, and this paper should be cited when referencing the data set. All shared data and analysis are hosted at a single site, which enables joint analysis of large-scale data from many groups. The ASC was first supported by a cooperative agreement grant to four lead sites funded by the National Institute of Mental Health (U01MH100233, U01MH100209, U01MH100229, U01MH100239), with additional support from the National Human Genome Research Institute. The NIMH recently renewed their support with a second grant (U01MH111661, U01MH111660, U01MH111658 and U01MH111662) to expand the project from 29,000 genomes to more than 50,000 exomes over the next 5 years. NHGRI provides ongoing sequencing support for the ASC through the Broad Center for Common Disease Genomics (UM1HG008895, Mark Daly, PI).
Study
phs000298
NHLBI Family Heart Study (FamHS-Visit1 and FamHS-Visit2)
The Family Heart Study (FamHS) was funded by the National Heart, Lung, and Blood Institute (NHLBI). It was begun in 1992 with the ascertainment of 1,200 families, half randomly sampled, and half selected because of an excess of coronary heart disease (CHD) or risk factor abnormalities as compared with age- and sex-specific population rates (Higgins et al. 1996). The families, with approximately 6,000 individuals, were sampled on the basis of information on probands from four population-based parent studies: the Framingham Heart Study, the Utah Family Tree Study, and two Atherosclerosis Risk in Communities (ARIC) centers (Minneapolis, and Forsyth County, NC). A broad range of phenotypes were assessed at a clinic examination in broad domains of CHD, atherosclerosis, cardiac and vascular function, inflammation and hemostasis, lipids and lipoproteins, blood pressure, diabetes and insulin resistance, pulmonary function, and anthropometry (FamHS Visit 1). Approximately 8 years later, study participants belonging to the largest pedigrees were invited for a second clinical exam (FamHS Visit 2). A total of 2,756 Caucasian subjects in 508 extended families were examined. A two-phase design was adopted for the genome wide association (GWA) study. In phase-1, 1007 subjects were chosen, equally distributed between the upper and lower quartile of age- and sex-adjusted values for coronary artery calcification, assessed by CT scan in Visit 2. These subjects were chosen to be largely unrelated; 34% of the subjects were from unique families, while 200 other subjects had 1 or more siblings selected into the sample, yielding a sample of 465 unrelated subjects. The remaining family members (N=1749) were genotyped in the phase-2 for replication of the top hits from the phase-1. 
The results presented here represent those for the analysis of the phase-1 case-control sample for variables assessed in FamHS Visit 1 (from 1992 to 1995) and for the variables assessed in FamHS Visit 2 (from 2002 to 2003). All subjects were typed on the Illumina HumMap 550 chip (Phase 1 genotype). Of these, 33 (3.3%) were excluded due to technical errors, call rates below 98%, and discrepancies between reported sex and sex-diagnostic markers. The final sample of 974 subjects have Visit 2 phenotypes; approximately 100 of these do not have Visit 1 phenotypes. There was no significant plate-to-plate variation in allele frequencies. The covariate adjustments were performed separately by sex using cubic polynomial age and clinical centers, and retaining the terms in the stepwise regression analysis that were significant at the 5% level. Extreme outliers (>4 SD from the mean) were set aside, temporarily, for the adjustments. The final phenotypes were computed for all individuals using the best mean regression models and standardizing to 0 mean and unit variance. The FamHS has contributed GWA results in many phenotype domains (anthropometric and adiposity, atherosclerosis and coronary heart disease, lipid profile, diabetes and glycemic traits, metabolic syndrome, etc.) to meta-analyses and various consortia, including Heard-Costa et al. 2009, Köttgen et al. 2010, Teslovich et al. 2010, Nettleton et al. 2010, Lango et al. 2010, Heid et al. 2010, Speliotes et al. 2010, Dupuis et al. 2010, Kraja et al. 2011.
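The covariate adjustment described above can be illustrated with a minimal sketch. It is not the study's actual pipeline: it fits only a cubic polynomial in age to a single phenotype, and omits the clinical-center terms, the sex-specific stratification, and the stepwise term selection. The function names (`solve_linear`, `fit_ols`, `adjust_phenotype`) are hypothetical, and standardizing with the mean and SD of the non-outlier residuals is an assumption, since the description does not say which subjects define the scale.

```python
import statistics

def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_ols(rows, y):
    """Least-squares coefficients via the normal equations."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    return solve_linear(xtx, xty)

def adjust_phenotype(ages, values, sd_cutoff=4.0):
    """Return age-adjusted, standardized phenotypes for all subjects."""
    mu, sd = statistics.fmean(values), statistics.stdev(values)
    # Set extreme outliers aside for the model fit only.
    keep = [abs(v - mu) <= sd_cutoff * sd for v in values]
    mean_age = statistics.fmean(ages)
    # Intercept plus centred age, age^2, age^3 (centring helps conditioning).
    rows = [[1.0, a - mean_age, (a - mean_age) ** 2, (a - mean_age) ** 3]
            for a in ages]
    beta = fit_ols([r for r, k in zip(rows, keep) if k],
                   [v for v, k in zip(values, keep) if k])
    # Residuals are computed for everyone, including the set-aside outliers.
    resid = [v - sum(b * x for b, x in zip(beta, r))
             for r, v in zip(rows, values)]
    kept = [e for e, k in zip(resid, keep) if k]
    r_mu, r_sd = statistics.fmean(kept), statistics.stdev(kept)  # assumption
    return [(e - r_mu) / r_sd for e in resid]
```

Called as `adjust_phenotype(ages, values)`, the function returns one standardized value per subject in input order. Subjects set aside as outliers still receive a value, which mirrors the statement that final phenotypes were computed for all individuals.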
Study
phs000221
NHLBI TOPMed: Coronary Artery Risk Development in Young Adults (CARDIA)
CARDIA is a study examining the etiology and natural history of cardiovascular disease beginning in young adulthood. In 1985-1986, a cohort of 5115 healthy black and white men and women, aged 18-30 years, were selected to have approximately the same number of people in subgroups of age (18-24 and 25-30), sex, race, and education (high school or less, and more than high school) within each of four US Field Centers. These same participants were asked to participate in follow-up examinations during 1987-1988 (Year 2), 1990-1991 (Year 5), 1992-1993 (Year 7), 1995-1996 (Year 10), 2000-2001 (Year 15), 2005-2006 (Year 20), 2010-2011 (Year 25) and 2015-2016 (Year 30). In addition to the follow-up examinations, participants are contacted regularly for the ascertainment of information on out-patient procedures and hospitalizations experienced between contacts. Within the past five years, 95% of the original surviving cohort has been contacted. While the specifics of each examination have differed somewhat, data have been collected on a variety of factors believed to be related to heart disease. These include conditions with clear links to heart disease, such as blood pressure, cholesterol and other lipids. Data have also been collected on physical measurements, such as weight and skinfold fat, as well as lifestyle factors such as substance use (tobacco and alcohol), dietary and exercise patterns, behavioral and psychological variables, medical and family history, and other chemistries (e.g., insulin and glucose). In addition, subclinical atherosclerosis was measured via echocardiography during Years 5, 10, and 25, computed tomography during Years 15 and 20, and carotid ultrasound during Year 20. Comprehensive phenotypic data for study participants are available through dbGaP phs000285.
Study
phs001612
(Epi)genetic Risk Architectures of Opioid-Dependent Brain
With nearly 20,000 overdose deaths in 2014 alone, opioid addiction (OA) has emerged as one of the most pressing public health crises in recent US history. One-fifth of individuals who try heroin develop an addiction to opioids. Genetics is a major contributor to OA with an estimated 60% heritability, only somewhat less than schizophrenia (80%) which has recently seen substantial gains in identified underlying genetics. Yet, studies to date have failed to uncover most of the genes that predispose individuals to OA, leaving the overwhelming fraction of OA heritability unexplained. The proposed study takes a novel, integrated "omics-based" strategy to investigate the molecular basis of OA and uncover both genetic and epigenetic factors associated with opioid addiction. The premise for our approach is founded on studies from our labs, and others, implicating regulatory variation in common traits and diseases, including those associated with complex brain phenotypes like addiction. We have collected the largest known cohort of postmortem brains from addicts who overdosed on opioids, along with matched control brains from non-users. From both cases and control specimens, we will isolate cells from 2 regions of the brain closely implicated in the addiction phenotype: the Prefrontal Cortex (PFC) and the Nucleus Accumbens (NAc). In Aim 1, we propose ChIP-seq studies to identify regulatory elements that distinguish cases from controls and define the opioid addiction phenotype. The regulatory differences will be connected to their gene targets through high resolution in situ Hi-C. In Aim 2, we propose QTL-based approaches to identify genetic variants that underlie the regulatory differences (hmQTLs). We also propose eQTL analyses to identify genetic variants that underlie differences in transcript levels between cases and controls. 
Aim 3 leverages the largest heroin addiction GWAS meta-analysis to date to test the hypothesis that SNPs in regions associated with chromatin and expression differences between cases and controls define novel loci for predisposition to OA. Each Aim has the potential for discovery independent of the others (differential HM, RNAexp, QTLs, and variant-phenotype associations), but their synergy is the most powerful component of the proposed study: identifying regulatory pathways that generate not only phenotype associations but also hypothesized mechanisms for those associations, which can be the focus of new opioid addiction prevention and treatment development.
Study
phs002724
Perceptions, Prevalence, and Patterns of Cannabis Use among Cancer Patients in NCI-Designated Cancer Centers
This study aimed to better understand the evolving landscape of cannabis use among patients undergoing cancer treatment across diverse geographic regions. It sought to characterize patterns of use, patient perceptions, and contextual influences such as legal status and provider engagement that affect cannabis use. Given the shifting legal environment, this study addresses a critical gap in the literature by providing data on how cancer patients integrate cannabis into their care. The available delivery methods of cannabis have also undergone dramatic changes and now include edibles, oils, tinctures, topicals, and inhaled forms. State-based policies are changing rapidly, yet research on cannabis use by cancer patients across a variety of geographic settings remains limited. The extent of cannabis use, the perceived and real benefits and risks of use, potential interactions with cancer treatment and other medications, and impact on comorbid conditions are uncertain. Clinicians should be aware of the extent of use in order to assess potential drug-drug interactions, side effects, and contraindications; hence, an understanding of how cancer patients and clinicians engage in discussions about cannabis use is essential. Twelve NCI-designated cancer centers were awarded supplemental funding to conduct surveys assessing cannabis use among recently diagnosed cancer patients. Selected sites were in 8 states with varied legal status for medical and recreational cannabis use at the time of the survey. The 12 cancer centers independently received approval from their institutional review boards (IRB) and collected data from September 2021 to August 2023. Eligible participants were cancer patients or survivors who resided in their respective cancer center's catchment area. Cancer centers were responsible for sampling patients within their catchment area with the goal of 1,000 completed surveys. 
Ten out of 12 cancer centers drew probability samples of patients from patient lists defining their catchment area with some stratifying the sample in various ways including by cancer type, by sex, by race/ethnicity or by a combination of demographic variables.
Study
phs004046
Angiopredict: predicting response for bevacizumab treatment
In ANGIOPREDICT, academic cancer biologists and industry-based biotechnology researchers worked with clinicians to identify biomarkers to predict whether individual metastatic colorectal cancer patients will respond positively to bevacizumab containing therapy. In this retrospective study, copy number profiles were determined using whole genome shallow sequencing, and progression free survival and treatment data was collected.
Study
EGAS00001002724
Population Architecture using Genomics and Epidemiology (PAGE): Causal Variants Across the Life Course (CALiCo): Coronary Artery Risk Development in Young Adults (CARDIA)
CALiCo CARDIA: The Coronary Artery Risk Development in Young Adults (CARDIA) Study is a study examining how heart disease develops in adults. It began in 1986 with a group of 5115 black and white men and women aged 18-30 years. The participants were selected so that there would be approximately the same number of people in subgroups of race, gender, education (high school or less and more than high school) and age (18-24 and 25-30) in Birmingham, AL; Chicago, IL; Minneapolis, MN; and Oakland, CA. These same participants were asked to participate in follow-up examinations during 1987-1988 (Year 2), 1990-1991 (Year 5), 1992-1993 (Year 7), 1995-1996 (Year 10), 2000-2001 (Year 15), and 2005-2006 (Year 20). A majority of the group has been examined at each of the follow-up examinations (90%, 86%, 81%, 79%, 74%, and 72%, respectively). While the specifics of each examination have differed somewhat, data have been collected on a variety of factors believed to be related to heart disease. These include conditions with clear links to heart disease such as blood pressure, cholesterol and other lipids. Data have also been collected on physical measurements such as weight and skinfold fat as well as lifestyle factors such as substance use (tobacco and alcohol), dietary and exercise patterns, behavioral and psychological variables, medical and family history, and other chemistries (e.g., insulin and glucose). In addition, subclinical atherosclerosis was measured via echocardiography during Years 5 and 10, computed tomography during Years 15 and 20, and carotid ultrasound during Year 20. A detailed description of the study and results from the first examination are summarized in Cutter et al. (Controlled Clinical Trials, Volume 12, Number 1 [supplement], pages 1S-77S, 1991).
Study
phs000236
Framingham Heart Study-Cohort (FHS-Cohort) - Imaging
The Framingham Heart Study (FHS) is a population-based, observational cohort study initiated in 1948 to prospectively investigate the determinants of cardiovascular disease to guide public health prevention. The FHS began by recruiting an Original Cohort of 5,209 men and women between the ages of 30 and 62 from the town of Framingham, Massachusetts, who had not yet developed overt symptoms of cardiovascular disease or suffered a heart attack or stroke. The Original cohort Exam 1 took place between 1948 and 1953. Since that time the cohort has had a total of 32 biennial exams (ending in 2014) and event follow-up through 2022. In 1971, the FHS added the Offspring cohort, comprising 5124 children whose parents were enrolled in the Original cohort and the spouses of the children. This cohort on average has been examined every three to four years. However, there was an eight year gap between Exam 1 and Exam 2 and a seven year gap between Exam 7 and Exam 8. The latest exam (10) was completed in 2022. In 2002, the transgenerational FHS design was facilitated with the recruitment of 4095 children of the Offspring cohort (Third Generation), and 103 spouses of the Offspring who were not previously enrolled in the study (New Offspring Spouses, NOS). These cohorts have completed 3 exams through 2019. To reflect the changing demographic characteristics of the greater Framingham community, the FHS additionally recruited and enrolled 2 cohorts comprising racial and ethnic minority groups, termed Omni-1 and Omni-2, (n = 506 and 410, respectively) in 1995 and 2002, respectively. These cohorts included individuals of African American, Hispanic, Asian, Indian, Native American, and Pacific Islander descent. The OMNI-1 cohort have completed 5 exams through 2022. The OMNI-2 cohort have completed 3 exams through 2019. Data available for request include Echocardiogram images, available from the following exams in each cohort. 
Original Cohort: exams 18-32; Offspring cohort: exams 3-10; Third Generation, NOS and OMNI-2 cohorts: exams 1-3; OMNI-1 cohort: exams 1-5. CT image data at one or two timepoints were added for 4427 participants in the Offspring, Third Generation, OMNI-1 and OMNI-2 cohorts. Summary level phenotypes for the Framingham Cohort study participants can be viewed at the top-level study page phs000007 Framingham Cohort. Individual level phenotype data and molecular data for all Framingham top-level study and substudies are available by requesting Authorized Access to the Framingham Cohort study phs000007.
Study
phs003593
Exome trios in patients with gastroschisis (2019-04-08)
Gastroschisis (MIM 230750) is a herniation of the intestines through a defect of the abdominal wall lateral to the umbilicus (usually on the right side), and it is not covered by a membrane [Ledbetter, 2012]. Gastroschisis is a congenital anomaly with increasing incidence, easy prenatal diagnosis and extremely variable postnatal outcomes. On the basis of clinical manifestations, epidemiologic characteristics, and the presence and type of additional malformations, gastroschisis could be considered a heterogeneous condition with no causative gene(s) discovered yet.
This congenital anomaly affects approximately 1-3 infants per 10,000 live births [Calzolari et al., 1995; Parker et al., 2010]. Current knowledge about causative mutations/variants: To date, no single gene has been linked to gastroschisis. Some publications have tried to link this malformation to variants in genes such as the AEBP1 (adipocyte enhancer binding protein) gene [Feldkamp et al., 2012] or in the VEGF-NOS3 pathway [Lammer et al., 2008].
Previously, a Scribble mutant mouse model (circletail) was reported to exhibit gastroschisis; however, recent studies demonstrated that the Scribble knockout fetus exhibits an exomphalos phenotype rather than gastroschisis [Carnagham et al., 2013].
This data is part of a pre-publication release. For information on the proper use of pre-publication data shared by the Wellcome Trust Sanger Institute (including details of any publication moratoria), please see http://www.sanger.ac.uk/datasharing/ .
This dataset contains all the data available for this study on 2019-04-08.
Dataset
EGAD00001004942
Center Common Disease Genomics [CCDG] - CVD - TAICHI
The TaiChi consortium consists of 6 studies that collaborated initially in a large scale metabochip study, and became an ongoing consortium for studies of cardiometabolic disease in the Chinese population in Taiwan. The six studies included the following: 1) SAPPHIRe (Stanford-Asian Pacific Program in Hypertension and Insulin Resistance), a family based study established in 1995 with an initial goal of identifying major genetic loci underlying hypertension and insulin resistance in East Asian populations, with Taiwan subjects participating in the TaiChi consortium; 2) TCAGEN (Taiwan Coronary Artery Disease GENetic), a cohort study that enrolled patients undergoing coronary angiography or percutaneous intervention at the National Taiwan University Hospital (NTUH) in the setting of either stable angina pectoris or prior myocardial infarction; 3) TACT (TAiwan Coronary and Transcatheter intervention), a cohort study that enrolled patients with angina pectoris and objective documentation of myocardial ischemia who underwent diagnostic coronary angiography and/or revascularization any time after October 2000 at the National Taiwan University Hospital (NTUH) (similar to TCAGEN but recruitment was independent of TCAGEN); 4) Taiwan DRAGON (Taiwan Diabetes and RelAted Genetic COmplicatioN), a cohort study of Type 2 diabetes at Taichung Veterans General Hospital (Taichung VGH) in Taiwan, with participants including individuals with either newly diagnosed or established diabetes (subjects with hyperglycemia who did not meet diagnostic criteria for Type 2 DM were not included); 5) TCAD (Taichung CAD study), which includes patients with a variety of cardiovascular diseases who received care at the Taichung Veterans General Hospital (Taichung VGH), specifically individuals who were hospitalized for diagnostic and interventional coronary angiography examinations and treatment; 6) TUDR (Taiwan US Diabetic Retinopathy), which enrolled subjects with Type 2 diabetes who received care at Taichung Veterans General Hospital (Taichung VGH), and a small number of subjects from Taipei Tri-Service General Hospital (TSGH); TUDR subjects underwent a complete ophthalmic and fundus examination to carefully document the presence and extent of retinopathy. From these 6 studies, subjects were selected based on completeness of standard metabolic phenotyping and knowledge of cardiac disease status (early onset coronary disease), to undergo whole genome sequencing at the Broad.
Study
phs001487
Population Architecture using Genomics and Epidemiology (PAGE): Causal Variants Across the Life Course (CALiCo): Strong Heart Study (SHS) and Strong Heart Family Study (SHFS)
The SHS is a study of cardiovascular disease and its risk factors among American Indian men and women. Using standardized methodology, it was designed to estimate cardiovascular disease mortality and morbidity and the prevalence of known and suspected cardiovascular disease risk factors and to assess the significance of these risk factors in a longitudinal analysis. The study included 13 American Indian tribes and communities in three geographic areas: an area near Phoenix, Arizona, the southwestern area of Oklahoma, and western and central North and South Dakota. The SHS included three components: The first was a survey to determine cardiovascular disease mortality rates from 1984 to 1994 among tribal members aged 35-74 years residing in the 3 study areas (the community mortality study). The second was the clinical examination of 4,500 tribal members aged 45-74. The SHS has completed three clinical examinations of the original Cohort (Phase I: 1989-1991; Phase II: 1993-1995; Phase III: 1998-1999). The third component is the morbidity and mortality (M&M) surveillance of these 4,500 participants. Yearly SHS surveillance has only 0.2% loss to follow-up. All deaths and all nonfatal CVD events are classified by standardized criteria, including details of stroke and HF. The Strong Heart Family Study (SHFS) is a genetic epidemiological study designed to investigate the heritability of CVD and its risk factors and to localize genes that contribute to CVD risk in American Indians. SHFS participants include 3,838 family members who were ≥15 years old and were ascertained through sibships of the original SHS, from 94 extended (large, multigenerational) families. Exams have occurred in a pilot Phase III (1998-1999, 900 SHFS participants), in Phase IV (2001-2003), and Phase V (2006-2009). SHFS morbidity and mortality surveillance has occurred throughout the study phases, with 0.3% lost to follow-up. 
Genetic data includes complete pedigree information, DNA samples from all family members, a 10cM-spaced microsatellite map used for IBD estimation and to perform linkage analysis, genotypes for more than 12,000 SNPs in candidate regions, and genotypes from commercially available SNP assays.
Study
phs000580
Predicting brain tumor recurrence: development and validation of a DNA-methylation based nomogram in meningioma
Our work is the first to demonstrate the transformative utility of integrating clinical and molecular factors for use beyond simple classification into the realm of individualized prognostication for any brain tumor. Using our developed and validated tools that are publicly available, clinicians will be able to combine clinical and molecular factors to determine an individualized probability of recurrence for patients with meningiomas. This represents a major advance in the field of personalized medicine for neuro-oncology, and the use of this tool can help clinicians overcome one of the most challenging limitations we face when treating patients with meningiomas.
Study
EGAS00001003490
A Clone's Genomic Stability as a Biomarker of Its DNA-Damage Resilience
A main problem in the treatment of advanced cancers, including gastric cancers and glioblastoma, is the uncertainty with which we predict how individual patients will respond to DNA-damaging agents, especially in the long run. Knowing the mechanism behind a patient's response, or the lack thereof, will help us depart from the oversimplified “more-is-better” and “one-size-fits-all” principles according to which DNA-damaging agents are administered. This will improve clinical outcome by allowing us to pinpoint those who would respond better and longer to lower doses than to higher doses of DNA-damaging agents. Under the assumption that the success of DNA-damaging therapy increases with the proliferation rate of a relatively homogeneous tumor population, there was little reason to assume anything other than monotonic dose-response relations. With the recent paradigm shift that most cancers are in fact DNA mosaic products of ongoing evolution comes the urgency to reconsider these fundamental principles behind DNA-damaging therapy administration. As the developers of one of the first DNA deconvolution methods and with access to technologies to profile the transcriptomes of up to 10,000 cells simultaneously, we are equipped to embark on the first personalized dose-finding strategies for DNA-damaging therapies. We will test the potential of the very long-term legacy that DNA damage leaves on a cell, “genomic instability”, as a new biomarker of DNA-damage response. Our preliminary studies showed that, for most cancer types, DNA-damaging agents change a clone's genomic instability and that clones succumb to a limit in the amount of genomic instability they can tolerate. In particular, our results showed that patients with intermediate genomic instability have a very poor outcome and that this relation is only evident among treatment-naive patients, but not among patients treated with DNA-damaging agents. 
Further, they show that we can measure genomic instability per clone and that clones with extreme genomic instability typically don't grow large. Our hypothesis that genomic instability, rather than proliferation rate, determines how sensitive a tumor is to DNA-damaging agents in the long term is founded on two unexpected findings. Patients with extremely high genomic instability per tumor clone have an exceptionally good outcome. Aim 1 will integrate exome- and single cell RNA-Seq (scRNA-Seq) data to characterize clones and to measure how much genomic instability they can tolerate. Low genomic instability is associated with reduced benefit from DNA-damaging agents. Aim 2 will use comet assays and treatment history to quantify DNA damage per clone, relating it to the clones' ability to tolerate DNA damage and to changes in the genomic instability of therapy-surviving clones.
Study
phs003762
ANGIOPREDICT - an FP7-funded project enabling personalised medicine for patients with metastatic colorectal cancer.
In ANGIOPREDICT, academic cancer biologists and industry-based biotechnology researchers will work together with clinicians to identify biomarkers to predict whether individual metastatic colorectal cancer patients will respond positively to Avastin® combination therapy. Diagnostic tests using these biomarkers will also be developed to provide clinicians with the means to predict patient treatment responses in the future.
Study
EGAS00001005423
Resource for Genetic Epidemiology Research on Adult Health and Aging (GERA)
The Resource for Genetic Epidemiology Research on Aging (GERA) Cohort was created by a RC2 "Grand Opportunity" grant that was awarded to the Kaiser Permanente Research Program on Genes, Environment, and Health (RPGEH) and the UCSF Institute for Human Genetics (AG036607; Schaefer/Risch, PIs). The RC2 project enabled genome-wide SNP genotyping (GWAS) to be conducted on a cohort of over 100,000 adults who are members of the Kaiser Permanente Medical Care Plan, Northern California Region (KPNC), and participating in its RPGEH. The purpose of the RPGEH is to facilitate research on the genetic and environmental factors that affect health and disease by linking together clinical data from electronic health records, survey data on demographic and behavioral factors, and environmental data from various sources, with genetic data derived from biospecimens collected from participants. At the time of the award of the RC2 project in late 2009, the RPGEH had established a cohort of about 140,000 individuals who had answered a detailed survey, provided saliva samples for extraction of DNA, and given broad consent for the use of their data in studies of health and disease. To maximize the diversity of the resulting sample, the GERA cohort was formed by including all racial and ethnic minority participants with saliva samples (N = 20,925; 19%); the remaining participants were drawn sequentially and randomly from white non-Hispanic participants (89,341; 81%). A total of 110,266 participant samples were included to ensure that at least 100,000 were successfully assayed. The resulting GERA cohort is 42% male, 58% female, and ranges in age from 18 to over 100 years old with an average age of 63 years at the time of the RPGEH survey (2007). The sample is ethnically diverse, generally well-educated with above average income. Approximately 69% of the participants are married or living with a partner. Length of membership in KPNC averages 23.5 years. 
UCSF and RPGEH investigators worked with the genomics company Affymetrix to design four custom microarrays for genotyping each of the four major race-ethnicity groups included in the GERA Cohort, described in detail in Hoffmann et al., 2011a and 2011b. Following genotyping and quality control procedures, and after removal of invalid, discordant, or withdrawn samples, about 103,000 participants were successfully genotyped. The resulting genotypic data were linked to survey data and data abstracted from the electronic medical records. As described below, all RPGEH participants were mailed new consent forms with explicit discussion of the placement of data in the NIH-maintained dbGaP. About 77% of participants returned completed consent forms, resulting in a final sample size of 78,486 participants in the GERA Cohort with data for deposit into dbGaP. Origins of the RPGEH GERA Cohort The goal in creating the RPGEH GERA cohort was to create a large, multiethnic, and comprehensive population-based resource for research into the genetic and environmental basis of common age-related diseases and their treatment, and factors influencing healthy aging and longevity. The GERA Cohort consists of a diverse cohort of more than 100,000 adults who are members of the Kaiser Permanente Medical Care Plan, Northern California Region (KPNC), and participating in its Research Program on Genes, Environment and Health (RPGEH). KPNC is an integrated health care delivery system with a population of about 3.3 million people in northern California. The membership of KPNC is representative of the general population in the 14 county area in which facilities are located, although the membership is underrepresented for the extremes of income at both ends of the spectrum. 
The RPGEH utilizes the longitudinal electronic health records (EHR) of KPNC to obtain clinical, laboratory, imaging and pharmacy information on all cohort members, to which personal demographic, behavioral and health characteristics have been added through member surveys. The GERA Cohort comprises a subsample of the RPGEH participant cohort, and was created through the RC2 award from the NIA, NIMH, and NIH Common Fund as described above. GERA Study Design The GERA Cohort is a subsample, as described above, of the longitudinal cohort enrolled in the Kaiser Permanente RPGEH. The RPGEH cohort includes about 400,000 survey participants of whom about 200,000 have provided broad consent and a sample of saliva or blood for use in studies of genetic and environmental factors in health and disease. The GERA Cohort was developed from a mailed survey sent to all adult members of KPNC who had been members for two years or more in 2007. All survey respondents were contacted and asked to complete a consent form; those who completed consent forms were asked to provide a saliva sample. Additional male participants were added to the RPGEH through inclusion of the Northern California sample of the California Men's Health Study (CMHS) cohort of about 40,000 men from KPNC, ages 45-69 years old at the time of the CMHS survey in 2002-2003. The CMHS participants contributed about 15,400 saliva samples to the RPGEH and were eligible for inclusion in the GERA Cohort. CMHS participants were included according to the same sampling design as for the RPGEH cohort as a whole. Specifically, all minority participants were selected for inclusion in order to maximize representation of minorities in the GERA Cohort, and Non-Hispanic White participants were selected at random to complete the sample of 110,266 GERA Cohort participants. GERA Genotypic Data High-density genotyping was conducted at UCSF using custom designed Affymetrix Axiom arrays, as described in Hoffmann et al. (2011a; 2011b). 
To maximize genome-wide coverage of common and less common variants, four specific arrays were designed for individuals of Non-Hispanic White (EUR), East Asian (EAS), African-American (AFR), and Latino (LAT) race/ethnicity. There was broad overlap among the SNPs on the arrays, which were designed using a hybrid greedy imputation algorithm (Hoffmann et al., 2011b) applied to genotype information validated by Affymetrix from the 1000 Genomes Project. However, in order to capture low frequency variants specific to particular race-ethnicity groups, SNP content varies between arrays. A more detailed description of the process of genotyping and results is included in Genotyping of DNA Samples. Description of the analyses of population structure and development of principal components for adjustment of population structure is included in Population Structure Analysis. GERA Phenotypic Data RPGEH and CMHS Survey Data. The sources of data on demographic and behavioral factors deposited in dbGaP for the GERA Cohort are the RPGEH and CMHS surveys. Data on common demographic factors such as gender, race/ethnicity, marital status, and education and on behavioral factors such as smoking, alcohol consumption, and body mass index, have been cleaned, edited, reconciled between the two surveys, and compiled into summary indices, where appropriate, for deposition into dbGaP. A more complete description of the survey variables is included in Survey Variables Documentation. Please note that the terms of use of the GERA Cohort Data, as specified in the Data Use Certification (DUC), prohibit the use of survey variables as outcomes in analyses. For example, a genome-wide association study (GWAS) of education or smoking is not permitted as specified by the DUC. Only health conditions can be used as outcome variables in analyses. Health Conditions derived from Kaiser Permanente Electronic Medical Records. 
Data on the occurrence of health conditions in participants in the GERA Cohort have been derived from summarizing ICD-9 coded diagnoses in Kaiser Permanente's electronic medical records. An algorithm that aggregates specific ICD-9 codes into appropriate diagnostic groups for selected conditions is applied to outpatient and inpatient databases; see Disease and Conditions Definitions Documentation for details. The criterion for including a condition as "present" for a participant is the occurrence of two or more diagnoses within a diagnostic category on separate days. Two or more is used as the criterion in order to reduce false positives due to mistakes or rule-out diagnoses. When compared with validated disease registries, the criterion of 2+ diagnoses yields high specificity and good sensitivity. ICD-9 codes in the electronic records are specified in several ways. For outpatient visits occurring during the period 1995 to 2006, diagnoses were assigned by the treating physician who endorsed specific diagnoses on an optically scanned list that varied by specialty. Beginning in 2006 with the advent of an integrated, fully electronic medical record, outpatient diagnoses are made by physicians/providers using a pull-down menu. Discharge diagnoses from inpatient stays are specified by physicians and coded by specially trained coders. Databases of ICD-9 codes for diagnoses assigned at outpatient visits, or as one of the discharge diagnoses following inpatient stays, are complete and available for all KPNC members dating back to 1995. Although the average length of KPNC membership among GERA cohort members was 23.5 years in 2007, not all have been members since 1995, so the history for some conditions, such as those that are not chronic or recurrent, may not be complete for all cohort members. 
The year of first membership in KPNC is included as a variable in the list of survey variables, enabling investigators to estimate the number of years of observation of each Cohort member.
RPGEH Access and Collaborations Website and Procedures
The RPGEH maintains a web portal for inquiries and applications for collaboration and access to data. The URL is https://rpgehportal.kaiser.org/. RPGEH has an application process and an Access Review Committee that reviews applications for collaboration and use. For more details, please contact RPGEH through the website.
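The "two or more diagnoses on separate days" criterion described above for the GERA health conditions can be illustrated with a minimal Python sketch. The code-to-group mapping, the record layout, and the function names are assumptions made for illustration only; the actual ICD-9 groupings are defined in the Disease and Conditions Definitions Documentation and are not reproduced here.

```python
from collections import defaultdict
from datetime import date

# Hypothetical mapping from ICD-9 three-digit categories to diagnostic groups.
# Illustrative only; the real groupings are in the study documentation.
ICD9_GROUPS = {
    "250": "diabetes",
    "401": "hypertension",
}


def diagnostic_group(icd9_code):
    """Return the (hypothetical) diagnostic group for an ICD-9 code, or None."""
    return ICD9_GROUPS.get(icd9_code.split(".")[0])


def conditions_present(diagnoses, min_distinct_days=2):
    """Flag a condition as present when a participant has diagnoses in the
    same diagnostic group on at least `min_distinct_days` separate days.

    `diagnoses` is an iterable of (participant_id, icd9_code, date) tuples.
    """
    days = defaultdict(set)
    for participant_id, code, day in diagnoses:
        group = diagnostic_group(code)
        if group is not None:
            # A set of dates means repeated codes on one day count once.
            days[(participant_id, group)].add(day)

    present = defaultdict(set)
    for (participant_id, group), distinct_days in days.items():
        if len(distinct_days) >= min_distinct_days:
            present[participant_id].add(group)
    return dict(present)


# Made-up example records, not study data.
example = [
    ("P1", "250.00", date(2001, 3, 1)),
    ("P1", "250.02", date(2001, 6, 9)),   # second day: condition present
    ("P2", "401.9", date(2003, 1, 5)),
    ("P2", "401.1", date(2003, 1, 5)),    # same day: still one day only
]
print(conditions_present(example))
```

Counting distinct dates rather than diagnosis rows is what implements the "separate days" requirement: two codes recorded at a single visit do not qualify on their own.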
Study
phs000674
Coronary Artery Risk Development in Young Adults (CARDIA) Study - Cohort
CARDIA is a study examining the etiology and natural history of cardiovascular disease beginning in young adulthood. In 1985-1986, a cohort of 5115 healthy black and white men and women aged 18-30 years were selected to have approximately the same number of people in subgroups of age (18-24 and 25-30), sex, race, and education (high school or less and more than high school) within each of four US Field Centers. These same participants were asked to participate in follow-up examinations during 1987-1988 (Year 2), 1990-1991 (Year 5), 1992-1993 (Year 7), 1995-1996 (Year 10), 2000-2001 (Year 15), 2005-2006 (Year 20), and 2010-2011 (Year 25); the proportions of the surviving cohort that have returned for the seven follow-up examinations were 90%, 86%, 81%, 79%, 74%, 72%, and 72%, respectively. In addition to the follow-up examinations, participants are contacted regularly for the ascertainment of information on out-patient procedures and hospitalizations experienced between contacts. Within the past five years, 95% of the original surviving cohort has been contacted. While the specifics of each examination have differed somewhat, data have been collected on a variety of factors believed to be related to heart disease. These include conditions with clear links to heart disease such as blood pressure, cholesterol and other lipids. Data have also been collected on physical measurements such as weight and skinfold fat as well as lifestyle factors such as substance use (tobacco and alcohol), dietary and exercise patterns, behavioral and psychological variables, medical and family history, and other chemistries (e.g., insulin and glucose). In addition, subclinical atherosclerosis was measured via echocardiography during Years 5, 10, and 25, computed tomography during Years 15 and 20, and carotid ultrasound during Year 20. The CARDIA Cohort is utilized in the following dbGaP sub-studies.
To view genotypes, other molecular data, and derived variables collected in these sub-studies, see the sub-studies listed below or in the "Sub-studies" box on the right-hand side of the top-level study page, phs000285 CARDIA Cohort.
phs000236 PAGE_CALiCo_CARDIA
phs000309 GENEVA_CARDIA
phs000399 GO-ESP HeartGO_CARDIA
phs000613 CARDIA_CARe
Study
phs000285
Hydroxyurea to Prevent Organ Damage in Children with Sickle Cell Anemia (BABY HUG) Phase III Clinical Trial and Follow-Up Observational Studies I and II
Sickle cell anemia is associated with substantial morbidity from acute complications and organ dysfunction beginning in the first year of life. In 1995, the Multicenter Study of Hydroxyurea (MSH) (dbGaP phs002348) demonstrated that, in adults, hydroxyurea is effective in decreasing the frequency of painful crises, hospitalizations for crises, acute chest syndrome, and blood transfusions by 50%. The phase I/II study of hydroxyurea in children (HUG KIDS) demonstrated that children have a response to hydroxyurea similar to that seen in adults in terms of increasing fetal hemoglobin levels and total hemoglobin, and decreasing complications associated with sickle cell anemia. In addition, this study demonstrated that the drug does not adversely affect growth and development between the ages of 5 and 15. A pilot study of hydroxyurea (HUSOFT) given to children between the ages of 6 months and 24 months demonstrated that the drug was well tolerated and that the fetal hemoglobin levels rose and remained elevated compared to baseline with continued hydroxyurea administration.
A Special Emphasis Panel (SEP) met on April 12, 1996 to review the results of the MSH trial and the progress to date of the HUG KIDS study. The SEP recommended that NHLBI undertake the BABY HUG trial.
The BABY HUG Randomized Controlled Trial concluded that hydroxyurea treatment in very young children seemed to have an acceptable safety profile and to reduce complications of sickle cell anemia. However, more data were needed on the long-term safety of hydroxyurea use in very young children. As a result, follow-up studies were initiated. Follow-Up Study II provided longer follow-up and included more assessment types than Follow-Up Study I.
The BABY HUG program consisted of three related studies, each of which has associated datasets and bio-specimens:
A randomized controlled trial comparing hydroxyurea to placebo in very young children with sickle cell anemia (BABY HUG Randomized Controlled Trial)
The first observational follow-up study of children from the randomized controlled trial (BABY HUG Follow-Up Study I). All children in Follow-Up Study I were offered the option of taking open-label hydroxyurea, with treatment decisions made by the family and the clinical team caring for the child.
The second observational follow-up study of children from BABY HUG Follow-Up Study I. All children in Follow-Up Study II were offered the option of taking open-label hydroxyurea, with treatment decisions made by the family and the clinical team caring for the child.
The purpose of the Randomized Controlled Trial was to determine if hydroxyurea can safely prevent early end organ damage in very young children with sickle cell anemia.
The purpose of the BABY HUG Follow-up Study I was to provide structured follow-up of the children enrolled in the BABY HUG Randomized Controlled Trial, in order to characterize the long-term toxicities and unexpected risks (if any) associated with treatment with hydroxyurea at an early age.
The objective of Follow-Up Study II was to obtain additional data about the long-term safety and efficacy of hydroxyurea use in children with sickle cell anemia through at least the first decade of life.
Instructions for requesting individual-level data are available on BioData Catalyst at https://biodatacatalyst.nhlbi.nih.gov/resources/data/. Apply for data access in dbGaP. Upon approval, users may begin accessing requested data in BioData Catalyst. For questions about availability, you may contact the BioData Catalyst team at https://biodatacatalyst.nhlbi.nih.gov/contact.
Study
phs002415
Whole genome study of Hurthle cell thyroid carcinoma
Oncocytic thyroid carcinoma, also known as Hürthle cell thyroid carcinoma, accounts for only a small percentage of all thyroid cancers. However, this malignancy often presents at an advanced stage and poses unique challenges to patients and clinicians. Surgical resection of the tumor accompanied in some cases by radioactive iodine treatment, radiation and chemotherapy are the established modes of therapy. Knowledge of the perturbed oncogenic pathways can provide better understanding of the mechanism of disease and thus opportunities for more effective clinical management. Initially, two oncocytic thyroid carcinomas and their matched normal tissues were profiled using whole genome sequencing. Subsequently, 72 oncocytic thyroid carcinomas, one cell line and 5 Hürthle cell adenomas were examined by targeted sequencing for the presence of mutations in multiple endocrine neoplasia I (MEN1) gene. We report the identification of MEN1 loss-of-function mutations in approximately 7% of patients diagnosed with oncocytic thyroid carcinoma. Whole genome sequence data also revealed large regions of copy number variation encompassing nearly the entire genomes of these tumours. Menin, a ubiquitously expressed nuclear protein, is a well-characterized tumor suppressor whose loss is the cause of multiple endocrine neoplasia type 1 syndrome. Menin is involved in several major cellular pathways such as regulation of transcription, control of cell cycle, apoptosis and DNA damage repair pathways. Mutations of this gene in a subset of Hürthle cell tumors point to a potential role for this protein and its associated pathways in thyroid tumorigenesis.
Study
EGAS00001000940
Genome-Wide Association Study of HCC in Non-Asian USA Population
This GWAS of HCC relied on existing biological and data resources from seven USA sites and one Canadian site, which allows for the integration of genetic and environmental data. A total of 2199 case patients and 1103 non-cancer controls were genotyped. Case patients are defined as patients with a pathological or radiological diagnosis of HCC, with or without chronic hepatitis C virus infection. The study was restricted to the Caucasian population without prior chronic HBV infection. All DNA samples were extracted from peripheral blood. The study populations of the participating sites are described below.
Site-1: University of Texas MD Anderson Cancer Center
A hospital-based case-control study was initiated at UT MD Anderson Cancer Center in 2000 and approved by the Institutional Review Board of The University of Texas M. D. Anderson Cancer Center. The study design was previously described in detail (1-3). Written informed consent for an interview and for a biological sample was obtained from each participant at the time of diagnosis, prior to treatment exposure. Case patients were recruited from the population of patients with newly diagnosed HCC who were evaluated and treated at the institution's gastrointestinal medical and surgical oncology outpatient clinics. Inclusion criteria were as follows: a pathologically confirmed diagnosis of HCC, U.S. residency, and the ability to communicate in English. Patients with a concurrent or past history of other types of cancer were excluded. Control subjects were healthy spouses of patients with cancers other than liver, gastrointestinal, lung, or head and neck (smoking-related cancers) undergoing treatment at our institution. Eligibility criteria for control subjects were the same as those for patients, except that control subjects could not have ever had cancer. A short, structured questionnaire was used to screen potential control subjects on the basis of eligibility criteria. Control subjects and patients were recruited simultaneously.
Case patients and control subjects were interviewed by well-trained interviewers who followed a written protocol to guide ascertainment and reduce surveillance, interviewer, and recall bias. No proxy interviews were conducted. The interviewers used a structured and validated questionnaire to collect information about demographic characteristics and potential risk factors for HCC such as personal smoking history, alcohol use, medical history, occupational history, and family history of cancer. Blood samples from cases and controls were tested for HBV and HCV. HCV antibodies, hepatitis B surface antigen, and antibodies to hepatitis B core antigen were detected by use of a third-generation enzyme-linked immunosorbent assay (ELISA) (Abbott Laboratories, North Chicago, IL). Important clinical information was retrieved from patients' medical records. A total of 1188 case patients and 278 controls from the MD Anderson study were genotyped.
Site-2: Mayo Clinic
Case patients and controls included in the study from Mayo Clinic were recruited as part of the Mayo International Hepatobiliary Neoplasia Register and Biorepository. Only USA patients and controls participated. All cases and controls signed informed consent indicating their willingness to participate in genetic studies. Epidemiological and clinical data were collected from participants and retrieved from medical records. A total of 522 HCC case patients and 182 controls were genotyped in this study.
Site-3: Toronto University
The Molecular Epidemiology of Hepatobiliary Tumors study (HBT study, CAPCR 09-0289) is a hospital-based study that includes 1710 patients diagnosed with liver, bile duct or gallbladder adenocarcinoma (hepatocellular carcinoma, HCC; cholangiocarcinoma, CCA; and gallbladder adenocarcinoma, GBCa) recruited at University Health Network (UHN) between 2009 and 2018.
The HBT study is a prospective study of cancer patients that collects self-reported demographic and epidemiological data, medical records/clinical information and blood samples for research purposes. Eligible patients recruited between 2009 and March 15, 2016 were selected to participate in the Genome-Wide Association Study in Patients with Hepatocellular Carcinoma (GWAS in HCC, CAPCR 15-9943-CE), a sub-study of the HBT study. Eligible prospective patients recruited after March 15, 2016 signed the Addendum to the HBT consent form and became eligible to participate in the GWAS in HCC study. A total of 271 cases and 21 controls were genotyped from Toronto University.
Site-4: University of Pittsburgh
The design of the study has been described previously (4-6). Briefly, this population-based study enrolled 120 HCC patients and 230 matched controls from black, Hispanic, and non-Hispanic white residents of Los Angeles County who were between 18 and 74 years of age at diagnosis from January 1995 through December 2001. HCC cases were identified through the Los Angeles County Cancer Surveillance Program. We sought to recruit up to two control subjects per case from the neighborhoods where HCC patients resided at the time of diagnosis, who were matched to the index case by sex, age (within 5 years), and race (Hispanic white, non-Hispanic white, black). Blood samples (plasma and buffy coat) and data on medical and lifestyle factors were collected from all consenting participants. A total of 65 cases and 60 controls were genotyped in this study.
Site-5: South Western Dallas
Under IRB approval, HCC cases and controls have been prospectively collected since 2015 from the outpatient liver clinics of Parkland Health and Hospital System and UT Southwestern Medical Center, two large health systems in Dallas, TX. A total of 31 cases and 29 controls were genotyped from Site 5.
Site-6: Columbia University
HCC patients from Columbia are those recruited as part of the Herbert Irving Comprehensive Cancer Center Database Shared Resource, which seeks to recruit all cancer patients for potential future studies. The PI is Dr. Katherine Crew (AAAL5871). This resource collects sociodemographic, lifestyle and clinical data on patients as well as a blood sample. Those who indicated on their consent form that they would participate in genetic studies were included. A total of 79 case patients with HCC were genotyped from Columbia University.
Site-7: University of Michigan
The study population from the University of Michigan was drawn from a prospective study of patients with chronic HCV infection recruited in Ann Arbor. Patients who had undergone liver transplantation, had known coinfection with HIV, had a life expectancy of <12 months due to extra-hepatic illnesses, or were receiving HCV treatment at enrolment were excluded. A protocol, surveys, and data forms were developed so that each enrolled patient completed the same questionnaire. All patients provided written informed consent before enrolment in the study. The study was approved by the institutional review board or ethics committee at the University of Michigan. A detailed study description was previously published (7). A total of 44 cases and 347 controls were genotyped.
Site-8: Veteran Administration Medical Center in Houston
This research, including the written informed consent form, was jointly approved by the Institutional Review Boards for the Baylor College of Medicine and the Michael E. DeBakey VA Medical Center in Houston, Texas. Study details have been previously published (8). Briefly, we prospectively recruited consecutive HCV-infected veterans prior to their previously scheduled HCV clinic visit at a large tertiary care VA medical center between May 1, 2009 and December 31, 2012.
Patients completed a research assistant (RA) administered survey interrogating medical and risk factor history including lifetime alcohol use, had anthropometric measurements taken, and completed a fasting venipuncture for performance of the FibroSure-ActiTest as a measure of hepatic pathology. We restricted our current analysis to individuals who: (1) were White male veterans between 18 and 70 years of age; (2) had no history of HCC, liver transplant, decompensated liver disease including ascites, dementia, or psychosis; (3) were serologically confirmed to have chronic HCV and to be negative for both HIV and active HBV infection; (4) were not currently receiving anti-HCV pharmacotherapy; and (5) had FibroSURE testing showing F3/F4 fibrosis consistent with cirrhosis.
Study
phs001744
Genetics of Inherited Muscle Disease
The samples are drawn from a collection of patients with a heterogeneous set of neuromuscular disorders, including congenital muscular dystrophy, congenital myopathy, limb-girdle muscular dystrophy, Emery-Dreifuss muscular dystrophy, and arthrogryposis, along with unaffected parents and siblings in some cases. The samples were collected by the following clinicians affiliated with the associated institutes:
Kathryn North and Nigel Clarke (Institute for Neuroscience and Muscle Research, Children's Hospital at Westmead, Australia)
Hanns Lochmuller and Kate Bushby (The Newcastle Muscle Centre, Newcastle University, UK)
Peter Kang (Boston Children's Hospital)
Carsten Bonnemann (National Institutes of Health, Bethesda, MD, USA)
Nigel Laing (University of Western Australia)
All exome sequencing was performed at the Broad Institute of Harvard and MIT; sample sequence capture was performed using the Agilent SureSelect Human All Exon Kit v2 or Illumina's Rapid Capture Exome enrichment kit, and sequencing was performed on an Illumina HiSeq 2000. In addition, some samples were whole genome sequenced on an Illumina HiSeq X Ten.
Study
phs000655
Pan Prostate Cancer Group data
The Pan Prostate Cancer Group intends to provide breakthrough advances from the analysis of a very large series of Whole Genome DNA data, contributed by many of the leading scientists and clinicians working in prostate cancer genomics generating Whole Genome DNA Sequence (WGS) data from prostate cancer patients. The accumulated data, collected and compared in a common format including common storage, re-analysis through a single pipeline, and investigation to achieve a variety of scientific goals, are presented in this series.
This project will provide a global framework for the future analysis of even larger series of genomes as costs of DNA sequencing, storage, and sequence analysis decrease.
In addition to these WGS data from different clinical categories, ethnicities, levels of aggressiveness, and disease managed by different treatments with information linked to long-term clinical follow up, we intend to add additional layers of data, including transcriptome and methylation data.
The current data series focuses primarily on the analysis of WGS data.
For further information, please refer to https://panprostate.org/, and The Pan Prostate Cancer Group Data Sharing Policy, EGAP50000000784.
Study
EGAS00001002876
Uncovering the Genetic Architecture of Colorectal Cancer with Focus of Rare and Less Frequent Variants
The Genetics and Epidemiology of Colorectal Cancer Consortium (GECCO) is a collaborative effort comprised of a coordinating center and scientific researchers from well-characterized cohort and case-control studies conducted in North America and Europe. This international consortium aims to accelerate the discovery of common and rare genetic risk variants for colorectal cancer by conducting large-scale meta-analyses of existing and newly generated genome-wide association study (GWAS) data, replicating and fine-mapping of GWAS discoveries, and investigating how genetic risk variants are modified by environmental risk factors. To expand these efforts, we assembled case-control sets or nested case-control sets from 20 different North American or European studies. Summary descriptions and study participant inclusions/exclusion criteria for each of these studies are detailed below. The Black Women's Health Study (BWHS): Is the largest follow-up study of the health of African-American women (Cozier et al., 2004; Rosenberg et al., 1995) [PMID: 15018884; PMID: 7722208]. The purpose is to identify and evaluate causes and preventives of cancers and other serious illnesses in African-American women. Among the diseases being studied are breast cancer, colorectal cancer, type 2 diabetes, uterine fibroids, systemic lupus erythematosus, and cardiovascular disease. The study began in 1995, when 59,000 black women from all parts of the United States enrolled through postal questionnaires. The women provided demographic and health data on the 1995 baseline questionnaire, including information on weight, height, smoking, drinking, contraceptive use, use of other selected medications, illnesses, reproductive history, physical activity, diet, use of health care, and other factors. The participants are followed through biennial questionnaires to determine the occurrence of cancers and other illnesses and to update information on risk factors. 
Self-reports of cancer are confirmed through medical records and state cancer registry records. Mouthwash-swish samples, as a source of DNA, were obtained from ~26,000 BWHS participants in 2002-2007. DNA was isolated from the mouthwash-swish samples at the Boston University Molecular Core Genetics Laboratory using the QIAamp DNA Mini Kit (Qiagen). All incident colorectal cancer cases with a DNA sample were included in the present analysis. Two controls per case, selected from among BWHS participants free of colorectal cancer at end of follow-up, were matched to cases on year of birth (+/- 2 years) and geographical region of residence (Northeast, South, Midwest, and West). A total of 209 colorectal cancer cases and 423 controls were sent for genotyping. Campaign Against Cancer and Heart Disease (CLUE II): The Campaign Against Cancer and Heart Disease is a prospective cohort designed to identify biomarkers and other factors associated with risk of cancer, heart disease, and other conditions (Kakourou et al., 2015) [PMID: 26220152]. A total of 32,894 participants were recruited from May through October 1989 from Washington County, Maryland and surrounding communities. Colorectal cancer cases (n = 297) and matched controls (n = 296) were identified between 1989 and 2000 among participants in the CLUE II cohort of Washington County, Maryland. Colorectal Cancer Study of Austria (CORSA): In the ongoing colorectal cancer study of Austria (CORSA), more than 13,000 Caucasian participants have been recruited within the province-wide screening project "Burgenland Prevention Trial of Colorectal Disease with Immunological Testing" (B-PREDICT) since 2003 (Hofer et al., 2011) [PMID: 21422235]. All inhabitants of the Austrian province Burgenland aged between 40 and 80 years are annually invited to participate in fecal immunochemical testing, and haemoccult-positive screening participants are invited for colonoscopy.
CORSA includes genomic DNA and plasma of colorectal cancer cases, low-risk and high-risk adenomas, and colonoscopy-negative controls. Controls received a complete colonoscopy and were free of colorectal cancer or polyps. CORSA participants have been recruited in the four KRAGES hospitals in Burgenland, Austria, and additionally, at the Medical University of Vienna (Department of Surgery), the Viennese hospitals "Rudolfstiftung" and the "Sozialmedizinisches Zentrum Sud", and at the Medical University of Graz (Department of Internal Medicine). 1403 colorectal cancer and advanced colorectal adenoma cases, and 1404 matched controls were selected for the study. Distribution of factors sex and age (5 year strata) were evenly matched between cases and controls. Cancer Prevention Study II (CPS II): The CPS II Nutrition cohort is a prospective study of cancer incidence and mortality in the United States, established in 1992 and described in detail elsewhere (Calle et al., 2002; Campbell et al., 2014) [PMID: 12015775; PMID: 25472679]. At enrollment, participants completed a mailed self-administered questionnaire including information on demographic, medical, diet, and lifestyle factors. Follow-up questionnaires to update exposure information and to ascertain newly diagnosed cancers were sent biennially starting in 1997. Reported cancers were verified through medical records, state cancer registry linkage, or death certificates. The Emory University Institutional Review Board approves all aspects of the CPS II Nutrition Cohort. A total of 360 cases and 359 controls were selected for this study. Czech Republic Colorectal Cancer Study (Czech Republic CCS): Cases with positive colonoscopy results for malignancy, confirmed by histology as colon or rectal carcinomas, were recruited between September 2003 and May 2012 in several oncological departments in the Czech Republic (Prague, Pilsen, Benesov, Brno, Liberec, Ples, Pribram, Usti and Labem, and Zlin). 
Two control groups, sampled during the same period as case recruitment, were included in the study. The first group consisted of hospital-based individuals with a negative colonoscopy result for malignancy or idiopathic bowel diseases. The reasons for the colonoscopy were: i) positive fecal occult blood test, ii) hemorrhoids, iii) abdominal pain of unknown origin, and iv) macroscopic bleeding. The second control group consisted of healthy blood donor volunteers from a blood donor center in Prague. All individuals were subjected to standard examinations to verify their health status for blood donation and were cancer-free at the time of sampling. Details of CRC cases and controls have been reported previously (Vymetalkova et al., 2014; Naccarati et al., 2016; Vymetalkova et al., 2016) [PMID: 24755277; PMID: 26735576; PMID: 27803053]. All subjects were informed and provided written consent to participate in the study. They approved the use of their biological samples for genetic analyses, according to the Declaration of Helsinki. The design of the study was approved by the Ethics Committee of the Institute of Experimental Medicine, Prague, Czech Republic. All subjects included in the study were Caucasians and comprised 1792 cases and 1764 matched controls. Controls were matched to CRC cases in a 1:1 ratio. Matching was done on age and sex. Age was matched within +/- 5 years, whereas sex was matched exactly. For the cases without matched controls, matching was done only on sex. Early Detection Research Network (EDRN): The aim of the EDRN initiative is to develop and sustain a biorepository for support of translational research (Amin et al., 2010) [PMID: 21031013]. High-quality biospecimens were accrued and annotated with pertinent clinical, epidemiologic, molecular and genomic information. A user-friendly annotation tool and query tool were developed for this purpose.
The components of this annotation tool include common data elements (CDEs) developed from the College of American Pathologists (CAP) Cancer Checklists and North American Association of Central Cancer Registries (NAACCR) standards. The CDEs provide semantic and syntactic interoperability of the data sets by describing them in the form of metadata or data descriptors. A total of 352 colorectal case samples and 399 controls were selected for this study. Controls were matched to CRC cases based on age and sex. The EPICOLON Consortium (EPICOLON): The EPICOLON Consortium comprises a prospective, multicentre and population-based epidemiology survey of the incidence and features of CRC in the Spanish population (Fernandez-Rozadilla et al., 2013) [PMID: 23350875]. Cases were selected as patients with de novo histologically confirmed diagnosis of colorectal adenocarcinoma. Patients with familial adenomatous polyposis, Lynch syndrome or inflammatory bowel disease-related CRC, and cases where patients or family refused to participate in the study were excluded. Hospital-based controls were recruited through the blood collection unit of each hospital, together with cases. All of the controls were confirmed to have no history of cancer or other neoplasm and no reported family history of CRC. Controls were randomly selected and matched with cases for hospital, sex and age (+/- 5 years). A total of 370 cases and 370 controls were selected for genotyping. Hawaii Adenoma Study: For this adenoma study, two flexible-sigmoidoscopy screening clinics were first used to recruit participants on Oahu, Hawaii. Adenoma cases were identified either from the baseline examination at the Hawaii site of the Prostate, Lung, Colorectal and Ovarian cancer screening trial during 1996-2000 or at Kaiser Permanente Hawaii's Gastroenterology Screening Clinic during 1995-2007.
In addition, starting in 2002 and up to 2007, we also approached for recruitment all eligible patients who underwent a colonoscopy in the Kaiser Permanente Hawaii Gastroenterology Department. Cases were patients with histologically confirmed first-time adenoma(s) of the colorectum and were of Japanese, Caucasian or Hawaiian race/ethnicity. Controls were selected among patients with a normal colorectum and were individually matched to the cases on age at exam, sex, race/ethnicity, screening date (+/- 3 months) and clinic and type of examination (colonoscopy or flexible sigmoidoscopy). We recruited 1016 adenoma cases (67.8% of all eligible) and 1355 controls (69.2% of all eligible); 889 cases and 1169 controls agreed to give a blood sample, and 29 cases and 34 controls a mouthwash sample. A total of 989 cases and 1185 controls were genotyped for this study. Columbus-area HNPCC Study (HNPCC, OSUMC): Patients with colorectal adenocarcinoma diagnosed at six participating hospitals were eligible for this study, regardless of age at diagnosis or family history of cancer. Patients with a clinical diagnosis of familial adenomatous polyposis were not eligible for this study. These six hospitals perform the vast majority of all operations for CRC in the Columbus metropolitan area (population 1.7 million). The institutional review board at all participating hospitals approved the research protocol and consent form in accordance with assurances filed with and approved by the United States Department of Health and Human Services. Briefly, during the period of January 1999 through August 2004, 1,566 eligible patients with CRC were accrued to the study (Hampel et al., 2008) [PMID: 18809606]. A total of 1472 colorectal cancer samples had enough blood DNA remaining to be sent for genotyping. Control samples were provided by the Ohio State University Medical Center's (OSUMC) Human Genetics Sample Bank.
The Columbus Area Controls Sample Bank is a collection of control samples for use in human genetics research that includes both donors' anonymized biological specimens and linked phenotypic data. The data and samples are collected under the protocol "Collection and Storage of Controls for Genetics Research Studies", which is approved by the Biomedical Sciences Institutional Review Board at OSUMC. Recruitment takes place in OSUMC primary care and internal medicine clinics. If individuals agree to participate, they provide written informed consent, complete a questionnaire that includes demographic, medical and family history information, and donate a blood sample. 4-7 ml of blood is drawn into each of 3 ACD Solution A tubes and is used for genomic DNA extraction and the establishment of an EBV-transformed lymphoblastoid cell culture, cell pellet in Trizol, and plasma. Controls were matched to CRC cases in a 1:1 ratio. Matching was done on age at reference time (age_ref), race, and sex. Age_ref was matched within +/- 5 years. Sex and race were matched exactly. For the cases without matched controls, matching was done only on sex and race, again in a 1:1 ratio. Since there are fewer controls than cases, one control is matched to at most 2 cases. Health Professionals Follow-up Study (HPFS): The HPFS is a prospective study run in parallel to the NHS (Nurses' Health Study). The HPFS cohort comprised 51,529 men aged 40-75 who, in 1986, responded to a mailed questionnaire (Rimm et al., 1990) [PMID: 2090285]. Participants provided information on health-related exposures, including current and past smoking history, age, weight, height, diet, physical activity, aspirin use, and family history of colorectal cancer. Colorectal cancer and other outcomes were reported by participants or next-of-kin and were followed up through review of the medical and pathology record by physicians. Overall, more than 97% of self-reported colorectal cancers were confirmed by medical record review.
Information was abstracted on histology and primary location. Incident cases were defined as those occurring after the subject provided the blood sample. Prevalent cases were defined as those occurring after enrollment in the study but before the subject provided the blood sample. Follow-up evaluation has been excellent, with 94% of the men responding to date. Colorectal cancer cases were ascertained through January 1, 2008. In 1993-1995, 18,825 men in the HPFS mailed blood samples by overnight courier, which were aliquoted into buffy coat and stored in liquid nitrogen. In 2001-2004, 13,956 men in the HPFS who had not previously provided a blood sample mailed in a swish-and-spit sample of buccal cells. Incident cases were defined as those occurring after the subject provided a blood or buccal sample. Prevalent cases were defined as those occurring after enrollment in the study in 1986, but before the subject provided either a blood or buccal sample. After excluding participants with histories of cancer (except nonmelanoma skin cancer), ulcerative colitis, or familial polyposis, case-control sets were previously constructed. In addition to colorectal cancer cases and controls, a set of adenoma cases and matched controls with available DNA from buffy coat was selected for genotyping. Over the follow-up period, data were collected on endoscopic screening practices and, if individuals had been diagnosed with a polyp, the polyps were confirmed to be adenomatous by medical record review. Adenoma cases were ascertained through January 1, 2008. A separate case-control set was constructed of participants diagnosed with advanced adenoma matched to control participants who underwent a lower endoscopy in the same time period and did not have an adenoma. Advanced adenoma was defined as an adenoma 1 cm or larger in diameter and/or with tubulovillous, villous, or high-grade dysplasia/carcinoma-in-situ histology. 
Matching criteria included year of birth (within 1 year) and month/year of blood sampling (within 6 months), the reason for their lower endoscopy (screening, family history, or symptoms), and the time period of any prior endoscopy (within 2 years). Controls matched to cases with a distal adenoma either had a negative sigmoidoscopy or colonoscopy examination, and controls matched to cases with proximal adenoma all had a negative colonoscopy. In total, 159 advanced adenoma cases and 109 controls were selected for genotyping. Leeds Colorectal Cancer Study (LCCS): Following local ethical approval, colorectal cancer cases were recruited from 1997 until 2012 in Leeds, UK through surgical clinics. Initially, funding was provided by the UK Ministry of Agriculture, Farming and Fisheries (subsequently the Food Standards Agency) and Imperial Cancer Research Fund (subsequently Cancer Research UK). Recruitment also took place in Dundee, Perth and York between 1997 and 2001 using the same protocol, and the data and samples were combined. Pathologically confirmed cases were consented at outpatient clinics, providing information on known and postulated risk factors for colorectal cancer (diet, lifestyle and family history) as well as a blood sample for DNA. Exclusion criteria included pre-existing diverticular disease and an inability to complete the questionnaire. The General Practitioners (GPs) of cases were identified (all UK residents have a nominated General Practitioner to whom they refer initial medical queries), and these GPs were asked to send letters to other persons on their patient list of the same gender and born within 5 years of the case. Subsequently, to increase the number of controls, we systematically invited patients from selected GP practices. Diet was assessed in cases and controls using an extensive dietary and lifestyle questionnaire modified from that produced by the European Prospective Investigation in Cancer (EPIC). 
The frequency with which each specific food item was eaten was recorded, and we also obtained average fruit and vegetable consumption as a cross-check. In total, 1591 cases and 739 controls provided a DNA sample. The North Carolina Colon Cancer Studies (NCCCS I/II): The North Carolina Colon Cancer Studies (NCCCS I-colon and NCCCS II-rectal) were population-based case-control studies conducted in 33 counties of North Carolina. Cases were identified using the rapid case ascertainment system of the North Carolina Central Cancer Registry. Patients with a first diagnosis of histologically confirmed invasive adenocarcinoma of the colon (cecum through sigmoid colon) between October 1996 and September 2000 were classified as potential cases in the NCCCS I. The NCCCS II included patients with a first diagnosis of histologically confirmed invasive adenocarcinoma of the sigmoid colon, rectosigmoid, or rectum (hereafter collectively referred to as rectal cancer) between May 2001 and September 2006. Additional eligibility requirements were: age 40-80 years, residence in one of the 33 counties, ability to give informed consent and complete an interview, possession of a driver's license or identification card issued by the North Carolina Department of Motor Vehicles (if under the age of 65), and no objection from the primary physician to contacting the individual. Controls, identified and sampled during the respective study dates, were selected from two sources. Potential controls under the age of 65 were identified using the North Carolina Department of Motor Vehicles records. For those 65 years and older, records from the Center for Medicare and Medicaid Services were used. Controls were matched to cases using randomized recruitment strategies. Recruitment probabilities were determined using strata of 5-year age, sex, and race groups. 
Dietary information was collected using a modified version of the semiquantitative food frequency questionnaire developed at the National Cancer Institute. In addition, participants were asked about vitamin and mineral supplementation, special diets, restaurant eating, sodium use, and fats used in cooking. In NCCCS I, 515 colorectal cases and 687 matched controls were sent for genotyping. In NCCCS II, 796 colorectal cases and 823 controls were sent for genotyping. Controls were matched to CRC cases in a 1:1 ratio. Matching was done on age, race, and sex. Age was matched on +-5 years. Race and sex were matched exactly. For the cases without matched controls, matching was done only on sex and race. Nurses' Health Study (NHS): The NHS cohort began in 1976 when 121,700 married female registered nurses age 30-55 years returned the initial questionnaire that ascertained a variety of important health-related exposures (Belanger et al., 1978) [PMID: 248266]. Since 1976, follow-up questionnaires have been mailed every 2 years. Colorectal cancer and other outcomes were reported by participants or next-of-kin and followed up through review of the medical and pathology record by physicians. Overall, more than 97% of self-reported colorectal cancers were confirmed by medical-record review. Information was abstracted on histology and primary location. The rate of follow-up evaluation has been high: as a proportion of the total possible follow-up time, follow-up evaluation has been more than 92%. Colorectal cancer cases were ascertained through June 1, 2008. In 1989-1990, 32,826 women in NHS I mailed blood samples by overnight courier, which were aliquoted into buffy coat and stored in liquid nitrogen. In 2001-2004, 29,684 women in NHS I who did not previously provide a blood sample mailed a swish-and-spit sample of buccal cells. Incident cases were defined as those occurring after the subject provided a blood or buccal sample. 
Prevalent cases were defined as those occurring after enrollment in the study in 1976 but before the subject provided either a blood or buccal sample. After excluding participants with histories of cancer (except nonmelanoma skin cancer), ulcerative colitis, or familial polyposis, case-control sets were previously constructed from which DNA was isolated from either buffy coat or buccal cells for genotyping. In addition to colorectal cancer cases and controls, a set of advanced adenoma cases and matched controls with available DNA from buffy coat were selected for genotyping. Over the follow-up period, data were collected on endoscopic screening practices and, if individuals had been diagnosed with a polyp, the polyps were confirmed to be adenomatous by medical record review. Adenoma cases were ascertained through June 1, 2011. A separate case-control set was constructed of participants diagnosed with advanced adenoma matched to control participants who underwent a lower endoscopy in the same time period and did not have an adenoma. Advanced adenoma was defined as an adenoma more than 1 cm in diameter and/or with tubulovillous, villous, or high-grade dysplasia/carcinoma-in-situ histology. Matching criteria included year of birth (within 1 year) and month/year of blood sampling (within 6 months), the reason for their lower endoscopy (screening, family history, or symptoms), and the time period of any prior endoscopy (within 2 years). Controls matched to cases with a distal adenoma either had a negative sigmoidoscopy or colonoscopy examination, and controls matched to cases with proximal adenoma all had a negative colonoscopy. A total of 272 cases and 236 matched controls were sent to CIDR for the advanced adenoma case-control set. 
Northern Swedish Health and Disease Study (NSHDS): Comprises over 110,000 participants, including approximately one third with repeated sampling occasions, from three population-based cohorts (Dahlin et al., 2010; Myte et al., 2016) [PMID: 20197478; PMID: 27367522]. The largest is the ongoing Vasterbotten Intervention Programme, in which all residents of Vasterbotten County are invited to a health examination upon turning 30 (some years), 40, 50 and 60 years of age. Extensive measured and self-reported health and lifestyle data, as well as blood samples for central biobanking in Umea, Sweden, are collected at the health exam. Leucocyte DNA samples for 1:1-matched CRC case-control sets from the NSHDS, of which 878 samples are included in this study, have been selected for genotyping. This is in addition to 354 samples from the NSHDS previously analyzed as part of the multicenter EPIC cohort. Cancer-specific and overall survival data are available for all patients. For at least 425 patients, archival tumor tissue has been analyzed for the BRAF V600E mutation and by sequencing codon 12 and 13 for KRAS mutations, as well as for MSI screening status by immunohistochemistry and for an eight-gene CIMP panel using quantitative real-time PCR (MethyLight). Ohio Colorectal Cancer Prevention Initiative (OCCPI, OSUMC): OCCPI (ClinicalTrials.gov identifier: NCT01850654) is a population-based study of colorectal cancer patients diagnosed in one of 51 hospitals throughout the state of Ohio from January 1, 2013 through December 31, 2016. The OCCPI was created to decrease CRC incidence in Ohio by identifying patients with hereditary predisposition (statewide universal tumor screening for newly diagnosed CRC patients), increase colonoscopy compliance for first-degree relatives of CRC patients, and encourage future research through the creation of a biorepository. 
The 51 Ohio hospitals participating in the OCCPI were selected to represent a cross-section of clinical centers in the state based on high reported volume of CRC patients, affiliation with a high-volume hospital, or interest in participation. Institutional Review Board (IRB) approval was obtained by the individual hospitals, Community Oncology Programs, or by ceding review to the OSU IRB. Written informed consent was obtained. A total of 2139 colorectal cases were genotyped. Patients were considered eligible for this study if they were age 18 or older at the time of enrollment and had a surgical resection (or biopsy if unresectable) in the state of Ohio demonstrating an adenocarcinoma of the colorectum between January 1, 2013 and December 31, 2016. Matched control samples were selected from the Ohio State University Medical Center's (OSUMC) Human Genetics Sample Bank in an identical way to the selection for the Columbus-area HNPCC Study (please refer to the description for the Columbus-area HNPCC Study). Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial (PLCO): PLCO enrolled 154,934 participants (men and women, aged between 55 and 74 years) at ten centers into a large, randomized, two-arm trial to determine the effectiveness of screening to reduce cancer mortality. Sequential blood samples were collected from participants assigned to the screening arm. Participation was 93% at the baseline blood draw. In the observational (control) arm, buccal cells were collected via mail using the "swish-and-spit" protocol, and the participation rate was 65%. Details of this study have been previously described (Huang et al., 2016) [PMID: 27673363] and are available online (http://dcp.cancer.gov/plco). For this study, 1651 advanced adenoma cases and 1392 controls were selected for genotyping. 
Selenium and Vitamin E Prevention Trial (SELECT): The Selenium and Vitamin E Cancer Prevention Trial (SELECT) was a double-blind, placebo controlled clinical trial which explored using selenium and vitamin E alone and in combination to prevent prostate cancer in healthy men (Lippman et al., 2009) [PMID: 19066370]. Secondary endpoints included the prevention of colorectal and lung cancers. SELECT was conducted at 427 sites and centers in the United States, Canada and Puerto Rico; 35,533 men 55 years and older (50 or older if African American) were randomized beginning August 22, 2001. Supplementation was discontinued on October 23, 2008 due to futility. 308 colorectal cancer cases and 308 matched controls were selected from the SELECT population and sent for genotyping. Screening Markers For Colorectal Disease Study and Colonoscopy and Health Study (SMS-REACH): Details on this study population were previously reported (Burnett-Hartman et al., 2014) [PMID: 24875374]. Participants were enrollees in an integrated health-care delivery system in western Washington State (Group Health Cooperative, Seattle, Washington) aged 24-79 years who underwent an index colonoscopy for any indication between 1998 and 2007 and donated a buccal-cell or blood sample for genotyping analysis. Study recruitment took place in 2 phases, with phase 1 occurring in 1998-2003 and phase 2 occurring in 2004-2007. Persons who had undergone a colonoscopy less than 1 year prior to the index colonoscopy, persons with inadequate bowel preparation for the index colonoscopy, and persons with a prior or new diagnosis of colorectal cancer, a familial colorectal cancer syndrome (such as familial adenomatous polyposis), or another colorectal disease were ineligible. Patients diagnosed with adenomas or serrated polyps and persons who were polyp-free at the index colonoscopy (controls) were systematically recruited during both phases of recruitment. 
Approximately 75% agreed to participate and provided written informed consent. Based on medical records, persons who agreed to participate and those who refused study participation were similar with respect to age, sex, and colorectal polyp status. Study protocols were approved by the institutional review boards of the Group Health Cooperative and the Fred Hutchinson Cancer Research Center (Seattle, Washington). A total of 575 cases and 508 matched controls were selected for the study. Controls were matched to CRC cases in a 1:1 ratio. Matching was done on age_ref, race, and sex. Age_ref was matched on +-5 years. The Women's Health Initiative (WHI): WHI is a long-term national health study that has focused on strategies for preventing heart disease, breast and colorectal cancer, and osteoporotic fractures in postmenopausal women. The original WHI study included 161,808 postmenopausal women enrolled between 1993 and 1998. The Fred Hutchinson Cancer Research Center in Seattle, WA serves as the WHI Clinical Coordinating Center for data collection, management, and analysis of the WHI. The WHI has two major parts: a partial factorial randomized Clinical Trial (CT) and an Observational Study (OS); both were conducted at 40 Clinical Centers nationwide. The CT enrolled 68,132 postmenopausal women between the ages of 50 and 79 into trials testing three prevention strategies. If eligible, women could choose to enroll in one, two, or all three of the trial components. The components are: Hormone Therapy Trials (HT): This double-blind component examined the effects of combined hormones or estrogen alone on the prevention of coronary heart disease and osteoporotic fractures, and associated risk for breast cancer. Women participating in this component with an intact uterus were randomized to estrogen plus progestin (conjugated equine estrogens [CEE], 0.625 mg/d plus medroxyprogesterone acetate [MPA], 2.5 mg/d) or a matching placebo. 
Women with prior hysterectomy were randomized to CEE or placebo. Both trials were stopped early, in July 2002 and March 2004, respectively, based on adverse effects. All HT participants continued to be followed without intervention until close-out. Dietary Modification Trial (DM): The Dietary Modification component evaluated the effect of a low-fat and high fruit, vegetable and grain diet on the prevention of breast and colorectal cancers and coronary heart disease. Study participants were randomized to either their usual eating pattern or a low-fat dietary pattern. Calcium/Vitamin D Trial (CaD): This double-blind component began 1 to 2 years after a woman joined one or both of the other clinical trial components. It evaluated the effect of calcium and vitamin D supplementation on the prevention of osteoporotic fractures and colorectal cancer. Women in this component were randomized to calcium (1000 mg/d) and vitamin D (400 IU/d) supplements or a matching placebo. The Observational Study (OS) examines the relationship between lifestyle, environmental, medical and molecular risk factors and specific measures of health or disease outcomes. This component involves tracking the medical history and health habits of 93,676 women not participating in the CT. Recruitment for the observational study was completed in 1998 and participants were followed annually for 8 to 12 years. All centrally confirmed cases of invasive colorectal cancer, or deaths from colorectal cancer, were selected as potential cases from the September 30, 2015 database. Controls were participants free of colorectal cancer (invasive or in situ) as of September 30, 2015. Potential cases and controls were excluded if they (1) were non-White; (2) had a history of colorectal cancer at baseline; (3) were lost to follow-up after enrollment; (4) were dbGaP ineligible; (5) had <1.25 µg of DNA; (6) were selected for WHI study M26 Phase I or II; or (7) were selected for WHI study AS224 and also included in the imputation project. 
A total of 578 cases and 104,429 controls met the eligibility criteria. Each case was matched with 1 control (1:1) that exactly met the following matching criteria: age (+-5 years), 40 randomization centers (exact), WHI date (+-3 years), CaD date (+-3 years), OS flag (exact), HRT assignments (exact), DM assignments (exact), and CaD assignments (exact). Control selection was done in a time-forward manner, selecting one control for each case from the risk set at the time of the case's event. The matching algorithm was allowed to select the closest match based on a criterion that minimizes an overall distance measure (Bergstralh EJ, Kosanke JL. Computerized matching of cases to controls. Technical Report #56, Department of Health Sciences Research, Mayo Clinic, Rochester MN. April 1995). Each matching factor was given the same weight. When exact matches could not be found, the matching criteria were gradually relaxed among unmatched cases and controls until all cases had found matched controls. Using the matching criteria specified above, 559 of the 578 eligible cases found exact matches. The matching criteria were then relaxed to: Age +-5, randomization centers, WHI date +-3 years, CaD date +-3 years, OS flag, HRT flag, DM flag, CaD flag. Of the remaining 19 unmatched cases, 17 found matched controls. By matching on Age +-5, randomization centers, WHI date +-3 years, CaD date +-3 years, OS flag, and HRT flag, the remaining 2 unmatched cases found their matches.
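The tiered matching described above can be illustrated with a minimal Python sketch. This is not the Mayo Clinic algorithm cited in the text and not the study's actual code: it is a simplified greedy nearest-neighbor match, and the `Person` record, its field names, and the two tiers are hypothetical stand-ins. For brevity it matches only on an age caliper plus exact categorical fields, leaves out the WHI and CaD date calipers, and does not model time-forward risk-set selection.

```python
from dataclasses import dataclass

# Hypothetical record layout; field names are illustrative only.
@dataclass(frozen=True)
class Person:
    pid: str
    age: float
    center: int
    os_flag: bool
    hrt: str
    dm: str
    cad: str

AGE_CALIPER = 5.0  # age must agree within +-5 years in every tier

# Exact-match fields per tier, strictest first. The second tier drops the
# DM and CaD assignments, mirroring the final relaxation step in the text.
EXACT_TIERS = [
    ("center", "os_flag", "hrt", "dm", "cad"),
    ("center", "os_flag", "hrt"),
]

def eligible(case, ctrl, exact_fields):
    """True if ctrl is within the age caliper and agrees on all exact fields."""
    if abs(case.age - ctrl.age) > AGE_CALIPER:
        return False
    return all(getattr(case, f) == getattr(ctrl, f) for f in exact_fields)

def match_cases(cases, controls):
    """Greedy 1:1 matching without replacement, relaxing criteria by tier."""
    available = {c.pid: c for c in controls}
    matches = {}
    for tier in EXACT_TIERS:
        for case in cases:
            if case.pid in matches:
                continue  # already matched in a stricter tier
            candidates = [c for c in available.values() if eligible(case, c, tier)]
            if not candidates:
                continue  # stays unmatched until a looser tier
            # Closest control by age difference; ties broken by ID.
            best = min(candidates, key=lambda c: (abs(case.age - c.age), c.pid))
            matches[case.pid] = best.pid
            del available[best.pid]  # each control is used at most once
    return matches
```

Called as `match_cases(cases, controls)`, the function returns a dictionary mapping case IDs to control IDs. A greedy pass depends on the order of cases and does not guarantee the minimum total distance; an optimal-assignment method would be needed for that.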
Study
phs001415
Tourette International Collaborative Genetics (TIC Genetics) Study - NJCTS and NIMH
The Tourette International Collaborative Genetics (TIC Genetics) Study is an international collaboration of scientists and clinicians specialized in Tourette Disorder (TD) from more than 20 sites across the United States, Europe, and South Korea. The study was established to further our understanding of the genetic architecture of tic disorders by developing a large sample of genotypically and phenotypically well-characterized affected probands and their relatives. We employ state-of-the-art genetic technologies to identify major genetic variants contributing to TD and the most commonly comorbid disorders, such as Obsessive-Compulsive Disorder (OCD) and Attention-Deficit/Hyperactivity Disorder (ADHD). TIC Genetics is a direct result of the work of the New Jersey Center for Tourette Syndrome (NJCTS) Sharing Repository (Heiman et al., 2008; PMID: 19036136), funded by a grant from the NJCTS Center of Excellence. Established in 2011 (Dietrich et al., 2015; PMID: 24771252), the TIC Genetics study focuses both on familial genetic variants with large effects within multiplex affected pedigrees and on de novo mutations ascertained through the analysis of apparently simplex parent-child trios with non-familial tics. In May 2017, we published a whole-exome sequencing study of 311 apparently simplex parent-child trios (Willsey et al., 2017; PMID: 28472652). These data, both phenotypes and sequencing data, are available through dbGaP. There were 120 subject samples included in the publication that did not have consent for sharing. These are excluded from dbGaP. In November 2021, we published a whole-exome sequencing study of 13 multiplex TD families (Cao et al., 2021). These data, both phenotypes and sequencing data, are available through dbGaP.
Study
phs001423
Colon Cancer Family Registry (Colon CFR)
The Colon Cancer Family Registry Cohort (CCFRC www.coloncfr.org) is a cohort of families recruited through six study sites located in the USA, Canada, Australia, and New Zealand. The CCFRC was formed as a resource to support studies on the etiology, prevention, and clinical management of colorectal cancer. Recruitment protocols fall broadly into two main categories: population-based and clinic-based ascertainment. Between 1998 and 2013, the CCFR recruited and interviewed probands who were either recently diagnosed with colorectal cancer that was reported to state or regional population cancer registries in the United States (Washington, California, Arizona, Minnesota, Colorado, New Hampshire, North Carolina, and Hawaii), Australia (Victoria), and Canada (Ontario) or were from multiple-case families referred to family-cancer clinics in the United States (Mayo Clinic in Rochester, Minnesota, and Cleveland Clinic in Cleveland, Ohio), Australia (Melbourne, Adelaide, Perth, Brisbane, and Sydney), New Zealand (Auckland), and Canada. Cases with known familial adenomatous polyposis were excluded. Once recruited, probands were asked for permission to contact their relatives for recruitment. Collectively, we have enrolled 42,49874 participants in 15,048 families who participated in standardized protocols used to collect information regarding family cancer history and colorectal cancer risk factors and biospecimens. All participants of population-based case families (excluding control-families) and clinic-based families are asked to provide updates on their personal and family history of cancer as well as their history of surgery, cancer screening, and some risk factors every 4-5 years either by telephone interview or by self-completed questionnaire via mail or online (PMID: 17982118).
Study
phs002733
NSIGHT North Carolina Newborn Exome Sequencing for Universal Screening (NC NEXUS)
Since newborn screening (NBS) began in the 1960s, technological advances have resulted in its use in an increasing number of disorders. Recent developments in whole-genome sequencing and its simpler corollary, whole-exome sequencing (WES), now afford the opportunity to comprehensively define the variation within an individual's genome in a rapid and affordable manner. Many challenges arise with the clinical application of genome-scale sequencing and in deriving practical benefits to infants and children. Its utility in NBS has yet to be demonstrated and its application in the pediatric population requires special examination, not only for potential clinical benefits but also for the unique ethical challenges it presents. In this proposal, we outline a highly interdisciplinary approach to identify, confront, and overcome the major challenges that must be met in order to implement deep sequencing technology to enhance current newborn screening in a diverse pediatric population. Overarching Aim 1 will evaluate the utility of WES as a diagnostic tool to extend the utility of current NBS. Using diverse cohorts of infants and young children with known conditions identified through NBS, we will examine the sensitivity and specificity of WES. We will also utilize WES in cohorts of children with known conditions not currently screened as potential candidates for NBS in the future. Overarching Aim 2 will develop and assess a framework for analyzing WES in a clinically oriented framework based on principles of ethics and evidence-based medicine. We will develop strategies to guide clinicians, clinical laboratories, and patients/families in their decisions regarding the inevitable incidental findings that will be detected, in ways that respect the child and protect his/her future autonomy, while also respecting parental interests and rights. 
Overarching Aim 3 will explore ethical, legal, and social issues (ELSI) involved in informed decision making and develop best practices regarding the return of results after testing. We will develop novel decision support tools and evaluate their usefulness in parental decision making, and examine the burdens placed on clinicians as this new technology is deployed in the vulnerable and special population that are newborns and their families.
Study
phs002095
Epidemiological study comparing rates and risk factors for dementia in African Americans in Indianapolis and Yoruba living in Ibadan, Nigeria
In 1991, a collaboration between researchers at Indiana University School of Medicine and the University of Ibadan, Ibadan, Nigeria established the Indianapolis-Ibadan Dementia Project. It is a longitudinal, prospective, population-based comparative epidemiological study of the prevalence and incidence rates and risk factors of Alzheimer's disease and other age-associated dementias. The project compares samples of community-dwelling elderly (age > 70 years) African Americans living in Indianapolis to Yoruba living in Ibadan, Nigeria, employing the same research design, methods, and investigators. It initially reported significantly lower prevalence rates of disorders in the Yoruba compared to the African Americans. In subsequent waves of the study (1994-1995, 1997-1998), incidence rates (rates of newly diagnosed cases) were also found to be significantly lower in the Yoruba. In genetic studies, the frequency of the APOE 4 allele was about the same in the two groups. APOE 4 was a significant risk factor for Alzheimer's disease and dementia in the Americans, while no association was found for the Yoruba. The APOE 2 allele appears to be protective in the Americans, but not the Yoruba. A constellation of factors often associated with vascular risk, including a history of hypertension, diabetes, and high cholesterol levels, is less common in the Yoruba than in the American group. An interaction was observed between cholesterol, APOE genotype and Alzheimer's disease in both study groups. In 2001-2002, survivors of the original cohort were once again evaluated (N~800 in each site) and 2,000 additional individuals age 70 years and older were enrolled in each site. Blood samples were collected from approximately 1,500 study participants in each site for genetic studies and analysis of biochemical risk factors for vascular disease. Subsequent waves of field work were conducted in 2004, 2007, 2009, and 2011. This fieldwork followed the classic two-stage study design. 
The study design involves an in-home screening interview with the study participant, which includes a cognitive assessment, medical history and current medications, brief neurological examination, height and weight, blood pressure measurement and assessment of social involvement. There is also a screening interview with a close relative of the study participant to assess activities of daily living, personality change, and medical history of the study participant. On the basis of the screening interview a sample of study participants (N~500 in each site) is selected for a full clinical diagnostic dementia work up which includes a neurological test battery, extensive interview with a family member and examination by a clinician. Diagnoses are made in a consensus diagnosis conference using the criteria of the Diagnostic and Statistical Manual of Mental Disorders, Revised Third Edition (DSM-III R) and International Classification of Diseases 10th Revision (ICD-10) for dementia. The criteria of the National Institute for Neurological and Communicative Disease and Stroke-Alzheimer's Disease and Related Disorders Association were used to diagnose possible and probable Alzheimer's disease. The focus of the study is risk factors for dementia and Alzheimer's disease, but also of particular importance is the question of mild cognitive impairment. This refers to the condition of having some decline in cognition but the decline is not sufficient to meet the criteria for dementia. We have studied this over the course of this project. In follow up studies of individuals who have this diagnosis about one third of them are better at follow up, about a third are about the same, and about a third decline more to meet the criteria for dementia. This is a very important issue for researchers because the ultimate goal of the research is to figure out how to identify the individuals who will definitely progress to dementia. 
If there are clear identifiers, it would be possible to make interventions, while individuals still function well, and possibly prevent the development of dementia altogether or delay the onset significantly. This is crucial because at the moment individuals usually do not enter into the medical care system until the dementia symptoms are quite severe, and the pathological damage to the brain cannot be undone.
Study
phs000378
Add Health: Longitudinal Study of a Nationally Representative Sample of Adolescents in Grades 7-12 in the United States during the 1994-95 School Year, Followed into Adulthood with Five Interviews/Surveys in 1995, 1996, 2001-02, 2008, and 2016-18
The National Longitudinal Study of Adolescent to Adult Health [Add Health] is an ongoing longitudinal study of a nationally representative U.S. cohort of more than 20,000 adolescents in grades 7-12 (aged 12-19 years) in 1994 followed into adulthood with five interviews/surveys in 1995, 1996, 2001-02, 2008, and 2016-18. Add Health was designed to understand how social environments and behaviors in adolescence are linked to health and achievement outcomes in young adulthood. Add Health contains unprecedented environmental, behavioral, psychosocial, biological, and genetic data from early adolescence and into adulthood on a large, nationally representative cohort with unprecedented racial, ethnic, socioeconomic, and geographic diversity. Add Health has a large, multidisciplinary user base of over 50,000 researchers around the world who have published over 3,400 research articles. Add Health is housed at the Carolina Population Center of the University of North Carolina at Chapel Hill. Add Health datasets are distributed according to a tiered data disclosure plan designed to protect the data from the risk of direct and indirect disclosure of respondent identity. Add Health's large sample size, population diversity and rich longitudinal database of psychosocial, physical, and contextual data will permit investigation of an exceptionally broad range of phenotypes with known genetic variation. Prospective longitudinal measures are available to document change over time in each of these phenotypes, as well as change in the social environment and life experiences, making the Add Health sample ideal for understanding genetic linkages with health and behavior across the life course. 
The original design of Add Health included important features for understanding biological processes in health and developmental trajectories across the life course of young people, including an embedded genetic sample with more than 3,000 pairs of adolescents with varying biological resemblance (e.g., twins, full sibs, half sibs, and adolescents who grew up in the same household but have no biological relationship), testing of saliva and urine for sexually transmitted infections and HIV, and biomarkers of cardiovascular health, metabolic processes, immune function, renal function, and inflammation. Add Health therefore has critical objective indicators of health status and disease markers in young adulthood, well before chronic illness or its complications emerge in later adulthood. Because DNA has been collected on the full sample at Wave IV, it is possible to link genetic profiles with social, behavioral, and biological measures over time from adolescence into adulthood. Add Health sampled the multiple environments in which young people live their lives, including the family, peers, school, neighborhood, community, and relationship dyads, and provides independent and direct measurement of these environments over time. Add Health contains extensive longitudinal information on health-related behavior, including life histories of physical activity, involvement in risk behavior, substance use, sexual behavior, civic engagement, education, and multiple indicators of health status based on self-report (e.g., general health, chronic illness), direct measurement (e.g., overweight status and obesity), and biomarkers. No other data resource with this expanse of genotype and phenotype data on a large nationally representative longitudinal sample with race, ethnic, socioeconomic, and geographic diversity exists. A complete reference guide on study design and accomplishments can be found on the Add Health website: https://cdr.lib.unc.edu/concern/articles/6t053j27s.
Study
phs001367
The African American Breast Cancer Epidemiology and Risk (AMBER) Consortium Study
The AMBER Consortium Study was formed to pool interview data, questionnaire data, and biological samples from epidemiological studies of breast cancer in African-American women to discover the potential causes of early-onset and aggressive breast cancer in African-American women. AMBER is funded through a Program Project grant from the National Cancer Institute. Genetic data submitted to dbGaP come from participants in the Carolina Breast Cancer Study, Women's Circle of Health Study, and Black Women's Health Study. The P01 consists of four scientific projects; the aims include follow-up on previous GWAS findings for breast cancer susceptibility in AA women as well as investigation of SNPs in candidate genes in biologically plausible pathways. These SNPs were genotyped using DNA from 3130 African-American women with breast cancer and 3700 controls. Descriptions of the original studies that provided the data and samples for this collaborative study are given below. The Carolina Breast Cancer Study (CBCS): a North Carolina population-based case-control study of breast cancer, conducted in three phases. The current study phase, phase 3 (years 2008-2014), includes women residents in 44 counties. CBCS phases 1 and 2 were conducted in 24 counties. Breast cancer cases are identified using Rapid Case Ascertainment in cooperation with the NC Central Cancer Registry. Controls were identified for phases 1 and 2 only (1993-1996 and 1996-2001), using Division of Motor Vehicles lists for women under age 65 and Health Care Financing Administration lists for women 65 and older. Randomized recruitment was used to oversample AA women and women under age 50. In-depth interviews are conducted by study nurses in participants' homes to obtain information on potential risk factors for breast cancer. DNA samples have been obtained from most participants. Overall response rates for Phases 1 and 2 were 74% for AA cases and 54% for AA controls. 
Phase 3, conducted in 44 counties from 2008-2014, includes cases only. The response rate for AA cases in Phase 3 was 70.5%. The Women's Circle of Health Study (WCHS): a multi-site case-control study in New York City (NYC) and New Jersey (NJ) aimed at evaluating risk factors for early and aggressive breast cancer in women of AA and EA ancestry. Recruitment in NYC took place between January 2002 and December 2008 and involved hospital-based ascertainment of cases, while controls were identified through random digit dialing (RDD). Recruitment at the NJ site started in March 2006 and is ongoing. Phase I of the study ended in April 2012 and covered seven counties in NJ. WCHS2 includes two additional counties. Cases in NJ were identified from 2006 to 2012 by the NJ State Cancer Registry using rapid case ascertainment. Controls were initially recruited through RDD (2006 to 2010) and later through community-based efforts (2009-2012). In-person interviews ascertained data on established and suspected risk factors for breast cancer. DNA samples were obtained. Among eligible AA women, 75% in NY and 54% in NJ completed an interview and provided a biologic specimen. Black Women's Health Study (BWHS): an ongoing prospective cohort study of health and illness among U.S. black women, with a focus on cancer. The study began in 1995 when 59,000 AA women 21-69 years of age from across the United States completed a 14-page postal health questionnaire. The median age at entry was 38, and participants were residents of 17 states in mainland U.S.: Northeast, 28%; South, 30%; Midwest, 23%; West, 19%. The baseline questionnaire elicited information on a wide range of variables, including demographic factors, use of medical care, family history of breast cancer, reproductive and medical history, cigarette and alcohol use, weight, height, waist and hip circumference, medication use, diet, and exercise.
Biennial follow-up questionnaires ascertain new cases of breast cancer and other illnesses and update covariate information. Medical record and cancer registry data are sought for all participants who report a diagnosis of breast cancer. As of 2014, approximately 80% of the baseline cohort have completed follow-up. DNA samples were obtained from about 50% of participants. BWHS data for the AMBER consortium were prepared as a nested case-control study, with controls frequency-matched to cases on year of birth, geographic region, and most recent questionnaire completed prior to the end of the at-risk period.
Study
phs000669
Comprehensive benchmarking of methods for mutation calling in circulating tumor DNA
This study provides a comprehensive benchmarking resource for somatic variant detection in cell-free DNA (cfDNA) from cancer patients. Longitudinal plasma samples from colorectal and breast cancer cohorts were selected to create patient-matched dilution series spanning ultra-low to high circulating-tumour-DNA (ctDNA) fractions, while preserving each individual’s germline and clonal haematopoiesis background. Deep whole-genome sequencing (150×) and ultra-deep whole-exome sequencing (2,000×) generated a reference call set of ~37,000 single-nucleotide variants and ~58,000 insertions/deletions. These data enabled systematic evaluation of nine somatic variant callers across variable ctDNA levels and sequencing depths, and were further used to explore machine-learning–guided parameter tuning. The resulting dataset offers an openly accessible framework for developers and clinicians to assess and optimize somatic variant calling in liquid biopsy applications.
Study
EGAS50000001313
Clinical Sequencing Exploratory Research (CSER): Clinical sequencing in cancer: Clinical, ethical, and technological studies
The Clinical Sequencing Exploratory Research (CSER) program supports multi-disciplinary projects that bring together clinicians, bioinformaticians, and ethicists to research the challenges of utilizing genomic sequence data in the clinic in the routine practice of medicine. The challenges are many and not disease-specific. Important aims of the research include: development of technical specifications and standards for sequencing in a clinical setting, investigation of methods to transmit genome-scale data to physicians in a fashion and timescale that fits the normal clinical workflow, exploration of regulatory requirements for applying genomic sequence data to patient care, study of physician and patient preferences regarding presentation of genomic information, and study of the ethical implications of returning unanticipated findings. More information about CSER and the investigators and institutions who comprise the CSER consortium can be found at http://www.genome.gov/27546194. The NEXT (New Exome Technology) Medicine Study utilizes a randomized controlled trial (RCT) structure to compare usual care (UC) practice in Medical Genetics Clinics with UC practice plus whole exome sequencing, a powerful research tool, for Colorectal Cancer and Polyposis (CRCP) in adults. In doing so, the study aims to explore the practical, economic, and ethical implications of identifying and returning incidental findings ("extra" whole exome genetic risk results not associated with CRCP, including pharmacogenetic variants) to patients. The study will also attempt to identify novel causal genes for CRCP by studying relatives of selected research subjects.
Study
phs000999
Immune signature of malignant melanoma in pregnancy
Melanoma is a highly immunogenic malignancy, and there is a deep-rooted notion among clinicians that a probable systemic immunosuppressed state during pregnancy would negatively influence melanoma-related prognosis. To test this hypothesis on a more fundamental level, we collected a unique set of melanoma tumor samples and performed RNA-seq to compare the immune cell infiltrate of pregnant patients with that of matched non-pregnant controls. We found no differential expression of any immune cell subset. This supports the view that the supposed immunosuppressive state of pregnancy does not hamper the anti-tumor immune response.
Study
EGAS50000000492
OncoArray: Prostate Cancer
Original description of the study: From ELLIPSE (linked to the PRACTICAL consortium), we contributed ~78,000 SNPs to the OncoArray. A large fraction of the content was derived from the GWAS meta-analyses in European ancestry populations (overall and aggressive disease; ~27K SNPs). We also selected just over 10,000 SNPs from the meta-analyses in the non-European populations, with a majority of these SNPs coming from the analysis of overall prostate cancer in African ancestry populations as well as from the multiethnic meta-analysis. A substantial fraction of SNPs (~28,000) were also selected for fine-mapping of 53 loci not included in the common fine-mapping regions (tagging at r² > 0.9 across ±500kb regions). We also selected a few thousand SNPs associated with PSA levels and/or disease survival, SNPs from candidate lists provided by study collaborators, and SNPs from meta-analyses of exome SNP chip data from the Multiethnic Cohort and UK studies. The Contributing Studies: Aarhus: Hospital-based, Retrospective, Observational. Source of cases: Patients treated for prostate adenocarcinoma at Department of Urology, Aarhus University Hospital, Skejby (Aarhus, Denmark). Source of controls: Age-matched males treated for myocardial infarction or undergoing coronary angioplasty, but with no prostate cancer diagnosis based on information retrieved from the Danish Cancer Register and the Danish Cause of Death Register. AHS: Nested case-control study within prospective cohort. Source of cases: linkage to cancer registries in study states. Source of controls: matched controls from cohort. ATBC: Prospective, nested case-control. Source of cases: Finnish male smokers aged 50-69 years at baseline. Source of controls: Finnish male smokers aged 50-69 years at baseline. BioVu: Cases identified in a biobank linked to electronic health records.
Source of cases: A total of 214 cases were identified in the VUMC de-identified electronic health records database (the Synthetic Derivative) and shipped to USC for genotyping in April 2014. The following criteria were used to identify cases: Age 18 or greater; male; African Americans (Black) only. Note that African ancestry is not self-identified; it is administratively or third-party assigned (which has been shown to be highly correlated with genetic ancestry for African Americans in BioVU; see references). Source of controls: Controls were identified in the de-identified electronic health record. Unfortunately, they were not age matched to the cases, and therefore cannot be used for this study. Canary PASS: Prospective, Multi-site, Observational Active Surveillance Study. Source of cases: clinic based from Beth Israel Deaconess Medical Center, Eastern Virginia Medical School, University of California at San Francisco, University of Texas Health Sciences Center San Antonio, University of Washington, VA Puget Sound. Source of controls: N/A. CCI: Case series, Hospital-based. Source of cases: Cases identified through clinics at the Cross Cancer Institute. Source of controls: N/A. CerePP French Prostate Cancer Case-Control Study (ProGene): Case-Control, Prospective, Observational, Hospital-based. Source of cases: Patients, treated in French departments of Urology, who had histologically confirmed prostate cancer. Source of controls: Controls were recruited as participating in a systematic health screening program and found unaffected (normal digital rectal examination and total PSA < 4 ng/ml, or negative biopsy if PSA > 4 ng/ml). COH: hospital-based cases and controls from outside. Source of cases: Consented prostate cancer cases at City of Hope. Source of controls: Consented unaffected males that were part of other studies where they consented to have their DNA used for other research studies. COSM: Population-based cohort. Source of cases: General population.
Source of controls: General population. CPCS1: Case-control - Denmark. Source of cases: Hospital referrals. Source of controls: Copenhagen General Population Study. CPCS2: Source of cases: Hospital referrals. Source of controls: Copenhagen General Population Study. CPDR: Retrospective cohort. Source of cases: Walter Reed National Military Medical Center. Source of controls: Walter Reed National Military Medical Center. ACS_CPS-II: Nested case-control derived from a prospective cohort study. Source of cases: Identified through self-report on follow-up questionnaires and verified through medical records or cancer registries, identified through cancer registries or the National Death Index (with prostate cancer as the primary cause of death). Source of controls: Cohort participants who were cancer-free at the time of diagnosis of the matched case, also matched on age (±6 mo) and date of biospecimen donation (±6 mo). EPIC: Case-control - Germany, Greece, Italy, Netherlands, Spain, Sweden, UK. Source of cases: Identified through record linkage with population-based cancer registries in Italy, the Netherlands, Spain, Sweden and UK. In Germany and Greece, follow-up is active and achieved through checks of insurance records and cancer and pathology registries as well as via self-reported questionnaires; self-reported incident cancers are verified through medical records. Source of controls: Cohort participants without a diagnosis of cancer. EPICAP: Case-control, Population-based, ages less than 75 years at diagnosis, Hérault, France. Source of cases: Prostate cancer cases in all public hospitals and private urology clinics of the département of Hérault in France. Case validation by the Hérault Cancer Registry. Source of controls: Population-based controls, frequency age matched (5-year groups). Quotas by socio-economic status (SES) were used so that the SES distribution among controls matched that of men in the general population, conditional on age.
ERSPC: Population-based randomized trial. Source of cases: Men with PrCa from screening arm ERSPC Rotterdam. Source of controls: Men without PrCa from screening arm ERSPC Rotterdam. ESTHER: Case-control, Prospective, Observational, Population-based. Source of cases: Prostate cancer cases in all hospitals in the state of Saarland, from 2001-2003. Source of controls: Random sample of participants from routine health check-up in Saarland, in 2000-2002. FHCRC: Population-based, case-control, ages 35-74 years at diagnosis, King County, WA, USA. Source of cases: Identified through the Seattle-Puget Sound SEER cancer registry. Source of controls: Randomly selected, age-frequency matched residents from the same county as cases. Gene-PARE: Hospital-based. Source of cases: Patients that received radiotherapy for treatment of prostate cancer. Source of controls: n/a. Hamburg-Zagreb: Hospital-based, Prospective. Source of cases: Prostate cancer cases seen at the Department of Oncology, University Hospital Center Zagreb, Croatia. Source of controls: Population-based (Croatia), healthy men, older than 50, with no medical record of cancer, and no family history of cancer (1st & 2nd degree relatives). HPFS: Nested case-control. Source of cases: Participants of the HPFS cohort. Source of controls: Participants of the HPFS cohort. IMPACT: Observational. Source of cases: Carriers and non-carriers (with a known mutation in the family) of the BRCA1 and BRCA2 genes, aged between 40 and 69, who are undergoing prostate screening with annual PSA testing. This cohort has been diagnosed with prostate cancer during the study. Source of controls: Carriers and non-carriers (with a known mutation in the family) of the BRCA1 and BRCA2 genes, aged between 40 and 69, who are undergoing prostate screening with annual PSA testing. This cohort has not been diagnosed with prostate cancer during the study. IPO-Porto: Hospital-based. Source of cases: Early onset and/or familial prostate cancer.
Source of controls: Blood donors. Karuprostate: Case-control, Retrospective, Population-based. Source of cases: From FWI (Guadeloupe): 237 consecutive incident patients with histologically confirmed prostate cancer attending public and private urology clinics; From Democratic Republic of Congo: 148 consecutive incident patients with histologically confirmed prostate cancer attending the University Clinic of Kinshasa. Source of controls: From FWI (Guadeloupe): 277 controls recruited from men participating in a free systematic health screening program open to the general population; From Democratic Republic of Congo: 134 controls recruited from subjects attending the University Clinic of Kinshasa. KULEUVEN: Hospital-based, Prospective, Observational. Source of cases: Prostate cancer cases recruited at the University Hospital Leuven. Source of controls: Healthy males with no history of prostate cancer recruited at the University Hospitals, Leuven. LAAPC: Subjects were participants in a population-based case-control study of aggressive prostate cancer conducted in Los Angeles County. Cases were identified through the Los Angeles County Cancer Surveillance Program rapid case ascertainment system. Eligible cases included African American, Hispanic, and non-Hispanic White men diagnosed with a first primary prostate cancer between January 1, 1999 and December 31, 2003. Eligible cases also had (a) prostatectomy with documented tumor extension outside the prostate, (b) metastatic prostate cancer in sites other than prostate, (c) needle biopsy of the prostate with Gleason grade ≥8, or (d) needle biopsy with Gleason grade 7 and tumor in more than two thirds of the biopsy cores. Eligible controls were men never diagnosed with prostate cancer, living in the same neighborhood as a case, and were frequency matched to cases on age (± 5 y) and race/ethnicity.
Controls were identified by a neighborhood walk algorithm, which proceeds through an obligatory sequence of adjacent houses or residential units beginning at a specific residence that has a specific geographic relationship to the residence where the case lived at diagnosis. Malaysia: Case-control. Source of cases: Patients attending the outpatient urology or uro-onco clinic at University Malaya Medical Center. Source of controls: Population-based, age matched (5-year groups), ascertained through electoral register, Subang Jaya, Selangor, Malaysia. MCC-Spain: Case-control. Source of cases: Identified through the urology departments of the participating hospitals. Source of controls: Population-based, frequency age and region matched, ascertained through the rosters of the primary health care centers. MCCS: Nested case-control, Melbourne, Victoria. Source of cases: Identified by linkage to the Victorian Cancer Registry. Source of controls: Cohort participants without a diagnosis of cancer. MD Anderson: Participants in this study were identified from epidemiological prostate cancer studies conducted at the University of Texas MD Anderson Cancer Center in the Houston Metropolitan area. Cases were accrued in the Houston Medical Center and were not restricted with respect to Gleason score, stage or PSA. Controls were identified via random-digit-dialing or among hospital visitors and they were frequency matched to cases on age and race. Lifestyle, demographic, and family history data were collected using a standardized questionnaire. MDACC_AS: A prospective cohort study. Source of cases: Men with clinically organ-confined prostate cancer meeting eligibility criteria for a prospective cohort study of active surveillance at MD Anderson Cancer Center. Source of controls: N/A. MEC: The Multiethnic Cohort (MEC) is comprised of over 215,000 men and women recruited from Hawaii and the Los Angeles area between 1993 and 1996.
Between 1995 and 2006, over 65,000 blood samples were collected from participants for genetic analyses. To identify incident cancer cases, the MEC was cross-linked with the population-based Surveillance, Epidemiology and End Results (SEER) registries in California and Hawaii, and unaffected cohort participants with blood samples were selected as controls. MIAMI (WFPCS): Prostate cancer cases and controls were recruited from the Departments of Urology and Internal Medicine of the Wake Forest University School of Medicine using sequential patient populations as described previously (PMID:15342424). All study subjects received a detailed description of the study protocol and signed their informed consent, as approved by the medical center's Institutional Review Board. The general eligibility criteria were (i) able to comprehend informed consent and (ii) without previously diagnosed cancer. The exclusion criteria were (i) clinical diagnosis of autoimmune diseases; (ii) chronic inflammatory conditions; and (iii) infections within the past 6 weeks. Blood samples were collected from all subjects. MOFFITT: Hospital-based. Source of cases: clinic based from Moffitt Cancer Center. Source of controls: Moffitt Cancer Center affiliated Lifetime cancer screening center. NMHS: Case-control, clinic based, Nashville TN. Source of cases: All urology clinics in Nashville, TN. Source of controls: Men without prostate cancer at prostate biopsy. PCaP: The North Carolina-Louisiana Prostate Cancer Project (PCaP) is a multidisciplinary population-based case-only study designed to address racial differences in prostate cancer through a comprehensive evaluation of social, individual and tumor level influences on prostate cancer aggressiveness. PCaP enrolled approximately equal numbers of African Americans and Caucasian Americans with newly-diagnosed prostate cancer from North Carolina (42 counties) and Louisiana (30 parishes) identified through state tumor registries.
African American PCaP subjects with DNA, who agreed to future use of specimens for research, participated in OncoArray analysis. PCMUS: Case-control - Sofia, Bulgaria. Source of cases: Patients of Clinic of Urology, Alexandrovska University Hospital, Sofia, Bulgaria, PrCa histopathologically confirmed. Source of controls: 72 patients with verified BPH and PSA<3,5; 78 healthy controls from the MMC Biobank, no history of PrCa. PHS: Nested case-control. Source of cases: Participants of the PHS1 trial/cohort. Source of controls: Participants of the PHS1 trial/cohort. PLCO: Nested case-control. Source of cases: Men with a confirmed diagnosis of prostate cancer from the PLCO Cancer Screening Trial. Source of controls: Controls were men enrolled in the PLCO Cancer Screening Trial without a diagnosis of cancer at the time of case ascertainment. Poland: Case-control. Source of cases: men with unselected prostate cancer, diagnosed in north-western Poland at the University Hospital in Szczecin. Source of controls: cancer-free men from the same population, taken from the healthy adult patients of family doctors in the Szczecin region. PROCAP: Population-based, Retrospective, Observational. Source of cases: Cases were ascertained from the National Prostate Cancer Register of Sweden Follow-Up Study, a retrospective nationwide cohort study of patients with localized prostate cancer. Source of controls: Controls were selected among men referred for PSA testing in laboratories in Stockholm County, Sweden, between 2010 and 2012. PROGReSS: Hospital-based, Prospective, Observational. Source of cases: Prostate cancer cases from the Hospital Clínico Universitario de Santiago de Compostela, Galicia, Spain. Source of controls: Cancer-free men from the same population. ProMPT: A study to collect samples and data from subjects with and without prostate cancer. Retrospective, Experimental. Source of cases: Subjects attending outpatient clinics in hospitals.
Source of controls: Subjects attending outpatient clinics in hospitals. ProtecT: Trial of treatment. Samples taken from subjects invited for PSA testing from the community at nine centers across the United Kingdom. Source of cases: Subjects who have a proven diagnosis of prostate cancer following testing. Source of controls: Identified through invitation of subjects in the community. PROtEuS: Case-control, population-based. Source of cases: All new histologically-confirmed cases, aged 75 years or younger, diagnosed between 2005 and 2009, actively ascertained across Montreal French hospitals. Source of controls: Randomly selected from the Provincial electoral list of French-speaking men between 2005 and 2009, from the same area of residence as cases and frequency-matched on age. QLD: Case-control. Source of cases: A longitudinal cohort study (Prostate Cancer Supportive Care and Patient Outcomes Project: ProsCan) conducted in Queensland, through which men newly diagnosed with prostate cancer from 26 private practices and 10 public hospitals were directly referred to ProsCan at the time of diagnosis by their treating clinician (age range 43-88 years). All cases had histopathologically confirmed prostate cancer, following presentation with an abnormal serum PSA and/or lower urinary tract symptoms. Source of controls: Controls comprised healthy male blood donors with no personal history of prostate cancer, recruited through (i) the Australian Red Cross Blood Services in Brisbane (age range 19-76 years) and (ii) the Australian Electoral Commission (AEC) (age and post-code/ area matched to ProsCan, age range 54-90 years). RAPPER: Multi-centre, hospital based blood sample collection study in patients enrolled in clinical trials with prospective collection of radiotherapy toxicity data. Source of cases: Prostate cancer patients enrolled in radiotherapy trials: CHHiP, RT01, Dose Escalation, RADICALS, Pelvic IMRT, PIVOTAL.
Source of controls: N/A. SABOR: Prostate Cancer Screening Cohort. Source of cases: Men >45 yrs of age participating in annual PSA screening. Source of controls: Males participating in annual PSA prostate cancer risk evaluations (funded by NCI biomarkers discovery and validation grant), recruited through University of Texas Health Science Center at San Antonio and affiliated sites or through study advertisements, enrolment open to the community. SCCS: Case-control in cohort, Southeastern USA. Prospective, Observational, Population-based. Source of cases: SCCS entry population. Source of controls: SCCS entry population. SCPCS: Population-based, Retrospective, Observational. Source of cases: South Carolina Central Cancer Registry. Source of controls: Health Care Financing Administration beneficiary file. SEARCH: Case-control - East Anglia, UK. Source of cases: Men < 70 years of age registered with prostate cancer at the population-based cancer registry, Eastern Cancer Registration and Information Centre, East Anglia, UK. Source of controls: Men attending general practice in East Anglia with no known prostate cancer diagnosis, frequency matched to cases by age and geographic region. SNP_Prostate_Ghent: Hospital-based, Retrospective, Observational. Source of cases: Men treated with IMRT as primary or postoperative treatment for prostate cancer at the Ghent University Hospital between 2000 and 2010. Source of controls: Employees of the University hospital and members of social activity clubs, without a history of any cancer. SPAG: Hospital-based, Retrospective, Observational. Source of cases: Guernsey. Source of controls: Guernsey. STHM2: Population-based, Retrospective, Observational. Source of cases: Cases were selected among men referred for PSA testing in laboratories in Stockholm County, Sweden, between 2010 and 2012. Source of controls: Controls were selected among men referred for PSA testing in laboratories in Stockholm County, Sweden, between 2010 and 2012.
PCPT: Case-control from a randomized clinical trial. Source of cases: Randomized clinical trial. Source of controls: Randomized clinical trial. SELECT: Case-cohort from a randomized clinical trial. Source of cases: Randomized clinical trial. Source of controls: Randomized clinical trial. TAMPERE: Case-control - Finland, Retrospective, Observational, Population-based. Source of cases: Identified through linkage to the Finnish Cancer Registry and patient records; and the Finnish arm of the ERSPC study. Source of controls: Cohort participants without a diagnosis of cancer. UGANDA: Uganda Prostate Cancer Study: Uganda is a case-control study of prostate cancer in Kampala, Uganda that was initiated in 2011. Men with prostate cancer were enrolled from the Urology unit at Mulago Hospital and men without prostate cancer (i.e. controls) were enrolled from other clinics (i.e. surgery) at the hospital. UKGPCS: ICR, UK. Source of cases: Cases identified through clinics at the Royal Marsden hospital and nationwide NCRN hospitals. Source of controls: Ken Muir's control- 2000. ULM: Case-control - Germany. Source of cases: familial cases (n=162): identified through questionnaires for family history by collaborating urologists all over Germany; sporadic cases (n=308): prostatectomy series performed in the Clinic of Urology Ulm between 2012 and 2014. Source of controls: age-matched controls (n=188): age-matched men without prostate cancer and negative family history collected in hospitals of Ulm. WUGS/WUPCS: Case series, USA. Source of cases: Identified through clinics at Washington University in St. Louis. Source of controls: Men diagnosed and managed with prostate cancer in University based clinic. Acknowledgement Statements: Aarhus: This study was supported by the Danish Strategic Research Council (now Innovation Fund Denmark) and the Danish Cancer Society. The Danish Cancer Biobank (DCB) is acknowledged for biological material.
AHS: This work was supported by the Intramural Research Program of the NIH, National Cancer Institute, Division of Cancer Epidemiology and Genetics (Z01CP010119). ATBC: This research was supported in part by the Intramural Research Program of the NIH and the National Cancer Institute. Additionally, this research was supported by U.S. Public Health Service contracts N01-CN-45165, N01-RC-45035, N01-RC-37004, HHSN261201000006C, and HHSN261201500005C from the National Cancer Institute, Department of Health and Human Services. BioVu: The dataset(s) used for the analyses described were obtained from Vanderbilt University Medical Center's BioVU which is supported by institutional funding and by the National Center for Research Resources, Grant UL1 RR024975-01 (which is now at the National Center for Advancing Translational Sciences, Grant 2 UL1 TR000445-06). Canary PASS: PASS was supported by Canary Foundation and the National Cancer Institute's Early Detection Research Network (U01 CA086402). CCI: This work was awarded by Prostate Cancer Canada and is proudly funded by the Movember Foundation - Grant # D2013-36. The CCI group would like to thank David Murray, Razmik Mirzayans, and April Scott for their contribution to this work. CerePP French Prostate Cancer Case-Control Study (ProGene): None reported. COH: SLN is partially supported by the Morris and Horowitz Families Endowed Professorship. COSM: The Swedish Research Council, the Swedish Cancer Foundation. CPCS1 & CPCS2: Department of Clinical Biochemistry, Herlev and Gentofte Hospital, Copenhagen University Hospital, Herlev Ringvej 75, DK-2730 Herlev, Denmark. CPCS1 would like to thank the participants and staff of the Copenhagen General Population Study for their important contributions. CPDR: Uniformed Services University for the Health Sciences HU0001-10-2-0002 (PI: David G. McLeod, MD). CPS-II: The American Cancer Society funds the creation, maintenance, and updating of the Cancer Prevention Study II cohort.
CPS-II thanks the participants and Study Management Group for their invaluable contributions to this research. We would also like to acknowledge the contribution to this study from central cancer registries supported through the Centers for Disease Control and Prevention National Program of Cancer Registries, and cancer registries supported by the National Cancer Institute Surveillance Epidemiology and End Results program. EPIC: The coordination of EPIC is financially supported by the European Commission (DG-SANCO) and the International Agency for Research on Cancer. The national cohorts are supported by the Danish Cancer Society (Denmark); the Deutsche Krebshilfe, Deutsches Krebsforschungszentrum and Federal Ministry of Education and Research (Germany); the Hellenic Health Foundation, Greek Ministry of Health; Greek Ministry of Education (Greece); the Italian Association for Research on Cancer (AIRC) and National Research Council (Italy); the Dutch Ministry of Public Health, Welfare and Sports (VWS), Netherlands Cancer Registry (NKR), LK Research Funds, Dutch Prevention Funds, Dutch ZON (Zorg Onderzoek Nederland), World Cancer Research Fund (WCRF); the Statistics Netherlands (The Netherlands); the Health Research Fund (FIS), Regional Governments of Andalucía, Asturias, Basque Country, Murcia and Navarra, Spanish Ministry of Health ISCIII RETIC (RD06/0020), Red de Centros RCESP, C03/09 (Spain); the Swedish Cancer Society, Swedish Scientific Council and Regional Government of Skåne and Västerbotten, Fundacion Federico SA (Sweden); the Cancer Research UK, Medical Research Council (United Kingdom). EPICAP: The EPICAP study was supported by grants from Ligue Nationale Contre le Cancer, Ligue départementale du Val de Marne; Fondation de France; Agence Nationale de sécurité sanitaire de l'alimentation, de l'environnement et du travail (ANSES). 
The EPICAP study group would like to thank all urologists, Antoinette Anger and Hasina Randrianasolo (study monitors), Anne-Laure Astolfi, Coline Bernard, Oriane Noyer, Marie-Hélène De Campo, Sandrine Margaroline, Louise N'Diaye, and Sabine Perrier-Bonnet (Clinical Research nurses). ERSPC: This study was supported by the Dutch Cancer Society (KWF94-869,98-1657,2002-277,2006-3518, 2010-4800), The Netherlands Organisation for Health Research and Development (ZonMW-002822820, 22000106, 50-50110-98-311, 62300035), The Dutch Cancer Research Foundation (SWOP), and an unconditional grant from Beckman-Coulter-Hybritech Inc. ESTHER: The ESTHER study was supported by a grant from the Baden Württemberg Ministry of Science, Research and Arts. The ESTHER group would like to thank Hartwig Ziegler, Sonja Wolf, Volker Hermann, Heiko Müller, Karina Dieffenbach, Katja Butterbach for valuable contributions to the study. FHCRC: The FHCRC studies were supported by grants R01-CA056678, R01-CA082664, and R01-CA092579 from the US National Cancer Institute, National Institutes of Health, with additional support from the Fred Hutchinson Cancer Research Center. FHCRC would like to thank all the men who participated in these studies. Gene-PARE: The Gene-PARE study was supported by grants 1R01CA134444 from the U.S. National Institutes of Health, PC074201 and W81XWH-15-1-0680 from the Prostate Cancer Research Program of the Department of Defense and RSGT-05-200-01-CCE from the American Cancer Society. Hamburg-Zagreb: None reported. HPFS: The Health Professionals Follow-up Study was supported by grants UM1CA167552, CA133891, CA141298, and P01CA055075.
HPFS are grateful to the participants and staff of the Physicians' Health Study and Health Professionals Follow-Up Study for their valuable contributions, as well as the following state cancer registries for their help: AL, AZ, AR, CA, CO, CT, DE, FL, GA, ID, IL, IN, IA, KY, LA, ME, MD, MA, MI, NE, NH, NJ, NY, NC, ND, OH, OK, OR, PA, RI, SC, TN, TX, VA, WA, and WY. IMPACT: The IMPACT study was funded by The Ronald and Rita McAulay Foundation, CR-UK Project grant (C5047/A1232), Cancer Australia, AICR Netherlands A10-0227, Cancer Australia and Cancer Council Tasmania, NIHR, EU Framework 6, Cancer Councils of Victoria and South Australia, and Philanthropic donation to Northshore University Health System. We acknowledge support from the National Institute for Health Research (NIHR) to the Biomedical Research Centre at The Institute of Cancer Research and Royal Marsden Foundation NHS Trust. IMPACT acknowledges the IMPACT study steering committee, collaborating centres, and participants. IPO-Porto: The IPO-Porto study was funded by Fundação para a Ciência e a Tecnologia (FCT; UID/DTP/00776/2013 and PTDC/DTP-PIC/1308/2014) and by IPO-Porto Research Center (CI-IPOP-16-2012 and CI-IPOP-24-2015). MC and MPS are research fellows from Liga Portuguesa Contra o Cancro, Núcleo Regional do Norte. SM is a research fellow from FCT (SFRH/BD/71397/2010). IPO-Porto would like to express our gratitude to all patients and families who have participated in this study. Karuprostate: The Karuprostate study was supported by the French National Health Directorate and by the Association pour la Recherche sur les Tumeurs de la Prostate. Karuprostate thanks Séverine Ferdinand. KULEUVEN: F.C. and S.J. are holders of grants from FWO Vlaanderen (G.0684.12N and G.0830.13N), the Belgian federal government (National Cancer Plan KPC_29_023), and a Concerted Research Action of the KU Leuven (GOA/15/017). TVDB is holder of a doctoral fellowship of the FWO.
LAAPC: This study was funded by grant R01CA84979 (to S.A. Ingles) from the National Cancer Institute, National Institutes of Health. Malaysia: The study was funded by the University Malaya High Impact Research Grant (HIR/MOHE/MED/35). Malaysia thanks all associates in the Urology Unit, University of Malaya, Cancer Research Initiatives Foundation (CARIF) and the Malaysian Men's Health Initiative (MMHI). MCCS: MCCS cohort recruitment was funded by VicHealth and Cancer Council Victoria. The MCCS was further supported by Australian NHMRC grants 209057, 251553, and 504711, and by infrastructure provided by Cancer Council Victoria. Cases and their vital status were ascertained through the Victorian Cancer Registry (VCR) and the Australian Institute of Health and Welfare (AIHW), including the National Death Index and the Australian Cancer Database. MCC-Spain: The study was partially funded by the Accion Transversal del Cancer, approved on the Spanish Ministry Council on the 11th October 2007, by the Instituto de Salud Carlos III-FEDER (PI08/1770, PI09/00773-Cantabria, PI11/01889-FEDER, PI12/00265, PI12/01270, and PI12/00715), by the Fundación Marqués de Valdecilla (API 10/09), by the Spanish Association Against Cancer (AECC) Scientific Foundation and by the Catalan Government DURSI grant 2009SGR1489. Samples: Biological samples were stored at the Parc de Salut MAR Biobank (MARBiobanc; Barcelona) which is supported by Instituto de Salud Carlos III FEDER (RD09/0076/00036). Also sample collection was supported by the Xarxa de Bancs de Tumors de Catalunya sponsored by Pla Director d'Oncologia de Catalunya (XBTC). MCC-Spain acknowledges the contribution from Esther Gracia-Lavedan in preparing the data. We thank all the subjects who participated in the study and all MCC-Spain collaborators. MD Anderson: Prostate Cancer Case-Control Studies at MD Anderson (MDA) supported by grants CA68578, ES007784, DAMD W81XWH-07-1-0645, and CA140388. 
MDACC_AS: None reported. MEC: Funding provided by NIH grant U19CA148537 and grant U01CA164973. MIAMI (WFPCS): ACS. MOFFITT: The Moffitt group was supported by the US National Cancer Institute (R01CA128813, PI: J.Y. Park). NMHS: Funding for the Nashville Men's Health Study (NMHS) was provided by the National Institutes of Health Grant numbers: RO1CA121060. PCaP only data: The North Carolina - Louisiana Prostate Cancer Project (PCaP) is carried out as a collaborative study supported by the Department of Defense contract DAMD 17-03-2-0052. For HCaP-NC follow-up data: The Health Care Access and Prostate Cancer Treatment in North Carolina (HCaP-NC) study is carried out as a collaborative study supported by the American Cancer Society award RSGT-08-008-01-CPHPS. For studies using both PCaP and HCaP-NC follow-up data please use: The North Carolina - Louisiana Prostate Cancer Project (PCaP) and the Health Care Access and Prostate Cancer Treatment in North Carolina (HCaP-NC) study are carried out as collaborative studies supported by the Department of Defense contract DAMD 17-03-2-0052 and the American Cancer Society award RSGT-08-008-01-CPHPS, respectively. For any PCaP data, please include: The authors thank the staff, advisory committees and research subjects participating in the PCaP study for their important contributions. For studies using PCaP DNA/genotyping data, please include: We would like to acknowledge the UNC BioSpecimen Facility and LSUHSC Pathology Lab for our DNA extractions, blood processing, storage and sample disbursement (https://genome.unc.edu/bsp). For studies using PCaP tissue, please include: We would like to acknowledge the RPCI Department of Urology Tissue Microarray and Immunoanalysis Core for our tissue processing, storage and sample disbursement.
For studies using HCaP-NC follow-up data, please use: The Health Care Access and Prostate Cancer Treatment in North Carolina (HCaP-NC) study is carried out as a collaborative study supported by the American Cancer Society award RSGT-08-008-01-CPHPS. The authors thank the staff, advisory committees and research subjects participating in the HCaP-NC study for their important contributions. For studies that use both PCaP and HCaP-NC, please use: The authors thank the staff, advisory committees and research subjects participating in the PCaP and HCaP-NC studies for their important contributions. PCMUS: The PCMUS study was supported by the Bulgarian National Science Fund, Ministry of Education and Science (contract DOO-119/2009; DUNK01/2-2009; DFNI-B01/28/2012) with additional support from the Science Fund of Medical University - Sofia (contract 51/2009; 8I/2009; 28/2010). PHS: The Physicians' Health Study was supported by grants CA34944, CA40360, CA097193, HL26490, and HL34595. PHS members are grateful to the participants and staff of the Physicians' Health Study and Health Professionals Follow-Up Study for their valuable contributions, as well as the following state cancer registries for their help: AL, AZ, AR, CA, CO, CT, DE, FL, GA, ID, IL, IN, IA, KY, LA, ME, MD, MA, MI, NE, NH, NJ, NY, NC, ND, OH, OK, OR, PA, RI, SC, TN, TX, VA, WA, and WY. PLCO: This PLCO study was supported by the Intramural Research Program of the Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH. PLCO thanks Drs. Christine Berg and Philip Prorok, Division of Cancer Prevention at the National Cancer Institute, the screening center investigators and staff of the PLCO Cancer Screening Trial for their contributions to the PLCO Cancer Screening Trial. We thank Mr. Thomas Riley, Mr. Craig Williams, Mr. Matthew Moore, and Ms. Shannon Merkle at Information Management Services, Inc., for their management of the data and Ms. Barbara O'Brien and staff at Westat, Inc.
for their contributions to the PLCO Cancer Screening Trial. We also thank the PLCO study participants for their contributions to making this study possible. Poland: None reported. PROCAP: PROCAP was supported by the Swedish Cancer Foundation (08-708, 09-0677). PROCAP thanks and acknowledges all of the participants in the PROCAP study. We thank Carin Cavalli-Björkman and Ami Rönnberg Karlsson for their dedicated work in the collection of data. Michael Broms is acknowledged for his skilful work with the databases. KI Biobank is acknowledged for handling the samples and for DNA extraction. We acknowledge The NPCR steering group: Pär Stattin (chair), Anders Widmark, Stefan Karlsson, Magnus Törnblom, Jan Adolfsson, Anna Bill-Axelson, Ove Andrén, David Robinson, Bill Pettersson, Jonas Hugosson, Jan-Erik Damber, Ola Bratt, Göran Ahlgren, Lars Egevad, and Roy Ehrnström. PROGReSS: The PROGReSS study is funded by grants from the Spanish Ministry of Health (INT15/00070; INT16/00154; FIS PI10/00164, FIS PI13/02030; FIS PI16/00046); the Spanish Ministry of Economy and Competitiveness (PTA2014-10228-I), and Fondo Europeo de Desarrollo Regional (FEDER 2007-2013). ProMPT: Funded by CRUK, NIHR, MRC, Cambridge Biomedical Research Centre. ProtecT: Funded by NIHR. ProtecT and ProMPT would like to acknowledge the support of The University of Cambridge, Cancer Research UK. Cancer Research UK grants (C8197/A10123) and (C8197/A10865) supported the genotyping team. We would also like to acknowledge the support of the National Institute for Health Research which funds the Cambridge Bio-medical Research Centre, Cambridge, UK. We would also like to acknowledge the support of the National Cancer Research Prostate Cancer: Mechanisms of Progression and Treatment (PROMPT) collaborative (grant code G0500966/75466) which has funded tissue and urine collections in Cambridge.
We are grateful to staff at the Wellcome Trust Clinical Research Facility, Addenbrooke's Clinical Research Centre, Cambridge, UK for their help in conducting the ProtecT study. We also acknowledge the support of the NIHR Cambridge Biomedical Research Centre, the DOH HTA (ProtecT grant), and the NCRI/MRC (ProMPT grant) for help with the bio-repository. The UK Department of Health funded the ProtecT study through the NIHR Health Technology Assessment Programme (projects 96/20/06, 96/20/99). The ProtecT trial and its linked ProMPT and CAP (Comparison Arm for ProtecT) studies are supported by Department of Health, England; Cancer Research UK grant number C522/A8649, Medical Research Council of England grant number G0500966, ID 75466, and The NCRI, UK. The epidemiological data for ProtecT were generated through funding from the Southwest National Health Service Research and Development. DNA extraction in ProtecT was supported by USA Dept of Defense award W81XWH-04-1-0280, Yorkshire Cancer Research and Cancer Research UK. The authors would like to acknowledge the contribution of all members of the ProtecT study research group. The views and opinions expressed therein are those of the authors and do not necessarily reflect those of the Department of Health of England. The bio-repository from ProtecT is supported by the NCRI (ProMPT) Prostate Cancer Collaborative and the Cambridge BMRC grant from NIHR. We thank the National Institute for Health Research, Hutchison Whampoa Limited, the Human Research Tissue Bank (Addenbrooke's Hospital), and Cancer Research UK.
PROtEuS: PROtEuS was supported financially through grants from the Canadian Cancer Society (13149, 19500, 19864, 19865) and the Cancer Research Society, in partnership with the Ministère de l'enseignement supérieur, de la recherche, de la science et de la technologie du Québec, and the Fonds de la recherche du Québec - Santé. PROtEuS would like to thank its collaborators and research personnel, and the urologists involved in subject recruitment. We also wish to acknowledge the special contribution made by Ann Hsing and Anand Chokkalingam to the conception of the genetic component of PROtEuS. QLD: The QLD research is supported by The National Health and Medical Research Council (NHMRC) Australia Project Grants (390130, 1009458) and NHMRC Career Development Fellowship and Cancer Australia PdCCRS funding to J Batra. The QLD team would like to acknowledge and sincerely thank the urologists, pathologists, data managers and patient participants who have generously and altruistically supported the QLD cohort. RAPPER: RAPPER is funded by Cancer Research UK (C1094/A11728; C1094/A18504) and Experimental Cancer Medicine Centre funding (C1467/A7286). The RAPPER group thank Rebecca Elliott for project management. SABOR: The SABOR research is supported by NIH/NCI Early Detection Research Network, grant U01 CA0866402-12. Also supported by the Cancer Center Support Grant to the Cancer Therapy and Research Center from the National Cancer Institute (US) P30 CA054174. SCCS: SCCS is funded by NIH grant R01 CA092447, and SCCS sample preparation was conducted at the Epidemiology Biospecimen Core Lab that is supported in part by the Vanderbilt-Ingram Cancer Center (P30 CA68485).
Data on SCCS cancer cases used in this publication were provided by the Alabama Statewide Cancer Registry; Kentucky Cancer Registry, Lexington, KY; Tennessee Department of Health, Office of Cancer Surveillance; Florida Cancer Data System; North Carolina Central Cancer Registry, North Carolina Division of Public Health; Georgia Comprehensive Cancer Registry; Louisiana Tumor Registry; Mississippi Cancer Registry; South Carolina Central Cancer Registry; Virginia Department of Health, Virginia Cancer Registry; Arkansas Department of Health, Cancer Registry, 4815 W. Markham, Little Rock, AR 72205. The Arkansas Central Cancer Registry is fully funded by a grant from National Program of Cancer Registries, Centers for Disease Control and Prevention (CDC). Data on SCCS cancer cases from Mississippi were collected by the Mississippi Cancer Registry which participates in the National Program of Cancer Registries (NPCR) of the Centers for Disease Control and Prevention (CDC). The contents of this publication are solely the responsibility of the authors and do not necessarily represent the official views of the CDC or the Mississippi Cancer Registry. SCPCS: SCPCS is funded by CDC grant S1135-19/19, and SCPCS sample preparation was conducted at the Epidemiology Biospecimen Core Lab that is supported in part by the Vanderbilt-Ingram Cancer Center (P30 CA68485). SEARCH: SEARCH is funded by a program grant from Cancer Research UK (C490/A10124) and supported by the UK National Institute for Health Research Biomedical Research Centre at the University of Cambridge. SNP_Prostate_Ghent: The study was supported by the National Cancer Plan, financed by the Federal Office of Health and Social Affairs, Belgium. 
SPAG: Wessex Medical Research, Hope for Guernsey, MUG, HSSD, MSG, Roger Allsopp. STHM2: STHM2 was supported by grants from The Strategic Research Programme on Cancer (StratCan), Karolinska Institutet; the Linné Centre for Breast and Prostate Cancer (CRISP, number 70867901), Karolinska Institutet; The Swedish Research Council (number K2010-70X-20430-04-3) and The Swedish Cancer Society (numbers 11-0287 and 11-0624); Stiftelsen Johanna Hagstrand och Sigfrid Linnérs minne; Swedish Council for Working Life and Social Research (FAS), number 2012-0073. STHM2 acknowledges the Karolinska University Laboratory, Aleris Medilab, Unilabs and the Regional Prostate Cancer Registry for performing analyses and helping to retrieve data. STHM2 also thanks Carin Cavalli-Björkman and Britt-Marie Hune for their enthusiastic work as research nurses and Astrid Björklund for skilful data management. We wish to thank the BBMRI.se biobank facility at Karolinska Institutet for biobank services. PCPT & SELECT are funded by Public Health Service grants U10CA37429 and 5UM1CA182883 from the National Cancer Institute. SWOG and SELECT thank the site investigators and staff and, most importantly, the participants who donated their time to this trial. TAMPERE: The Tampere (Finland) study was supported by the Academy of Finland (251074), The Finnish Cancer Organisations, Sigrid Juselius Foundation, and the Competitive Research Funding of the Tampere University Hospital (X51003). The PSA screening samples were collected by the Finnish part of ERSPC (European Study of Screening for Prostate Cancer). TAMPERE would like to thank Riina Liikanen, Liisa Maeaettaenen and Kirsi Talala for their work on samples and databases.
UGANDA: None reported. UKGPCS: UKGPCS would also like to thank the following for funding support: The Institute of Cancer Research and The Everyman Campaign, The Prostate Cancer Research Foundation, Prostate Research Campaign UK (now Prostate Action), The Orchid Cancer Appeal, The National Cancer Research Network UK, The National Cancer Research Institute (NCRI) UK. We are grateful for support of NIHR funding to the NIHR Biomedical Research Centre at The Institute of Cancer Research and The Royal Marsden NHS Foundation Trust. UKGPCS would also like to acknowledge the NCRN nurses, data managers, and consultants for their work in the UKGPCS study. UKGPCS would like to thank all urologists and other persons involved in the planning, coordination, and data collection of the study. ULM: The Ulm group received funds from the German Cancer Aid (Deutsche Krebshilfe). WUGS/WUPCS: WUGS would like to thank the following for funding support: The Anthony DeNovi Fund, the Donald C. McGraw Foundation, and the St. Louis Men's Group Against Cancer.
Study
phs001391
Bogalusa Heart Study (BHS-BioLINCC)
Objectives: To investigate the early natural history of cardiovascular disease in a cohort of children and young adults in a biracial, semirural community. Background: The Bogalusa Heart Study has been a long-term epidemiologic study. The investigators have identified and followed black and white participants for nearly 40 years, and have described the incidence and prevalence of biologic and behavioral cardiovascular disease risk factors from childhood through adulthood. Their participation has enabled the study to not only document differences between males and females, but also between blacks and whites. The results from the Bogalusa Heart Study have clearly documented that the genesis of atherosclerosis has its basis in childhood, and that prevention can and must begin at the early ages. The Bogalusa Heart Study had been funded over the years by the Specialized Centers of Research (SCOR) Program. The SCOR program was initiated by NHLBI in 1970 to expedite the development and application of new knowledge essential for improved diagnosis, treatment, and prevention of arteriosclerosis, hypertension, pulmonary disease, and thrombosis. In 1984 a Demonstration and Education Component was added to the parent SCOR of the Bogalusa Heart Study in order to translate the experience gained in epidemiological studies into an intervention study designed to retard the development of cardiovascular risk factors in children. Beginning in 1997, the study was supported by the cooperative agreement mechanism. Participants: The Bogalusa dataset includes 11,796 participants that attended at least one of seven cross-sectional pediatric exams and/or the 1995-96 adult examination. Subjects ranged in age from 3 to 20 years at the pediatric exams and 20-37 at the time of the adult exam. Approximately 6,000 have more than one examination constituting a dynamic cohort. Design: The initial survey in 1973-1974 was restricted to children ages 2 1/2 to 14.
A physical examination was conducted and information was collected on anthropometric data, hemoglobin, blood pressure, serum lipids, and health history. Over 3,500 children participated. The second cross-sectional survey of 1976-1977 and subsequent surveys expanded the eligible population to include all children ages 5-17 years. The second survey of over 4,000 children also included information on salt intake, smoking, health beliefs, and attitudes, and for girls ages 8-17, menstrual history and oral contraceptive use. The third survey of over 3,500 participants in 1978-1979 also collected anthropometric measurements on skinfold thickness and two measurements of heart rate. The fourth survey of over 3,300 participants in 1981-1982 added data on alcohol use, Type A behavior, peer networks and dieting habits. The Bogalusa Heart Study continued to use a cross-sectional and longitudinal design with the general cross-sectional survey of approximately 3,700 Bogalusa children ages five to seventeen in 1988-1989 in the sixth screen and additional longitudinal studies to recall children in defined subgroups for more intensive evaluation. Half of the 12,000 participants screened since 1973 had been studied three or more times. The Post High School Study examined young adults ages 21 through 30 who previously were examined as children ages five through fourteen in the first Bogalusa Heart Study screening in 1973-1974. The population included approximately 4,603 young adults originally screened and any other children or adolescents examined for the first time in any subsequent surveys. The cardiovascular phenotypes include obesity, blood pressure, lipids, lipoproteins, apoproteins, homocysteine, glucose-insulin, fibrinogen, plasminogen activator inhibitor-1 and von Willebrand Factor.
Environmental risk factors consist of sociodemographic characteristics, tobacco and alcohol use, oral contraception, physical activity, cognitive and physical function, and quality of sleep and diet. Subclinical morbidity includes echo-Doppler measurements of cardiac-carotid structure and function.
Study
phs004173
Newborn Screening Translational Research Network (NBSTRN): Newborn Sequencing in Genomic Medicine and Public Health (NSIGHT2)
Large-scale studies of whole genome sequencing (WGS) in neonates have largely been neglected. Thus, the clinical and social implications of this absence are largely unknown. The proposed study aims to increase scientific knowledge for acutely ill neonates. This population may stand to benefit largely from WGS given the severity of illness. The proposed clinical study is a prospective, randomized, partially-blinded study that has both quantitative and qualitative assessments of the risks and benefits of the use of STATseq, a 2-day genome test, in acutely ill neonates at Children's Mercy Hospital. Study participants include neonates, their parents, and clinicians caring for them. The overall aims will provide a nuanced understanding of the role of rapid diagnostic WGS for acutely ill neonates, including relative diagnostic sensitivity and change in management both for broad NICU populations and subpopulations, accuracy, reproducibility, and relative value of proband WGS and familial triads. The technological aims include improving diagnostic sensitivity without sacrificing the rapid turnaround time. The clinical aims include looking at increased diagnostic yield, shorter time to diagnosis and clinician perception of management changes that result from implementation of STAT-seq. Finally, the study will investigate the perceived value of STAT-seq in parents and clinicians using structural equation modeling. The studies also are designed for long-term continuation of assessment of the enrollees (beyond the current funding period) and in additional sample types (via a biorepository). Long-term continuation studies will test effects on long-term morbidity, mortality, quality of life and cost of care. We hope that the study results will be sufficient to provide an evidence base for physician adoption and provider reimbursement of WGS in Level 3 and 4 NICUs.
Study
phs002094
Follow-up of Ovarian Cancer Genetic Association and Interaction Studies (FOCI)
(Excerpted/paraphrased from original grant application): FOCI seeks to expand our understanding of epithelial ovarian cancer through a coordinated and comprehensive approach. Project 1 will focus on discovery, expansion, and replication. By pooling GWAS, we expect to identify new associations and achieve independent replication, explore whether there are risk variants specific for histologic subtypes, and evaluate structural polymorphisms - copy number variants - as risk factors. Finally, Project 1 will leverage the GWAS data to correlate DNA variants with a new endpoint - survival. Project 2 will focus on biological studies designed to help inform interpretation of findings from Project 1. This will include efforts to identify the functional consequences of variants and improve understanding of biological mechanisms. Project 3 will include epidemiologic studies of gene by gene interaction, gene by environment interaction, and development of risk prediction models. The collective effort builds upon the strengths and history of collaboration inherent in the Ovarian Cancer Association Consortium (OCAC), a multidisciplinary group comprised of epidemiologists, genetic epidemiologists, statistical geneticists, molecular and cell biologists and clinicians that was formed in 2005. The FOCI Cohort (top-level study phs001133) is utilized in the following dbGaP sub-studies, which hold the genotypes, other molecular data, and derived variables collected: phs001131 Affymetrix Exome Chip; phs001132 GWAS Meta Analysis; phs001142 Mayo Omni Express; phs001150 Mayo 2 5M.
Study
phs001133
Molecular Subtype-specific Biomarkers Improves Colorectal Cancer Prognostication
Colorectal cancer (CRC) is characterized by major inter-tumor diversity that complicates the prediction of disease and treatment outcomes. Recent efforts help resolve this by sub-classifying CRC according to natural molecular subtypes; however, this strategy is not yet able to provide clinicians with improved tools for decision-making. We here present an extended framework for CRC stratification that specifically aims to improve patient prognostication. Using transcriptional profiles from 1,100 CRCs, including a novel set of >300 samples, we identify novel cancer cell and tumor archetypes and suggest the tumor microenvironment as a major prognostic determinant that can be influenced by the microbiome. Notably, our subtyping strategy allowed identification of novel archetype-specific prognostic biomarkers that provided information beyond and independent of UICC-TNM staging, MSI-status and consensus molecular subtyping. The results illustrate that our extended subtyping framework, combining subtyping and subtype-specific biomarkers, could contribute to improved patient prognostication and may form a strong basis for future studies.
Study
EGAS00001002376
Germline Genomic Analyses of Breast Cancer in Latinas
In this case-control study of breast cancer among Hispanic/Latina (H/L) women, the goals were to identify breast cancer susceptibility genes and genetic variants for risk of developing breast cancer in H/L women. There were two phases with whole exome sequencing (WES) done in a discovery phase and targeted sequencing of candidate genes in the replication phase. The candidate genes selected for replication came from the first phase and from genes known to affect breast cancer risk in prior studies. Cases and controls were drawn from several studies. Breast cancer cases for the discovery phase included women from the City of Hope (COH) Clinical Cancer Genomics Community Research Network (CCGCRN), UCSF Cancer Genetics Clinic and USC Cancer Genetics clinic. Controls for the discovery phase included women who did not have breast cancer and were recruited through City of Hope community fairs. A separate group of controls included women from the California Teachers Study. Additional controls for discovery were obtained from WES already generated by the Multi-ethnic cohort (MEC) study. The discovery/WES results of known breast cancer genes were published (PMID 31206626). In the replication phase, we included cases and controls from the Cancer de Mama (CAMA) study which is a case-control study of breast cancer in Mexico, MEC for those not included in the discovery dataset, the San Francisco Bay Area Cancer study (phs000912), the Northern California Breast Cancer Registry, and the California Pacific Medical Center Research Institute Women's Cohort (phs000395). Cases from the PATHWAYS study who were recruited from Northern California Kaiser Permanente were also included. Results from the combined discovery and replication analyses are reported in PMID 36747679.
Study
phs003144
Pediatric Whole Genome Sequencing Diagnostic Utility
The standard of care for first-tier clinical investigation of the etiology of congenital malformations and neurodevelopmental disorders is chromosome microarray analysis (CMA) for copy number variations (CNVs), often followed by gene(s)-specific sequencing searching for smaller insertion-deletions (indels) and single nucleotide variant (SNV) mutations. We compared diagnostic rate of WGS to CMA and targeted gene testing of 100 patients referred to The Hospital for Sick Children Genetics clinic in 2014. WGS identified genetic variants meeting clinical diagnostic criteria in 34% of cases, representing a 4-fold increase in diagnostic rate over CMA (8%) (p-value = 1.42e-05) alone and >2-fold increase in CMA plus targeted gene sequencing (13%) (p-value = 0.0009). WGS identified all rare clinically significant CNVs that were detected by CMA. In an additional 26 patients, WGS revealed indel and missense mutations presenting in a dominant (63%) or a recessive (37%) manner. We found four subjects with mutations in at least two genes associated with distinct genetic disorders, including two cases harboring a pathogenic CNV and SNV. When considering medically actionable secondary findings in addition to primary WGS findings, 38% of patients would benefit from genetic counselling.
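The fold increases quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the reported percentages map directly to patient counts because the cohort is 100 patients, the variable and function names are hypothetical, and it does not attempt to reproduce the reported p-values, since the statistical test used is not stated here.

```python
# Diagnostic yields reported in the description, as counts out of 100 patients.
# Assumption: with a cohort of exactly 100, each percentage equals a patient count.
N_PATIENTS = 100
diagnosed = {
    "WGS": 34,
    "CMA": 8,
    "CMA + targeted sequencing": 13,
}


def fold_increase(new_count: int, baseline_count: int) -> float:
    """Ratio of diagnostic yield between two strategies on the same cohort."""
    return new_count / baseline_count


wgs_vs_cma = fold_increase(diagnosed["WGS"], diagnosed["CMA"])
wgs_vs_cma_targeted = fold_increase(
    diagnosed["WGS"], diagnosed["CMA + targeted sequencing"]
)

# Patients diagnosed by WGS but not by CMA. This is valid as a simple difference
# because the text states WGS detected every clinically significant CNV found by CMA.
additional_by_wgs = diagnosed["WGS"] - diagnosed["CMA"]

print(f"WGS vs CMA: {wgs_vs_cma:.2f}-fold")
print(f"WGS vs CMA + targeted sequencing: {wgs_vs_cma_targeted:.2f}-fold")
print(f"Additional patients diagnosed by WGS: {additional_by_wgs}")
```

Under these assumptions the ratios agree with the "4-fold" and ">2-fold" figures, and the difference matches the "additional 26 patients" mentioned in the description.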
Study
EGAS00001001623
Exploiting evolutionary steering in cancer therapy
Drug resistance, mediated by intra-tumour heterogeneity and clonal evolution, is arguably the biggest problem in cancer therapy today. However, evolving resistance to one drug may come at the cost of increased sensitivity to another due to so-called evolutionary trade-offs. This weakness can be exploited in the clinic using an approach called ‘evolutionary herding’ that aims at controlling the tumour cell population to delay or prevent resistance. However, model systems able to recapitulate cancer evolution experimentally are lacking and current in vitro techniques based on small populations, re-plating and escalating dose, are unsuitable to develop evolutionary herding strategies. We present a novel methodology for evolutionary herding in vitro and ex vivo using patient-derived organoids. Our approach is based on a combination of single-cell barcoding, very large populations of 10^8 to 10^9 cells grown without re-plating, realistic high drug doses, time-course monitoring of cancer clones, and mathematical modelling of tumour evolutionary dynamics. We demonstrate evolutionary herding in non-small cell lung cancer in vitro and in patient-derived colorectal cancer organoids (PDO). We show that herding causes controlled evolutionary bottlenecks that lead to collateral sensitivity. Through genomic analysis, we were also able to determine the mechanisms that drive such sensitivity. Our approach allows modelling evolutionary trade-offs experimentally to test patient-specific evolutionary herding strategies that can be translated into the clinic to control treatment resistance.
Study
EGAS00001003200
RNA-Seq of Whole Blood from Patients with Intracranial Aneurysms
Rupture of intracranial aneurysms (IAs) causes intracranial hemorrhaging that leads to high rates of neurological deficits and death. Although rupture rates are low, clinicians must decide whether to treat or monitor these potentially dangerous lesions. In the current clinical practice, the most common metric to measure risk of rupture is aneurysm size. However, clinical data show that small aneurysms can also rupture. As a result, alternative clinical stratification scores have been proposed, including the PHASES (Population, Hypertension, Age, Size, Earlier subarachnoid hemorrhage, and Site) score based on patient demographics and IA characteristics to stratify ruptured and unruptured IAs, and the Rupture Resemblance Score (RRS) that stratifies ruptured and unruptured IAs based on hemodynamic and morphological properties. However, all metrics require imaging on digital subtraction angiography (DSA), which is invasive, expensive, requires the use of X-rays, and is associated with transient or permanent neurological and non-neurological complications. We hypothesize that individuals with dangerous IAs have detectable gene expression differences in their blood that could be used as biomarkers to determine rupture risk. We propose to use whole blood transcriptomes to develop a "one-stop" diagnostic test that can detect the presence of IAs and determine the risk of rupture based on circulating RNA expression biomarkers using our prototype AneuScreen™ platform. We aim to develop and validate biomarkers to predict risk, as calculated by the currently-used metric of aneurysm size, the clinical PHASES score, and the RRS. Here, we collected blood samples from consented individuals who were receiving cerebral imaging at the Gates Vascular Institute (Buffalo, NY) for intracranial aneurysm. RNA extracted from blood samples (n=43) was subjected to RNA-sequencing and added to our existing database (n=44).
When combined with our previous data, we had transcriptome data from 68 aneurysms (after additional sample filtering). This dataset was stratified into low- and high-risk according to IA size, PHASES score, and RRS score. Differentially expressed genes were identified and used to construct predictive models. Fastq files and basic demographic information for the 43 newly-sequenced intracranial aneurysm samples on this project will be available through dbGaP.
Study
phs003072
Benchmark Dataset for Somatic Mutation Calling in Cell-Free DNA
This study provides a comprehensive benchmarking resource for somatic variant detection in cell-free DNA (cfDNA) from cancer patients. Longitudinal plasma samples from colorectal and breast cancer cohorts were selected to create patient-matched dilution series spanning ultra-low to high circulating-tumour-DNA (ctDNA) fractions, while preserving each individual’s germline and clonal haematopoiesis background. Deep whole-genome sequencing (150×) and ultra-deep whole-exome sequencing (2,000×) generated a reference call set of ~37,000 single-nucleotide variants and ~58,000 insertions/deletions. These data enabled systematic evaluation of nine somatic variant callers across variable ctDNA levels and sequencing depths, and were further used to explore machine-learning–guided parameter tuning. The resulting dataset offers an openly accessible framework for developers and clinicians to assess and optimize somatic variant calling in liquid biopsy applications.
Dataset
EGAD50000001870
Sleep Heart Health Study (SHHS-BioLINCC)
Data Access NOTE: Please refer to the “Authorized Access” section below for information about how access to the data from this accession differs from many other dbGaP accessions. Related Studies Parent cohort phenotype data can be accessed through ARIC-BioLINCC, Framingham-BioLINCC, and CHS-BioLINCC. Objectives To determine the cardiovascular and other consequences of sleep-disordered breathing and to test whether sleep-disordered breathing is associated with an increased risk of coronary heart disease, stroke, all-cause mortality and hypertension by examining subjects from well-characterized and established epidemiologic cohorts. Background Obstructive sleep apnea syndrome (OSA) is a potentially debilitating condition characterized by repetitive episodes of apnea while asleep, nocturnal oxygen desaturation, excessive daytime sleepiness, and loud disruptive snoring. Epidemiologic data from middle-aged adults indicate that OSA is common, with prevalence rates of 4% in men and 2% in women. Prior studies implicated OSA as a risk factor for the development of hypertension, ischemic heart disease, congestive heart failure, stroke and consequently premature death. Questions arose as to whether an increased propensity for cardiovascular and cerebrovascular diseases was limited to only those with frank OSA or whether more subtle forms of sleep-disordered breathing (SDB) would also confer elevated risk. Further evidence was also needed to clarify whether SDB, including OSA, is an independent risk factor for the development of cardiovascular or cerebrovascular disease. Known cardiovascular and cerebrovascular disease risk factors such as obesity and smoking are commonly present in those with SDB; therefore, apparent associations between SDB and cardiovascular and cerebrovascular diseases may have resulted from the effects of these concomitant risk factors.
Moreover, there was no understanding as to whether such factors as race, age, gender, and prevalent cardiovascular or cerebrovascular disease might interact with SDB to alter future cardiovascular and cerebrovascular disease risk. Mechanisms underlying any propensity to develop cardiovascular or cerebrovascular disease with SDB had not been firmly established (Quan, et al., 1997, PMID: 9493915). Participants Participants in SHHS were recruited from nine existing NHLBI epidemiological studies in which data on cardiovascular risk factors had been collected previously. The “parent” cohorts included: two sites of the Atherosclerosis Risk in Communities Study (ARIC); three sites of the Cardiovascular Health Study (CHS); the Framingham Offspring Cohort; the Strong Heart Study (SHS) sites in South Dakota, Oklahoma, and Arizona; the New York Hypertension Cohorts; and the Tucson Epidemiologic Study of Airways Obstructive Diseases and the Health and Environment Study. From these parent cohorts, a sample of participants who met the inclusion criteria (age 40 years or older; no history of treatment of sleep apnea; no tracheostomy; no current home oxygen therapy) was invited to participate in the baseline examination of the SHHS, which included an initial polysomnogram (SHHS-1). Several cohorts over-sampled snorers in order to increase the study-wide prevalence of sleep-disordered breathing. In all, 6441 individuals were enrolled between November 1, 1995 and January 31, 1998. During exam cycle 3 (January 2001-June 2003), a second polysomnogram (SHHS-2) was obtained in 3295 of the participants. Due to sovereignty issues, Strong Heart Study participants are not included in the shared SHHS data. Data from a total of 5839 participants (1920 ARIC, 1249 CHS, 997 Framingham Offspring and OMNI 1, and 1673 from other studies) consenting to share data are available.
Design The Sleep Heart Health Study added in-home polysomnography to the data collected in each of the parent studies at a baseline SHHS exam and a follow-up approximately 4 years later. Using the Compumedics PS polysomnograph, sleep studies were obtained in an unattended setting, usually in the homes of the participants, by trained and certified technicians. The recording montage consisted of: C3/A2 and C4/A1 EEGs, sampled at 125 Hz; right and left electrooculograms (EOGs), sampled at 50 Hz; a bipolar submental electromyogram (EMG), sampled at 125 Hz; thoracic and abdominal excursions (THOR and ABDO), recorded by inductive plethysmography bands and sampled at 10 Hz; "airflow" detected by a nasal-oral thermocouple (Protec, Woodinville, WA), sampled at 10 Hz; finger-tip pulse oximetry (Nonin, Minneapolis, MN), sampled at 1 Hz; ECG from a bipolar lead, sampled at 125 Hz for most SHHS-1 studies and 250 Hz for SHHS-2 studies; heart rate (PR) derived from the ECG and sampled at 1 Hz; body position (using a mercury gauge sensor); and ambient light (on/off, by a light sensor secured to the recording garment). This montage provides data on the occurrence of sleep-disordered breathing, sleep stages, heart rate, oximetry and on arousals. Each participant in the parent studies was also asked to complete the Sleep Habits Questionnaire which covers usual sleep pattern, snoring, and sleepiness.
Study
phs003637
eMERGE Network's Multi-Center Pilot of Pharmacogenetic Sequencing in Clinical Practice
eMERGE-PGx is a multi-site test of the concept that sequence information can be coupled to electronic medical records (EMRs) for use in healthcare. The promise of personalized medicine - health care guided by each individual's biological characteristics - is being fostered by increasingly powerful and economical methods to acquire clinically relevant biomarkers from large numbers of people. One therapeutic area that seems especially ripe for an early test of the personalized medicine concept is pharmacogenomics (PGx) - the idea that individual variation in drug response includes a genomic component. Drug response variation is an accepted feature of virtually all drug treatments, and contemporary molecular biologic tools continue to identify key genes mediating drug metabolism, transport, and targets. Importantly, common variation in these genes is an increasingly well-recognized contributor, sometimes with large effects, to variation in drug responses. As a result, recommendations for genotype-guided therapy are increasing. These evidence-based recommendations, if implemented in health care practice, could reduce adverse drug events and improve time to therapeutic response. Through eMERGE-PGx, we are developing strategies for the optimal implementation of genetic sequence data into the clinical environment with the ultimate goal of improving patient care. Sites and participants include: Children's Hospital of Philadelphia (CHOP): The Center for Applied Genomics (CAG) at the Children's Hospital of Philadelphia (CHOP) is a high-throughput, highly automated genotyping and sequencing facility equipped with state-of-the-art genotyping and sequencing platforms. Children who are treated at the Children's Hospital Healthcare Network and their parents may be eligible to take part in a major initiative to collect more than 100,000 blood samples, covering a wide range of pediatric diseases.
The PGx population selected for sequencing with the PGRNseq panel at CHOP is 1,650 children from CAG's biorepository with well-documented drug-related severe adverse events (SAEs) or EHR-based drug response profiles. SAEs were extracted from EPIC records and from CHOP's Adverse Event (AE) database, which documents every AE at CHOP. These AEs are classified by a medical review panel according to the causal relationship with the suspected drug into 'doubtful', 'possible', and 'probable'. Individuals with events classified as probable, severe and objective, were selected for sequencing. The drugs more frequently associated with adverse events are antibiotics, antineoplastics, immunosuppressants and psychotropic drugs. This cohort constitutes 50% of the target population. The remaining subjects were selected using EHR-based algorithms that we have developed and validated at CAG for identifying patients not responding to ADHD medication (primarily atomoxetine) and patients refractory to antiepileptic treatment from responders. Cincinnati Children's Hospital Medical Center/Boston's Children's Hospital (CCHMC/BCH): 811 CCHMC samples were obtained from children, adolescents or young adults exposed to medication or at risk for needing medication of study interest. 55% of participants were exposed to one or more opioids and their DNA source was a CCHMC study-specific biobank; while 27% of participants were at risk for needing an opioid for surgical pain management and were newly recruited. The remainder of the cohort was exposed to methylphenidate and their DNA samples were obtained from a CCHMC study-specific biobank. The focus of Boston Children's Hospital eMERGE PGx project is on individuals with epilepsy. Samples were taken from a current pharmacogenomics study already in place through which DMET analysis was run and used as confirmation for PGRN-Seq results. A total of 109 samples were sent for PGRN-Seq analysis at University of Washington. 
The remaining 141 epilepsy samples were from Children's Hospital of Philadelphia and underwent testing with PGRN-Seq at CHOP. Geisinger Health System: A research cohort of adult Geisinger Clinic patients was enrolled from community-based primary care clinics of the Geisinger Health System. Patients were eligible for enrollment if they were a primary care patient of a Geisinger Clinic physician and were scheduled for a non-emergent clinic visit. All data are from Geisinger patients who consent to participate in the MyCode project. MyCode participants agree to provide biological samples for broad research use, including genomic analysis, and for linking of sample data to information in the participant's Geisinger health record. The consent also permits sharing of de-identified data for research purposes. Group Health (GH)/University of Washington (UW): Potential GH participants for the PGx project were enrolled in the eMERGE Network through the Northwest Institute of Genetic Medicine (NWIGM) biorepository, and provided the appropriate consent to receive clinically relevant genetic results (N~6300). Participants were eligible if aged 50 - 65 years old at the time of their enrollment into the NWIGM repository, living, enrolled in GH's integrated group practice, and had completed an online Health Risk Appraisal. The selection algorithm was based on several data sources from the EHR at Group Health: 1. Demographics - participants with self-reported race as Asian or African ancestry were prioritized and selected to enrich for non-European ancestry; 2. Diagnosis and procedure codes - participants were selected if found to have a history of hypertension, atrial fibrillation (AF), or congestive heart failure (CHF). Participants with a history of arrhythmia were added if the entire selection algorithm did not generate 900 individuals. We also enriched for participants with EHR evidence of actionable indications related to PGRNSeq genes.
Participants were selected if found to have an ICD9 code for malignant hyperthermia, hypertension, atrial fibrillation, congestive heart failure or long QT syndrome (LQTS); 3. Laboratory values - if a participant had any laboratory event of creatine kinase (CK) > 1000, and were dispensed statins within 6 months of the event, then they were selected; and 4. Medications - participants were excluded if ever on carbamazepine or had a current regimen of warfarin. Essentia Institute of Rural Health, Marshfield Clinic, Pennsylvania State University (Marshfield): For this study, 750 subjects were selected and enrolled into PGx based on Vanderbilt's algorithm designed to enrich for patients who are most likely to receive one of three common drugs (Clopidogrel, Warfarin or Simvastatin) in the next 2-3 years. These patients were sent a letter of invitation and description of the PGx project. Follow-up phone calls were made, and interested subjects came in for a one time meeting to discuss the project and go through the informed consent with the research coordinator. If they were interested they signed the consent and HIPAA forms and gave blood. Subjects were chosen and enrolled into PGx independently of previous biobank participation. Mayo Clinic: The Right Drug, Right Dose, Right Time - Using Genomic Data to Individualize Treatment (The RIGHT Protocol) enrolled 1013 patients to test the hypothesis that prescribers could deliver genome-guided drug therapy at the point-of-care by using pharmacogenomic data preemptively integrated in the electronic medical record. Complete details regarding the study population have been previously described (Bielinski et al., 2014). Icahn School of Medicine at Mount Sinai School (Mt Sinai): Our study site is the Primary Care Associates (PCA) practice group of the Mount Sinai Faculty Practice Associates (FPA) of the Mount Sinai Medical Center in New York City. This practice has 12 physician providers. 
All patient encounters are documented and managed with EpicCare ambulatory electronic medical record. Active PCA Patients eligible for enrollment fulfilled the following criteria: a) age 50 or older receiving clinical care at Mount Sinai FPA PCA practice with at least one practice encounter within 18 months prior to commencement of enrollment; b) no history or current use of clopidogrel, warfarin, or simvastatin. Eligible patients were invited to participate through de novo recruitment by letter sent by their provider. Interested patients were screened for eligibility and enrolled to participate in the eMERGE PGX study on site by a dedicated research coordinator. In addition to de novo enrollment from clinical practice, patients of FPA PCA who had previously enrolled in Mount Sinai's BioMe Biobank program AND fulfilled eligibility criteria as stated under a) and b) were identified by chart review and samples sequenced at CIDR using PGRNseq platform (N=300). PGRNseq data from 291 samples passed stringent quality control and are included in the current data set. Furthermore, 56 of these patients carrying known and validated 'actionable' variants affecting prescribing of clopidogrel, warfarin, and/or simvastatin were enrolled in the eMERGE PGX study following invitation through recontacting by the Principal Investigator of the BioMe Program. Northwestern University: Participants for this study were recruited from the General Internal Medicine (GIM) clinic at Northwestern Medical Group (NMG). Patients were selected for invitation to participate if they had been seen a minimum of two times over the last four years, having a high likelihood to receive a prescription for warfarin, Plavix, or a statin, and are seeing a physician who has agreed to allow their patients to be contacted for the study. 
We utilized an algorithm developed at Vanderbilt and tailored to our population which uses our EHR to estimate the probability that individuals will receive a prescription for warfarin, Plavix, or a statin in the next three years. Participants were sent a letter explaining the study prior to their GIM appointment and offered participation at the time of their visit. Participants were consented on-site and blood drawn after consent was obtained. The GIM clinic consists of 39 primary care physicians who provide approximately 80,000 patient encounters per year. As with any large primary care clinic, a significant proportion of patients in GIM clinic suffer from a variety of chronic health conditions, such as diabetes, hypertension, and coronary artery disease. Over 50,000 individuals have been seen by GIM doctors in the past 5 years; 11,562 of these patients have evidence of a statin prescription in the EHR, 3,436 have evidence of a warfarin prescription, and 1,872 have evidence of a Plavix prescription. Vanderbilt University: The more than 1000 participants enrolled into Vanderbilt's eMERGE PGx study were newly recruited from the Cardiology and Internal Medicine Clinics and the Hillsboro Medical Group within Vanderbilt University Medical Center (VUMC). Patients were selected based on a predictive algorithm estimating the patient's likelihood of receiving Clopidogrel, Warfarin, and/or Simvastatin. The algorithm identifies primarily older middle-aged patients, and the mean age of the study group is 74. The cohort is approximately 45% female with 75% of subjects self-identified as EA and 24% as AA. Subjects were consented in person by study personnel following a routine clinic visit and an introduction to the study staff by their doctor. VUMC is a comprehensive health care facility dedicated to patient care, research, and the education of health care professionals. 
Translational research into the causes and treatment of disease as well as studying fundamental biological properties is the primary focus of discovery at Vanderbilt. Clinical research is conducted in Vanderbilt University Hospital, the Nashville Veterans Administration Hospital, Meharry General Hospital and in their associated outpatient clinics. These hospitals and clinics, all associated with the Vanderbilt system, each have full time Vanderbilt faculty and medical housestaff and provide clinical care and participate in research programs. The Vanderbilt Clinic is comprised of more than 95 adult outpatient specialty practices and received over 1.5 million ambulatory visits in 2012-13. The Vanderbilt Heart and Vascular Institute offers a comprehensive heart program offering diagnosis, medical treatment, minimally invasive therapies, surgical intervention and disease management, tailored to each individual's unique needs. All programs within the Vanderbilt Clinic have survival figures that surpass the national average.
Study
phs000906
University of Utah Pelvic Organ Prolapse Disorder Study
The overall purpose of this University of Utah Pelvic Organ Prolapse Disorder Study was to identify and localize predisposition genes contributing to pelvic organ prolapse (POP). POP cases recruited for this study were identified by one of three methods: high-risk POP pedigree cases, POP sister pairs, and surgically-treated POP cases reporting a family history of POP. High-risk POP pedigree cases were identified using the Utah Population Database (UPDB), a genealogy database of residents in Utah that has been linked to diagnostic ICD9 and CPT codes in medical records at the University of Utah and Intermountain Healthcare. We identified families with a significant excess number of POP cases compared to matched population rates and targeted these individuals for recruitment as well as any other POP cases in the family. POP sister pair cases were identified at the University of Utah Urogynecology clinic for women who had undergone POP surgery and also self-reported one or more sisters who were also surgically treated for POP. POP affection status of all sisters was confirmed either by physical examination or by chart review. Surgically treated-POP cases reporting a family history of POP were identified at the University of Utah Urogynecology clinic by self-report of a family history of POP. Efforts were made to recruit other affected family members and confirm affection status. To obtain DNA, subjects provided either a blood specimen or saliva. Medical records were reviewed by a urogynecologist and diagnostic information for pelvic organ prolapse and stress and overactive bladder were obtained. Collected DNA was genotyped and analyzed. To maintain confidentiality of the familial data, genetic data from only one subject per family has been submitted to dbGaP. Use of the University of Utah Pelvic Organ Prolapse Disorder Study data is limited to investigators studying pelvic floor disorders.
These pelvic floor disorders include pelvic organ prolapse, urinary and anal incontinence, and other conditions related to weakening or injury to the muscles and connective tissue in the pelvis as a result of pelvic surgery, pregnancy, or vaginal delivery of a child. These data will be used only for research purposes related to pelvic floor disorders. They will not be used to determine the individual identity of any person or their relationship to another person or for research on non-disease traits.
Study
phs001439
Transcriptomic Analysis of HIV-Infected Cells
Defining the attributes of rare CD4 T cells that harbor latent HIV under therapy with anti-retroviral agents has been challenging due to difficulties in identifying and isolating these cells. These cells represent an important barrier to cure, and characterizing them would aid the design and development of novel HIV cure strategies. In this study, we addressed this challenge using a custom microfluidic technology named Focused Interrogation of cells by Nucleic acid Detection and Sequencing (FIND-seq) that isolates HIV-infected cell transcriptomes, based on HIV DNA detection. We applied this method to HIV DNA+ memory CD4 T cells isolated from blood from five persons with HIV who were receiving anti-retroviral agents. After curating samples based on data quality, we proceeded with bioinformatics analysis from three persons. This analysis identified six transcriptomic pathways that were inhibited in HIV DNA+ cells compared to HIV DNA- cells. These included death receptor signaling, necroptosis signaling, and anti-proliferative G-alpha12/13 signaling. Furthermore, we identified two gene clusters, including 60 and 85 genes, that were significantly associated with HIV DNA+ cells identified by co-expression network analysis. These genes included negative regulators of HIV transcription that were higher in HIV DNA+ cells, positive regulators of HIV transcription that were lower in HIV DNA+ cells, and other genes involved in the negative regulation of mRNA translation, additional RNA processing functions, and the regulation of cellular state and fate. We further examined these signatures in transcriptomic data from CD4 T cell subsets sorted by flow cytometry from nine persons with HIV. We found a partial similarity between the smaller cluster of genes and the signature of CCR6- peripheral TFH cells. Raw sequencing data from all eleven human participant samples included in the study are available. 
These include samples from two participants who were studied only by FIND-seq, three participants who were studied by both FIND-seq and flow cytometry and six participants who were studied only by flow cytometry.
Study
phs003095
Women's Health Initiative Clinical Trial and Observational Study - Imaging
Objectives: The clinical trial assessed the safety and efficacy of three interventions. Specifically, it evaluated (1) the major health benefits and risks of estrogen plus progestin and estrogen alone, (2) the effects of a low-fat eating pattern on risk of colorectal cancer, and (3) the efficacy of calcium with vitamin D supplementation for preventing hip and other fractures. The objective of the memory study was to determine whether estrogen plus progestin therapy protects global cognitive function, and evaluate the therapy's effect on the incidence of dementia and mild cognitive impairment. The observational study is examining the relationship between lifestyle, socioeconomic, health, and other risk factors with cardiovascular, breast cancer, colorectal cancer and osteoporotic fracture outcomes. Secondary objectives include providing more reliable estimates of the extent to which known risk factors predict disease, more precise estimates of new occurrences of disease, and to provide a future resource for the identification of new or novel risk factors especially factors found in blood. Background: The Women's Health Initiative (WHI) is a long-term national health study that has focused on strategies for preventing the major causes of death, disability, and frailty in postmenopausal women, specifically heart disease, cancer, and osteoporotic fractures. The WHI is primarily composed of an observational study (OS), as well as a clinical trial (CT) with three components: Hormone Replacement Therapy (HT), Dietary Modification (DM), and Calcium/Vitamin D supplementation (CaD). Prior to the WHI, observational studies suggested that postmenopausal hormone therapy was associated with a decreased risk of coronary heart disease (CHD).
Potential cardioprotection was based on generally supportive data on lipid levels in intermediate outcome clinical trials, trials in nonhuman primates, and a large body of observational studies suggesting a 40% to 50% reduction in risk among users of either estrogen alone or, less frequently, combined estrogen and progestin. Observational studies primarily examining unopposed estrogen preparations have suggested a 30% to 50% reduction in coronary events, and an 8% to 30% increase in breast cancer with extended use. Other research findings indicated that hormone therapy was also associated with a decreased risk of osteoporosis and increased bone density. The WHI HT trials were designed to test the effects of postmenopausal hormone therapy on risk for coronary heart disease and assess overall risks and benefits in predominantly healthy women. The Women's Health Initiative Memory Program (WHIMS) consists of a suite of studies which include cohorts of women who participated in the WHI HT trials. Postmenopausal women have a greater risk than men of developing Alzheimer's disease, but studies of the effects of estrogen therapy on Alzheimer's disease have been inconsistent. Additionally, observational studies have suggested that postmenopausal hormone treatment may improve cognitive function, but data from randomized clinical trials have been sparse and inconclusive. International comparisons and migration studies have suggested that countries with 50% lower fat intake than the US population had approximately one third the risk of colorectal cancer. Additionally, fairly consistent evidence existed for an effect of dietary fat, vegetables and fruits, and grains on colorectal cancer risk from within-country observational studies, although the protective effect of lower fat intake was no longer clear after adjusting for energy intake. 
The WHI DM trial was the first randomized trial to directly address the health effects of a low-fat eating pattern in predominantly healthy postmenopausal women from diverse racial/ethnic, geographic, and socioeconomic backgrounds. Osteoporosis is a major cause of injury, loss of independence, and death, and contributes to hip fractures. Observational evidence and data from previous randomized clinical trials suggest that calcium and/or vitamin D supplements may slow bone loss and reduce the risk of falls in postmenopausal and elderly women. However, evidence from trials, observational studies, and meta-analyses of calcium and vitamin D supplementation with respect to hip and other fractures was limited at the time the WHI was initiated. In two prior randomized trials, calcium plus vitamin D supplements did not reduce the risk of nonvertebral fractures among older women. When the WHI CaD trial was designed, guidelines recommended daily intakes of 800 to 1200 mg of calcium with 400 IU of vitamin D for the prevention of osteoporosis, which was not met by many American women. Therefore, the WHI CaD trial was designed to test the primary hypothesis that postmenopausal women randomly assigned to calcium plus vitamin D supplementation would have a lower risk of hip fracture and, secondarily, of all fractures than women assigned to placebo. Subjects: Postmenopausal women ages 50 to 79 were eligible to participate. A woman was considered postmenopausal if she had experienced no vaginal bleeding for 6 months (12 months for women under 55 years of age), had had a hysterectomy, or had ever used postmenopausal hormones. Recruitment was carried out in 40 US clinical centers in 1993-1998. The clinical trial components had additional specific inclusion or exclusion criteria. A total of 68,132 women were randomized into at least one component of the clinical trial.
27,347 women were enrolled in the hormone therapy component with 16,608 in the estrogen plus progestin trial and 10,739 in the unopposed estrogen trial, 48,835 women were enrolled in the diet modification component, and 36,282 women were enrolled in the calcium/vitamin D component. 7,479 women 65 years of age and older at baseline and that participated in the HT trial component were enrolled in the ancillary memory study. Women who were either ineligible or unwilling to participate in the clinical trial component were enrolled in the observational study. For example, many potential participants to the clinical trial component of the study were already undertaking a low fat diet or were using hormone replacement therapy. The effect of the selection process was that women enrolled in the observational study tended to have healthier lifestyles compared to those enrolled in the clinical trial. In total, 93,676 subjects were enrolled in WHI OS, with over 16% being members of a racial/ethnic minority group. The first WHI Extension Study enrolled 115,407 consenting participants from all components of the original WHI study for an additional five years of follow-up, from 2005 to 2010. In 2010, 93,567 women consented to continued follow-up. Design: The clinical trial component of the WHI included three randomized comparisons: hormone therapy, dietary modification, and calcium/vitamin D supplementation. Women could have been randomized into one, two or all three trials.The hormone therapy trial enrolled women to one of two double-blinded trials: estrogen (0.625 mg of conjugated equine estrogens daily) plus progestin (2.5 mg of medroxyprogesterone acetate daily) or estrogen alone. Women with a prior hysterectomy were eligible for the trial of unopposed estrogen. Women with an intact uterus at screening were initially also eligible for unopposed estrogen, but were reassigned to the trial of combined postmenopausal hormones beginning in 1995. 
Both trials randomized participants 1:1 to either hormone therapy or placebo. A 3-month washout period was required before baseline evaluation of women using postmenopausal hormones at initial screening. Study participants were contacted by telephone 6 weeks after randomization to assess symptoms and reinforce adherence. Follow-up contacts by telephone or clinic visit occurred every 6 months, with clinic visits required annually. The estrogen plus progestin trial was halted in July 2002 after a mean 5.2 years of follow-up because health risks, including increased risk of breast cancer and cardiovascular disease, exceeded benefits. The estrogen alone trial was stopped early in March 2004, because an increased risk of stroke was found with no benefit for coronary heart disease. The primary outcome was coronary heart disease (CHD) (nonfatal myocardial infarction and CHD death), with invasive breast cancer as the primary adverse outcome. The dietary modification trial evaluated the effect of a low-fat, high fruit, vegetable, and grain diet on preventing cardiovascular disease and cancer. Participants were randomly assigned to an intervention or a comparison group in the ratio of 2:3 for cost-efficiency. The intervention was an intensive behavioral modification program, using 18 group sessions in the first year and quarterly sessions thereafter, led by specially trained and certified nutritionists. The program was designed to promote dietary change with the goals of reducing total fat to 20% of energy intake, increasing vegetables and fruits to at least 5 servings daily and grains to at least 6 servings daily. The intervention did not include total energy reduction or weight loss goals. Comparison group participants received a copy of the US Department of Health and Human Services' Dietary Guidelines for Americans and other health-related materials but were not asked to make dietary changes. 
Dietary intake was monitored using the WHI food frequency questionnaire at 1 year and in a rotating one-third subsample every year thereafter. Women completed a medical update questionnaire every 6 months, and medical records were sought for all women reporting colorectal cancer. The primary outcome was invasive colorectal cancer incidence. Participants in the calcium/vitamin D trial were randomized 1:1 to either supplements or placebo. Active tablets contained 500 mg of elemental calcium (as calcium carbonate) and 200 IU of vitamin D3, to be taken twice daily with meals. The presence and severity of symptoms, safety concerns, and outcomes were ascertained at annual clinic visits and telephone or clinic visits at intervening six-month intervals. Risk factors for fracture were assessed by questionnaire, interview, and clinical examination. The primary outcome was incidence of hip fracture. Participants in the observational study attended a baseline examination and were re-examined three years later. Participants completed annual updates of exposures and clinical outcomes by mail. Final data were collected by mail during the close-out period in April 2004 to March 2005. The major clinical outcomes of interest were coronary heart disease, stroke, breast cancer, colorectal cancer, endometrial cancer, ovarian cancer, osteoporotic fractures, diabetes, and total mortality. Most outcomes were initially ascertained by self-report on an annual questionnaire and documented by hospital and related records. Charts with potential cardiovascular, cancer, and fracture outcomes were sent to the local physician adjudicator for evaluation and classification. Staff at the Clinical Coordinating Center coded and adjudicated all cancers of major interest in the study using standardized SEER guidelines. In 2005, WHI participants were invited to join the Extension Study for an additional five years of follow-up in order to collect long-term outcomes. 
Participants completed annual data collection forms primarily by mail, similar to the OS follow-up. Women reporting study outcomes were contacted by WHI field center staff to obtain additional details and medical records, which were evaluated by physician adjudicators. In 2010, the women remaining were invited to join the next Extension Study. In the second extension, women were divided into two groups, one of which would have outcomes documented with medical records (the Medical Records Cohort, MRC), and the other would just be followed by self-report (the Self-Report Cohort, SRC). The MRC consists of women who were in the hormone therapy trials, and all African-American and Hispanic women. In 2012-2013, a subset of the MRC was identified for a potential in-home visit to collect blood and several objective measures of physical functioning. Conclusions: Overall health risks exceeded benefits from use of combined estrogen plus progestin after an average 5.2 year follow-up among healthy postmenopausal US women (Rossouw et al., 2002, PMID: 12117397). Among postmenopausal women aged 65 years or older, estrogen plus progestin did not improve cognitive function when compared with placebo (Rapp et al., 2003, PMID: 12771113), increased the risk for probable dementia, and did not prevent mild cognitive impairment (Shumaker, et al., 2003, PMID: 12771112). The use of conjugated equine estrogen increased the risk of stroke, decreased the risk of hip fracture, and did not affect CHD incidence in postmenopausal women with prior hysterectomy after an average of 6.8 years of follow-up (Anderson et al., 2004, PMID: 15082697). Over approximately 8 years of follow-up, a low-fat dietary pattern did not reduce the risk of colorectal cancer (Beresford, et al., PMID: 16467233). Calcium with vitamin D supplementation resulted in a small but significant improvement in hip bone density; however, no significant difference was observed in hip fractures (Jackson, et al., 2006, PMID: 16481635).
A recent review summarizes the conclusions from the WHI clinical trials with a focus on clinical practice (Manson, et al., 2024, PMID: 38691368). Description of ECG Imaging Data: Electrocardiograms (ECGs) were recorded for all clinical trial participants at baseline and in years 3, 6, and 9 of the original WHI study. The ECG data consist of 12-lead, 10-second ECGs sampled at 500 Hz on GE ECG machines and processed with the GE MUSE system. The ECG waveforms were exported directly from GE MUSE in XML format using the MUSE export function; the files include the ECG waveform data as well as other ECG characteristics. Waveform data are stored as base64-encoded strings; decoding a string yields binary data that can be used to draw the waveform graph. Many programming languages and data tools have built-in functions to decode base64 strings. All the other necessary information, such as total byte size and total sample size, is included in the LeadData section (usually 1 sample is 2 bytes). See example below: encoded-data (base64 encoded string) JwAoAC0AKAAiACIAJAAkACQAIwAiACIAHgAcABwAGwAZABgAGAAYABcAEwAQABAAEAALAAsADAAM... decoded-binary-data (shown in hexadecimal; 1 sample is 2 bytes) 270028002D002800220022002400240024002300220022001E001C001C001B00 1900180018001800170013001000100010000B000B000C000C000D000D000D00 0A000A000A0009000600040004000700070005000500020... These binary values are integers (Y axis data of the graph), hence it is a straightforward process to draw the waveform graph. Acquisition dates have been redacted from this ECG data to comply with WHI policy. All acquisition dates within files and in file names have been set to January 1, 1900 (19000101) to comply with this policy.
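As an illustration of the decoding step described above, the following minimal Python sketch decodes the leading part of the encoded example into integer sample values. It rests on assumptions that the description does not state: that each 2-byte sample is a signed 16-bit integer in little-endian byte order, and that the string is cut at a 4-character base64 boundary. The function name decode_lead is hypothetical, and the sketch does not parse the XML or the LeadData fields.

```python
import base64
import struct
from typing import List

# Leading portion of the encoded example, cut at a 4-character base64
# boundary so that it decodes to a whole number of 2-byte samples.
ENCODED_PREFIX = (
    "JwAoAC0AKAAiACIAJAAkACQAIwAiACIAHgAcABwAGwAZABgAGAAYABcAEwAQABAA"
)


def decode_lead(encoded: str) -> List[int]:
    """Decode a base64 waveform string into integer sample values.

    Assumption (not stated in the study description): each sample is a
    signed 16-bit integer stored in little-endian byte order.
    """
    raw = base64.b64decode(encoded)
    if len(raw) % 2:
        raise ValueError("expected an even number of bytes (2 bytes per sample)")
    # "<" selects little-endian, "h" is a signed 16-bit integer.
    return list(struct.unpack("<{}h".format(len(raw) // 2), raw))


samples = decode_lead(ENCODED_PREFIX)
print(len(samples), samples[:8])  # 24 [39, 40, 45, 40, 34, 34, 36, 36]
```

Under these assumptions the first bytes 27 00 28 00 decode to the values 39 and 40, which can then be plotted against sample index (or against time, by dividing the index by the 500 Hz sampling rate).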
Study
phs003824
Genetic Testing to Understand and Address Renal Disease Disparities
People of African ancestry (Blacks) have an increased risk of kidney failure due to numerous socioeconomic, environmental, and clinical factors. Two variants in the APOL1 gene (GeneID: 8542) are now thought to account for much of the racial disparity associated with hypertensive kidney failure in Blacks. One in 7 African Americans (AAs) carries this risk allele, but it is nearly absent in other populations; the variants protect against African trypanosomiasis, which is endemic to Africa. However, this knowledge has not been translated into clinical care to help improve patient outcomes and address disparities. It is unknown whether patient or clinician knowledge of genetic risk impacts patient care (i.e., renal surveillance, BP medication use/intensification) or outcomes such as BP control and CKD. GUARDD is a randomized trial to evaluate the effects and challenges of incorporating genetic risk information into primary care. Hypertensive, nondiabetic adults with self-reported African ancestry and without kidney dysfunction are recruited from diverse clinical settings and randomized to undergo APOL1 genetic testing at baseline (intervention) or at one year (waitlist control). Prior to enrolling their patients, providers are educated about genomics and APOL1. Guided by a genetic counselor, trained staff return APOL1 results to patients and provide low-literacy educational materials. Real-time clinical decision support tools alert clinicians to their patients' APOL1 results and associated risk status at the point of care. Our academic-community clinical partnership designed a study to generate information about the impact of genetic risk information on our primary outcome, patient care (renal surveillance by clinicians through serum creatinine and/or urine albumin tests, with a sub-aim of reducing systolic blood pressure).
Secondary outcomes include impact on primary outcomes at 12 months, psycho-behavioral differences of patients between groups and over time, clinician knowledge, attitudes and beliefs at baseline and 12 months, and differences in outcomes between those tested and not tested. GUARDD will help establish the effective implementation of APOL1 risk informed management of hypertensive patients at high risk of CKD, and will provide a robust framework for future endeavors to implement genomic medicine in diverse clinical practices. It will also add to the important dialog about factors that contribute to and may help eliminate racial disparities in kidney disease.
Study
phs001620
OncoArray: Follow-up of Ovarian Cancer Genetic Association and Interaction Studies (FOCI)
The Follow-up of Ovarian Cancer Genetic Association and Interaction Studies (FOCI) was one of five projects funded in 2010 as part of the NCI's Genetic Associations and Mechanisms in Oncology (GAME-ON) initiative (http://epi.grants.cancer.gov/gameon/). FOCI represents a collective effort that builds upon the strengths and history of collaboration inherent in the Ovarian Cancer Association Consortium (OCAC), a multidisciplinary group composed of epidemiologists, genetic epidemiologists, statistical geneticists, molecular and cell biologists and clinicians that was formed in 2005. The other four funded GAME-ON projects were: the ColoRectal Transdisciplinary Study (CORECT), Elucidating Loci Involved in Prostate Cancer Susceptibility (ELLIPSE), Discovery, Biology, and Risk of Inherited Variants in Breast Cancer (DRIVE), and Transdisciplinary Research in Cancer of the Lung (TRICL). As part of our aim to discover, expand, and replicate ovarian cancer susceptibility loci, the GAME-ON projects and other consortia formed the OncoArray network (http://epi.grants.cancer.gov/oncoarray/) to develop and genotype a new custom genotyping array in large numbers of cancer cases and controls (over 400,000 samples) across multiple cancer types. The FOCI data include over 50,000 ovarian cancer cases and controls genotyped with the OncoArray at the Center for Inherited Disease Research (CIDR). Genotype calling and quality control procedures were performed under a standardized protocol across the OncoArray consortium, and over 490,000 SNPs passed QC and are included under this dbGaP submission.
Study
phs001882
Cardiovascular Health Study (CHS) - Imaging
The Cardiovascular Health Study (CHS) is a prospective study of risk factors for development and progression of coronary heart disease (CHD) and stroke in people aged 65 years and older. The 5,888 study participants were recruited from four U.S. communities and have undergone extensive clinic examinations for evaluation of markers of subclinical cardiovascular disease. The original cohort, enrolled in 1989-90, totaled 5,201 participants. A supplemental cohort of 687 predominantly African-American participants was enrolled in 1992-93. Clinic examinations were performed at study baseline and at annual visits through 1998-1999, and again in 2005-2006. Examination components included medical and personal history, medication inventory, ECG, blood pressure, anthropometry, assessment of physical and cognitive function, and depression screening. Other components done less frequently included phlebotomy, spirometry, echocardiography, carotid ultrasound, cerebral magnetic resonance imaging, measurement of ankle-brachial index and retinal exam. Participants were contacted by telephone annually between exams to collect information about hospitalizations and potential cardiovascular events. Since 1999, participants have been contacted every six months by phone, primarily to identify cardiovascular events and to assess physical and cognitive health. Standard protocols for the identification and adjudication of events were implemented during follow-up. The adjudicated events are myocardial infarction, angina, heart failure (HF), stroke, transient ischemic attack (TIA), claudication, and mortality.
Study
phs003639
eIMPACT Trial: Modernized Collaborative Care to Reduce the Excess CVD Risk of Older Depressed Patients
Cardiovascular disease (CVD) is the #1 killer of American men and women, and its economic burden is substantial and on the rise. Adults with depression are at elevated risk of CVD events and poor CVD prognosis. Unfortunately, past clinical trials of depression treatments have not observed the anticipated cardiovascular benefits. A novel explanation for these null results is that the depression interventions in these trials, which all involved patients with preexisting CVD, were delivered too late in the natural history of CVD. Accordingly, the objective of this single-center, parallel-group, assessor-blinded randomized controlled trial was to determine whether successful depression treatment before, versus after, the onset of clinical CVD reduces CVD risk in depression. Primary care patients with depression and elevated CVD risk from a safety net healthcare system (N = 216, mean age = 59 years, 78% female, 50% Black, 46% with income <$10,000/year) were randomized 1:1 to 12 months of the eIMPACT intervention (a modernized collaborative stepped care program involving internet cognitive-behavioral therapy [CBT], telephonic CBT, and/or select antidepressants) or usual primary care for depression (primary care providers supported by embedded behavioral health clinicians and psychiatrists). The primary outcome was endothelial dysfunction (brachial flow-mediated dilation) at 12 months, and the secondary outcomes were self-reported depressive symptoms (Hopkins Symptom Checklist-20), autonomic dysfunction (high-frequency heart rate variability), systemic inflammation (interleukin-6 and high-sensitivity C-reactive protein), and platelet activation (β-thromboglobulin and platelet factor 4) at 12 months. The central hypothesis was that the eIMPACT intervention would improve endothelial dysfunction in depressed adults by decreasing depressive symptoms, autonomic dysfunction, systemic inflammation, and platelet activation.
Primary results are posted to ClinicalTrials.gov at NCT02458690. The de-identified limited dataset available on BDC consists of only the individual phenotypic data needed to confirm this trial’s primary results – i.e., randomization status, primary and secondary outcome variables, and participant characteristics utilized in the hypothesis-testing models (age, sex, race, education, income, and systolic blood pressure). We also provide the SAS code to replicate the results of the hypothesis-testing models reported in Table 4 and Supplemental Tables 1-3 in the main outcomes paper (#2 in Selected Publications). Of note, the user will need to specify the file path for the dataset to run the SAS code. Instructions for requesting individual-level data are available on BDC at https://biodatacatalyst.nhlbi.nih.gov/resources/data/. Apply for data access in dbGaP. Upon approval, users may begin accessing requested data in BDC. For questions about availability, you may contact the BDC team at https://biodatacatalyst.nhlbi.nih.gov/contact.
Study
phs003283
Characterization_of_individual_foci_of_multicentric_multifocal_breast_cancer_using_targeted_next_generation_sequencing
Multifocality or multicentricity in breast cancer may be defined as the presence of two or more tumor foci within a single quadrant of the breast or within different quadrants of the same breast, respectively. This original classification of breast cancer as multicentric or multifocal was based on the assumption that cancers arising in the same quadrant were more likely to arise from the same ductal structures than those occurring in separate areas of the breast. The problem with these definitions is that the “quadrants” of the breast are arbitrary external designations, as no internal boundaries exist. This project will therefore focus on both synchronous multifocal and multicentric tumors. The incidence of multifocal and multicentric breast cancers was reported to be between 13 and 75% depending on the definition used, the extent of the pathologic sampling of the breast and whether in situ disease is considered evidence of multicentricity (1). Although this incidence is variable, those figures show that it is a frequent phenomenon. Multiple (multifocal/multicentric) breast carcinomas, especially when occurring in the same breast, represent a real challenge for both pathologists and clinicians in terms of identifying the cellular origin and the best therapeutic management of the cancer. Multifocality or multicentricity has been associated with a number of more aggressive features including an increased rate of regional lymph node metastases and adverse patient outcome when compared with unifocal tumors (2-3), and a possible increased risk of local recurrence following breast conserving surgery (4). For the moment, the literature is divided on whether there is a corresponding impact on survival outcomes.
Today, the current convention for staging and treating multifocal and multicentric tumors follows the classical tumor-node-metastasis (TNM) staging guidelines, in which tumor size is assessed by the largest tumor focus without taking other foci of disease into consideration. Some papers, such as the recent one from Lynch and colleagues, support the current staging convention (3); others, such as Boyages et al., suggest that aggregate size, and not the size of the largest lesion, should be considered in order to refine the prognostic assessment of those tumors (5). In addition, the question of whether multifocal/multicentric carcinomas are due to the spread of a single carcinoma throughout the breast or to multiple carcinomas arising simultaneously has been a matter of debate. Some studies suggested that multifocal breast cancer may result from either intramammary spread from a single primary tumor or multiple synchronous primary tumors; whereas others suggest that multiple breast carcinomas always arise from the same clone (6-8). Recently, Pietri and colleagues analyzed the biological characteristics of a series of 113 multifocal/multicentric breast cancers (8) which were diagnosed over a 5-year period. The expression of estrogen (ER) and progesterone (PgR) receptors, Ki-67 proliferative index, expression of HER2 and tumor grading were prospectively determined in each tumor focus, and mismatches among foci were recorded. Mismatches in ER status were present in 5 (4.4%) cases and PgR in 18 (15.9%) cases. Mismatches in tumor grading were present in 21 cases (18.6%), proliferative index (Ki-67) in 17 (15%) cases and HER2 status in 11 (9.7%) cases. Interestingly, this heterogeneity among foci led to 14 (12.4%) patients receiving different adjuvant treatments compared with what would have been indicated if we had only taken into account the biologic status of the primary tumor.
This study therefore showed that differences in biological characteristics of multifocal/multicentric lesions play a crucial role in the adjuvant treatment decision making process. In this study, we will concentrate on a larger series of patients with multifocal invasive ductal breast cancer lesions. We aim at: 1. Evaluating the incidence of multifocality according to the different breast cancer molecular subtypes (ER-/HER2-, HER2+, ER+/HER2-). 2. Evaluating the incidence of multifocality in patients with hereditary breast cancer disease (presence of germline BRCA1 or BRCA2 mutations). Moreover, we would like to investigate whether multifocal lesions with BRCA1 or BRCA2 mutations exhibit a characteristic combination of substitution mutation signatures and a distinctive profile of deletions as demonstrated recently by Nik-Zainal and colleagues (9). 3. Correlating multifocality with clinical information in order to define its influence on patients’ survival (DFS and OS). 4. Carrying out high-coverage targeted sequencing of driver cancer genes and genes whose mutation is of therapeutic importance in order to compare clinically-relevant genetic differences between several multifocal breast cancer lesions. 5. Evaluating the impact of the distance between the different lesions on the clinical outcome and on the genetic differences. 6. Comparing gene expression patterns between several multifocal breast cancer lesions and correlating them with the results of the targeted gene screen. 7. Characterizing the genomic and transcriptomic status of cancer related genes in metastatic lesions (local recurrence, positive lymph node or distant metastatic sites) from the same multifocal invasive ductal breast cancer patients in order to evaluate the consequence of genomic and transcriptomic heterogeneity of multifocal lesions on metastatic lesions.
Multiple (multifocal/multicentric) breast carcinomas, especially when occurring in the same breast, represent a real challenge for both pathologists and clinicians in terms of identifying the cellular origin and the best therapeutic choice. This project has the potential to identify genetic/transcriptomic differences existing between several lesions constituting multifocal breast cancers, which in the routine clinical practice are usually considered to be homogeneous among them. We foresee validating significant results in a larger series of patients and this, in turn, could have a remarkable impact on the treatment and clinical management of multifocal breast cancers. Indeed, we hope to provide some evidence whether or not each focus matters in multifocal and multicentric breast cancer to define the adequate therapeutic approach, especially in the context of targeted therapies. The work to be done at Sanger will be target gene screen pooling of 1400 samples.
Study
EGAS00001000407
National Human Genome Research Institute (NHGRI) GENEVA Genome-Wide Association Study of Venous Thrombosis (GWAS of VTE)
Overview: Our overall long-term goal is to determine risk factors for the complex (multifactorial) disease, venous thromboembolism (VTE), that will allow physicians to stratify individual patient risk and target VTE prophylaxis to those who would benefit most. In this genome-wide association case-control study (1300 cases and 1300 controls) we hope to identify susceptibility variants for VTE. Mutations within genes encoding for important components of the anticoagulant, procoagulant, fibrinolytic, and innate immunity pathways are risk factors for VTE. We hypothesize that other genes within these four pathways or within other pathways also are VTE disease-susceptibility genes. Therefore, we performed a genome wide association (GWA) screen and analysis using the Illumina 660W platform to identify SNPs within 1,300 clinic-based, non-cancer VTE cases primarily from Minnesota and the upper Midwest USA, and 1300 clinic-based, unrelated controls frequency-matched on patient age, gender, myocardial infarction/stroke status and state of residence. This is a subset of a slightly larger candidate gene study using 1500 case-control pairs to identify haplotype-tagging SNPs (ht-SNPs) in a large set of candidate genes (n~750) within the anticoagulant, procoagulant, fibrinolytic, and innate immunity pathways. Study Populations. Cases. VTE cases were consecutive Mayo Clinic outpatients with objectively-diagnosed deep vein thrombosis (DVT) and/or pulmonary embolism (PE) residing in the upper Midwest and referred by Mayo Clinic physician to the Mayo Clinic Special Coagulation Laboratory for clinical diagnostic testing to evaluate for an acquired or inherited thrombophilia, or to the Mayo Clinic Thrombophilia Center. Any person contacted to be a control but discovered to have had a VTE was evaluated for inclusion as a case. Cases were primarily residents from Minnesota, Wisconsin, Iowa, Michigan, Illinois, North or South Dakota, Nebraska, Kansas, Missouri and Indiana. 
A DVT or PE was categorized as objectively diagnosed when (a) confirmed by venography or pulmonary angiography, or pathology examination of thrombus removed at surgery, or (b) if at least one non-invasive test (compression duplex ultrasonography, lung scan, computed tomography scan, magnetic resonance imaging) was positive. A VTE was defined as any of the following: Proximal leg deep vein thrombosis (DVT), which includes the common iliac, internal iliac, external iliac, common femoral, superficial femoral [now termed "femoral"], deep femoral [sometimes referred to as "profunda" femoral] and/or popliteal veins. (Note: greater and lesser saphenous veins, or other superficial or perforator veins, were not included as proximal or distal leg DVT). Distal leg DVT (or "isolated calf DVT"), which includes the anterior tibial, posterior tibial and/or peroneal veins. (Note: gastrocnemius, soleal and/or sural [e.g., "deep muscular veins" of the calf] vein thrombosis was not included as distal leg DVT). Arm DVT, which includes the axillary, subclavian and/or innominate (brachiocephalic) veins. (Note: jugular [internal or external], cephalic and brachial vein thrombosis was not included in "arm DVT"). Hepatic, portal, splenic, superior or inferior mesenteric, and/or renal vein thrombosis. (Note: ovarian, testicular, peri-prostatic and/or pelvic vein thrombosis was not included). Cerebral vein thrombosis (includes cerebral or dural sinus or vein, sagittal sinus or vein, and/or transverse sinus or vein thrombosis).
Inferior vena cava (IVC) thrombosis. Superior vena cava (SVC) thrombosis. Pulmonary embolism. Patients with VTE related to active cancer, antiphospholipid syndrome, inflammatory bowel disease, vasculitis, a rheumatoid or other autoimmune disorder, a vascular anomaly (e.g., Klippel-Trénaunay syndrome, etc.), heparin-induced thrombocytopenia, or a mechanical cause for DVT (e.g., arm DVT or SVC thrombosis related to a central venous catheter or transvenous pacemaker, portal and/or splenic vein thrombosis related to liver cirrhosis, IVC thrombosis related to retroperitoneal fibrosis, etc.), with hemodialysis arteriovenous fistula thrombosis, or with prior liver or bone marrow transplantation were excluded. Controls. A Mayo Clinic outpatient control group was prospectively recruited for this study. Controls were frequency-matched on the age group (18-29, 30-39, 40-49, 50-59, 60-69, 70-79, and 80+ years), sex, myocardial infarction/stroke status, and state of residence distribution of the cases. We selected clinic-based controls using a controls' database of persons undergoing general medical examinations in the Mayo Clinic Departments of General Internal Medicine or Primary Care Internal Medicine. Additionally, persons undergoing evaluation at the Mayo Clinic Sports Medicine Center and the Department of Family Medicine were screened for inclusion as controls. This study is part of the Gene Environment Association Studies initiative (GENEVA, http://www.genevastudy.org) funded by the trans-NIH Genes, Environment, and Health Initiative (GEI). The overarching goal is to identify novel genetic factors that contribute to venous thrombosis through large-scale genome-wide association studies of 1,300 clinic-based VTE cases and 1,300 clinic-based, unrelated controls. Genotyping was performed at the Johns Hopkins University Center for Inherited Disease Research (CIDR).
Data cleaning and harmonization were done at the GEI-funded GENEVA Coordinating Center at the University of Washington.
Study
phs000289
Detection of Colorectal Cancer Susceptibility Loci Using Genome-Wide Sequencing
The Genetics and Epidemiology of Colorectal Cancer Consortium (GECCO) is a collaborative effort comprised of a coordinating center and scientific researchers from well-characterized cohort and case-control studies. This international consortium aims to accelerate the discovery of common and rare genetic risk variants for colorectal cancer by conducting large-scale meta-analyses of existing and newly generated genome-wide association study (GWAS) data, whole genome sequencing, replicating and fine-mapping of genetic discoveries, and investigating how genetic risk variants are modified by environmental risk factors. To expand these efforts, we assembled case-control sets or nested case-control sets from 6 different North American or European studies. Summary descriptions and study participant inclusions/exclusion criteria for each of these studies are detailed below. Cancer Prevention Study II (CPS II): The CPS II Nutrition cohort is a prospective study of cancer incidence and mortality in the United States, established in 1992 and described in detail elsewhere (Calle et al., 2002 PMID:12015775; Campbell et al., 2014 PMID:25472679). At enrollment, participants completed a mailed self-administered questionnaire including information on demographic, medical, diet, and lifestyle factors. Follow-up questionnaires to update exposure information and to ascertain newly diagnosed cancers were sent biennially starting in 1997. Reported cancers were verified through medical records, state cancer registry linkage, or death certificates. The Emory University Institutional Review Board approves all aspects of the CPS II Nutrition Cohort. We restricted to samples that had blood DNA source. Controls were matched to cases in a case/control ratio of 2:1 on reference year and sex. 
Darmkrebs: Chancen der Verhütung durch Screening (DACHS): This German study was initiated as a large population-based case-control study in 2003 in the Rhine-Neckar-Odenwald region (southwest region of Germany) to assess the potential of endoscopic screening for reduction of colorectal cancer risk and to investigate etiologic determinants of disease, particularly lifestyle/environmental factors and genetic factors. Cases with a first diagnosis of invasive colorectal cancer (International Classification of Diseases 10 codes C18-C20) who were at least 30 years of age (no upper age limit), German speaking, a resident in the study region, and mentally and physically able to participate in a one-hour interview, were recruited by their treating physicians either in the hospital a few days after surgery, or by mail after discharge from the hospital. Cases were confirmed based on histologic reports and hospital discharge letters following diagnosis of colorectal cancer. All hospitals treating colorectal cancer patients in the study region participated. Based on estimates from population-based cancer registries, more than 50% of all potentially eligible patients with incident colorectal cancer in the study region were included. Community-based controls were randomly selected from population registries, employing frequency matching with respect to age (5-year groups), sex, and county of residence. Controls with a history of colorectal cancer were excluded. Controls were contacted by mail and follow-up calls. The participation rate was 51%. During an in-person interview, data were collected on demographics, medical history, family history of CRC, and various life-style factors, as were blood and mouthwash samples. Routine formalin-fixed, paraffin-embedded (FFPE) tumor samples from the patients enrolled were requested from the pathology institutes and used for tumor tissue analyses. 
This analysis includes participants with blood source DNA that were recruited up to 2010 in this ongoing study. Controls were matched to cases on reference age and sex in a case/control ratio of 2:1. Health Professionals Follow-up Study (HPFS): A parallel prospective study to the NHS (Nurses' Health Study). The HPFS cohort comprised 51,529 men aged 40-75 who, in 1986, responded to a mailed questionnaire (Rimm et al., 1990 PMID:2090285). Participants provided information on health related exposures, including current and past smoking history, age, weight, height, diet, physical activity, aspirin use, and family history of colorectal cancer. Colorectal cancer and other outcomes were reported by participants or next-of-kin and were followed up through review of the medical and pathology record by physicians. Overall, more than 97% of self-reported colorectal cancers were confirmed by medical record review. Information was abstracted on histology and primary location. Incident cases were defined as those occurring after the subject provided the blood sample. Prevalent cases were defined as those occurring after enrollment in the study but before the subject provided the blood sample. Follow-up evaluation has been excellent, with 94% of the men responding to date. Colorectal cancer cases were ascertained through January 1, 2008. In 1993-1995, 18,825 men in the HPFS mailed blood samples by overnight courier, which were aliquoted into buffy coat and stored in liquid nitrogen. In 2001-2004, 13,956 men in the HPFS who had not provided a blood sample previously mailed in a swish-and-spit sample of buccal cells. Incident cases were defined as those occurring after the subject provided a blood or buccal sample. Prevalent cases were defined as those occurring after enrollment in the study in 1986, but before the subject provided either a blood or buccal sample. 
Participants with histories of cancer (except nonmelanoma skin cancer), ulcerative colitis, or familial polyposis were excluded, along with their case-control sets. Control participants were required to be free of invasive colorectal cancer and non-invasive (stage 0 in situ) colorectal cancer. For this study, only European ancestry participants with blood source DNA and incident colorectal cancer cases were eligible for selection. Since enrollment year and sex matched exactly, controls were randomly selected in a case/control ratio of 2:1. Nurses' Health Study (NHS): The NHS cohort began in 1976 when 121,700 married female registered nurses aged 30-55 years returned the initial questionnaire that ascertained a variety of important health-related exposures (Belanger et al., 1978 PMID:248266). Since 1976, follow-up questionnaires have been mailed every 2 years. Colorectal cancer and other outcomes were reported by participants or next-of-kin and followed up through review of the medical and pathology record by physicians. Overall, more than 97% of self-reported colorectal cancers were confirmed by medical-record review. Information was abstracted on histology and primary location. The rate of follow-up evaluation has been high: as a proportion of the total possible follow-up time, follow-up evaluation has been more than 92%. Colorectal cancer cases were ascertained through June 1, 2008. In 1989-1990, 32,826 women in NHS I mailed blood samples by overnight courier, which were aliquoted into buffy coat and stored in liquid nitrogen. In 2001-2004, 29,684 women in NHS I who did not previously provide a blood sample mailed a swish-and-spit sample of buccal cells. Incident cases were defined as those occurring after the subject provided a blood or buccal sample. Prevalent cases were defined as those occurring after enrollment in the study in 1976 but before the subject provided either a blood or buccal sample.
Participants with histories of cancer (except nonmelanoma skin cancer), ulcerative colitis, or familial polyposis were excluded, along with their case-control sets. For this study, only European ancestry participants with blood source DNA and incident colorectal cancer cases were eligible for selection. Since enrollment year and sex matched exactly, controls were randomly selected in a case/control ratio of 2:1. Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial (PLCO): PLCO enrolled 154,934 participants (men and women, aged between 55 and 74 years) at ten centers into a large, randomized, two-arm trial to determine the effectiveness of screening to reduce cancer mortality. Sequential blood samples were collected from participants assigned to the screening arm. Participation was 93% at the baseline blood draw. White colorectal cancer cases with a family history of colorectal cancer (no history of ulcerative colitis, Crohn's Disease, diverticulitis, Gardner's syndrome, Familial Polyposis) and successful genotyping from previous Peters GWAS were selected for this project. Controls were matched to cases on reference age and sex in a case/control ratio of 2:1. Women's Health Initiative (WHI): WHI is a long-term national health study that has focused on strategies for preventing heart disease, breast and colorectal cancer, and osteoporotic fractures in postmenopausal women. The original WHI study included 161,808 postmenopausal women enrolled between 1993 and 1998. The Fred Hutchinson Cancer Research Center in Seattle, WA serves as the WHI Clinical Coordinating Center for data collection, management, and analysis of the WHI. The WHI has two major parts: a partial factorial randomized Clinical Trial (CT) and an Observational Study (OS); both were conducted at 40 Clinical Centers nationwide. The CT enrolled 68,132 postmenopausal women between the ages of 50-79 into trials testing three prevention strategies.
If eligible, women could choose to enroll in one, two, or all three of the trial components. The components are: Hormone Therapy Trials (HT): This double-blind component examined the effects of combined hormones or estrogen alone on the prevention of coronary heart disease and osteoporotic fractures, and associated risk for breast cancer. Women participating in this component with an intact uterus were randomized to estrogen plus progestin (conjugated equine estrogens [CEE], 0.625 mg/d plus medroxyprogesterone acetate [MPA] 2.5 mg/d) or a matching placebo. Women with prior hysterectomy were randomized to CEE or placebo. Both trials were stopped early, in July 2002 and March 2004, respectively, based on adverse effects. All HT participants continued to be followed without intervention until close-out. Dietary Modification Trial (DM): The Dietary Modification component evaluated the effect of a low-fat and high fruit, vegetable and grain diet on the prevention of breast and colorectal cancers and coronary heart disease. Study participants were randomized to either their usual eating pattern or a low-fat dietary pattern. Calcium/Vitamin D Trial (CaD): This double-blind component began 1 to 2 years after a woman joined one or both of the other clinical trial components. It evaluated the effect of calcium and vitamin D supplementation on the prevention of osteoporotic fractures and colorectal cancer. Women in this component were randomized to calcium (1000 mg/d) and vitamin D (400 IU/d) supplements or a matching placebo. The Observational Study (OS) examines the relationship between lifestyle, environmental, medical and molecular risk factors and specific measures of health or disease outcomes. This component involves tracking the medical history and health habits of 93,676 women not participating in the CT. Recruitment for the observational study was completed in 1998 and participants were followed annually for 8 to 12 years. 
All centrally confirmed White cases of invasive colorectal cancer or death from colorectal cancer were selected as potential cases from the March 2011 database. Cases were prioritized as follows: (1) cases with a positive family history of colorectal cancer; (2) randomly selected cases until a total of n=800 cases was reached. Control participants were required to be White, free of invasive colorectal cancer and non-invasive (stage 0 in situ) colorectal cancer. Centrally denied cases of colorectal cancer were not allowed into the control pool. Case and control participants were subject to the following exclusion criteria: (1) had prior history of colorectal cancer at baseline; (2) had no available DNA (DNA search as of Nov 15, 2012); (3) cannot be deposited to dbGaP; (4) lost to follow-up after enrollment; (5) selected for WHI study M26 Phase II. Controls were matched to cases in a case/control ratio of 2:1. To obtain 2 cases per control, cases were grouped by enrollment year (5 groups in total). For each year group, around 50% of cases were selected to be matched to controls. In total, 401 cases were selected to be matched to controls. Matching was done on enrollment year, which was matched exactly. For additional information, see dbGaP: phs000200 and ClinicalTrials: NCT00000611.
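The WHI selection step described above (group cases by enrollment year, select roughly half of the cases in each group, and match controls exactly on enrollment-year group) can be sketched as follows. This is a minimal illustration, not the consortium's actual procedure: the function name `match_controls`, the tuple layout, the fixed random seed, and the assumption that each selected case receives exactly one control drawn without replacement are all hypothetical.

```python
import random
from collections import defaultdict


def match_controls(cases, controls, fraction=0.5, seed=0):
    """Pair a fraction of cases with controls from the same year group.

    cases, controls: lists of (participant_id, year_group) tuples
    (hypothetical layout). Returns (case_id, control_id, year_group) tuples.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible

    cases_by_group = defaultdict(list)
    controls_by_group = defaultdict(list)
    for pid, group in cases:
        cases_by_group[group].append(pid)
    for pid, group in controls:
        controls_by_group[group].append(pid)

    pairs = []
    for group in sorted(cases_by_group):
        group_cases = cases_by_group[group]
        pool = controls_by_group.get(group, [])
        # Select about `fraction` of the cases, limited by available controls.
        n_select = min(round(len(group_cases) * fraction), len(pool))
        selected_cases = rng.sample(group_cases, n_select)
        # Sample controls without replacement so none is used twice.
        selected_controls = rng.sample(pool, n_select)
        pairs.extend(
            (case, ctrl, group)
            for case, ctrl in zip(selected_cases, selected_controls)
        )
    return pairs
```

Under these assumptions, selecting half of the cases and giving each selected case one control yields about two cases per control overall, consistent with the 2:1 case/control ratio stated in the description.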
Study
phs001554
Metatranscriptomic Sequencing of Pulmonary Fluid in Immunocompromised Children
Despite improved diagnostics, pulmonary pathogens in immunocompromised children frequently evade detection, leading to significant mortality. In version 1.0 of this study, we performed RNA-based metagenomic next-generation sequencing (mNGS) on 41 lower respiratory samples collected from 34 children. We identified a rich cross-domain pulmonary microbiome containing bacteria, fungi, RNA viruses, and DNA viruses in each patient. Potentially pathogenic bacteria were ubiquitous among samples but could be distinguished as possible causes of disease by parsing for outlier organisms. Potential pathogens were detected in half of samples previously negative by clinical diagnostics. In version 2.0, we included an additional 278 samples from 229 pediatric stem cell transplant patients with pulmonary disease.
Study
phs001684
Rare Mendelian Disease in Old Order Amish and Mennonite Patients
The Clinic for Special Children (CSC) has integrated biochemical and molecular methods into a rural pediatric practice serving Old Order Amish and Mennonite (Plain) children. Among the Plain people, we have used single nucleotide polymorphism (SNP) microarrays to genetically map recessive disorders to large autozygous haplotype blocks (mean = 4.4 Mb) that contain many genes (mean = 79). For some, uninformative mapping or large gene lists preclude disease-gene identification by Sanger sequencing. Seven such conditions were selected for exome sequencing at the Broad Institute; all had been previously mapped at the CSC using low density SNP microarrays coupled with autozygosity and linkage analyses. Using between 1 and 5 patient samples per disorder, we identified sequence variants in the known disease-causing genes SLC6A3 and FLVCR1, and present evidence to strongly support the pathogenicity of variants identified in TUBGCP6, BRAT1, SNIP1, CRADD, and HARS. Our results reveal the power of coupling new genotyping technologies to population-specific genetic knowledge and robust clinical data.
Study
phs000623
Targeted MitoExome Sequencing of Mitochondrial OXPHOS Diseases (Massachusetts General Hospital)
Human disorders of mitochondrial oxidative phosphorylation (OXPHOS) represent a devastating collection of inherited diseases. These disorders affect at least 1 in 5,000 live births and involve multiple organ systems. They show remarkable locus heterogeneity, with mutations in the mtDNA as well as in over 77 nuclear genes identified to date. It is estimated that additional genes may be mutated in these disorders. To discover the genetic causes of mitochondrial OXPHOS diseases, we performed targeted, deep sequencing of the entire mitochondrial genome (mtDNA) and the coding exons of over 1000 nuclear genes encoding the mitochondrial proteome. We applied this 'MitoExome' sequencing to 124 unrelated patients with a wide range of OXPHOS disease presentations from the Massachusetts General Hospital Mitochondrial Disorders Clinic. The 2.3 Mb targeted region was captured by hybrid selection and Illumina sequenced with paired 76 bp reads. The total set of 1605 targeted nuclear genes included 1013 genes with strong evidence of mitochondrial localization from the MitoCarta database, 377 genes with weaker evidence of mitochondrial localization from the MitoP2 database and other sources, and 215 genes known to cause other inborn errors of metabolism. Approximately 88% of targeted bases were well-covered (>20X), with mean 200X coverage per targeted base.
Study
phs000339
Characterization of individual foci of multicentric/multifocal breast cancer using targeted next generation sequencing
Multifocality or multicentricity in breast cancer may be defined as the presence of two or more tumor foci within a single quadrant of the breast or within different quadrants of the same breast, respectively. This original classification of breast cancer as multicentric or multifocal was based on the assumption that cancers arising in the same quadrant were more likely to arise from the same ductal structures than those occurring in separate areas of the breast. The problem with these definitions is that the "quadrants" of the breast are arbitrary external designations, as no internal boundaries exist. This project will therefore focus on both synchronous multifocal and multicentric tumors. The incidence of multifocal and multicentric breast cancers was reported to be between 13 and 75% depending on the definition used, the extent of the pathologic sampling of the breast and whether in situ disease is considered evidence of multicentricity (1). Although this incidence is variable, those figures show that it is a frequent phenomenon. Multiple (multifocal/multicentric) breast carcinomas, especially when occurring in the same breast, represent a real challenge for both pathologists and clinicians in terms of identifying the cellular origin and the best therapeutic management of the cancer. Multifocality or multicentricity has been associated with a number of more aggressive features including an increased rate of regional lymph node metastases and adverse patient outcome when compared with unifocal tumors (2-3), and a possible increased risk of local recurrence following breast conserving surgery (4). For the moment, the literature is divided on whether there is a corresponding impact on survival outcomes.
The current convention for staging and treating multifocal and multicentric tumors follows the classical tumor-node-metastasis (TNM) staging guidelines, in which tumor size is assessed by the largest tumor focus without taking other foci of disease into consideration. Some papers, such as the recent one from Lynch and colleagues, support the current staging convention (3); others, such as Boyages et al., suggest that aggregate size, not the size of the largest lesion, should be considered in order to refine the prognostic assessment of those tumors (5). In addition, whether multifocal/multicentric carcinomas are due to the spread of a single carcinoma throughout the breast or to multiple carcinomas arising simultaneously has been a matter of debate. Some studies suggested that multifocal breast cancer may result from either intramammary spread from a single primary tumor or multiple synchronous primary tumors, whereas others suggest that multiple breast carcinomas always arise from the same clone (6-8). Recently, Pietri and colleagues analyzed the biological characterization of a series of 113 multifocal/multicentric breast cancers (8) which were diagnosed over a 5-year period. The expression of estrogen (ER) and progesterone (PgR) receptors, Ki-67 proliferative index, expression of HER2 and tumor grading were prospectively determined in each tumor focus, and mismatches among foci were recorded. Mismatches in ER status were present in 5 (4.4%) cases and PgR in 18 (15.9%) cases. Mismatches in tumor grading were present in 21 cases (18.6%), proliferative index (Ki-67) in 17 (15%) cases and HER2 status in 11 (9.7%) cases. Interestingly, this heterogeneity among foci has led to 14 (12.4%) patients receiving different adjuvant treatments compared with what would have been indicated if we had only taken into account the biologic status of the primary tumor.
This study therefore showed that differences in biological characteristics of multifocal/multicentric lesions play a crucial role in the adjuvant treatment decision making process. In this study, we will concentrate on a larger series of patients with multifocal invasive ductal breast cancer lesions. We aim at: 1. Evaluating the incidence of multifocality according to the different breast cancer molecular subtypes (ER-/HER2-, HER2+, ER+/HER2-). 2. Evaluating the incidence of multifocality in patients with hereditary breast cancer disease (presence of germline BRCA1 or BRCA2 mutations). Moreover, we would like to investigate if multifocal lesions with BRCA1 or BRCA2 mutations exhibit a characteristic combination of substitution mutation signatures and a distinctive profile of deletions as demonstrated recently by Nik-Zainal and colleagues (9). 3. Correlating multifocality with clinical information in order to define its influence on patients' survival (DFS and OS). 4. Carrying out high-coverage targeted gene sequencing of driver cancer genes and genes whose mutation is of therapeutic importance in order to compare clinically relevant genetic differences between several multifocal breast cancer lesions. 5. Evaluating the impact of the distance between the different lesions on the clinical outcome and on the genetic differences. 6. Comparing gene expression patterns between several multifocal breast cancer lesions and correlating them with the results of the targeted gene screen. 7. Characterizing the genomic and transcriptomic status of cancer-related genes in metastatic lesions (local recurrence, positive lymph node or distant metastatic sites) from the same multifocal invasive ductal breast cancer patients in order to evaluate the consequence of genomic and transcriptomic heterogeneity of multifocal lesions on metastatic lesions.
Multiple (multifocal/multicentric) breast carcinomas, especially when occurring in the same breast, represent a real challenge for both pathologists and clinicians in terms of identifying the cellular origin and the best therapeutic choice. This project has the potential to identify genetic/transcriptomic differences existing between several lesions constituting multifocal breast cancers, which in the routine clinical practice are usually considered to be homogeneous among them. We foresee validating significant results in a larger series of patients and this, in turn, could have a remarkable impact on the treatment and clinical management of multifocal breast cancers. Indeed, we hope to provide some evidence whether or not each focus matters in multifocal and multicentric breast cancer to define the adequate therapeutic approach, especially in the context of targeted therapies. The work to be done at Sanger will be target gene screen pooling of 1400 samples.
Dataset
EGAD00001000624
NHLBI TOPMed: Lung Tissue Research Consortium (LTRC)
Chronic obstructive pulmonary disease (COPD), a disease state characterized by airflow limitation that is not fully reversible, is the third leading cause of death in the U.S. COPD is a heterogeneous syndrome, with affected individuals demonstrating marked differences in lung structure (emphysema vs. airway disease); physiology (airflow obstruction); and other clinical features (e.g., exacerbations, co-morbid illnesses). Multiple genomic regions influencing COPD susceptibility have been identified by genome-wide association studies (GWAS), and rare coding variants can also influence risk for COPD. However, only a small percentage of the estimated heritability for COPD risk can be explained by known genetic loci. Like most complex diseases, COPD is influenced by multiple genetic determinants (each with modest individual effects). Emerging evidence supports the paradigm that complex disease genetic determinants are part of a network of interacting genes and proteins; perturbations of this network can increase disease risk. To identify this network, multiple Omics data will need to be analyzed with methods to account for nonlinear relationships and interactions between key genes and proteins. Our overall hypothesis is that integrated network analysis of genetic, transcriptomic, proteomic, and epigenetic data from biospecimens ranging from lung tissue to nasal epithelial cells to blood in highly phenotyped subjects will provide insights into COPD pathogenesis and heterogeneity. We will leverage the well-phenotyped, NHLBI-funded Lung Tissue Research Consortium (LTRC) to address these questions. We will perform multi-omics analysis in 1548 lung tissue and blood samples from the LTRC. With these multi-omics data, we will utilize a systems biology approach to understand relationships between multiple genetic determinants and multiple types of Omics data. We will begin by performing single Omics analyses in COPD vs. control lung, nasal, and blood samples. 
Next, we will integrate single Omics data with genetic variants identified by WGS to assist in fine mapping genetic determinants of COPD. We will then perform integrated network analysis of COPD with genetic and multiple Omics data using correlation-based, gene regulatory, and Bayesian networks. Subjects were recruited from Mayo Clinic, Universities of Colorado, Michigan, and Pittsburgh, and Temple University.
Study
phs001662
FAM50A_Disruption_in_TOV21G_Cells___RNAseq
In this project we aimed to assess the transcriptional consequences of disrupting FAM50A in TOV21G cells in vivo. To do this we generated a dox inducible FAM50A gRNA allele such that we could conditionally inactivate FAM50A by feeding mice a diet containing doxycycline. Tumour masses were harvested from mice when they reached the ethical limit and RNA was extracted for RNA-Seq analysis.
Study
EGAS00001004156
North American Mitochondrial Disease Consortium Patient Registry and Biorepository
Mitochondrial diseases are caused by dysfunction of the mitochondria, which are specialized compartments that are present in every cell of the body except red blood cells. Mitochondria generate more than 90% of the energy that the body needs to sustain life and support growth. When they fail, less and less energy is generated within the cell. This injures the cell and can cause its death. If this process is repeated throughout the body, whole organ systems begin to fail, and the life of the person in whom this is happening is severely compromised. Mitochondrial diseases primarily affect children, but adult onset is becoming more and more common. Mitochondrial diseases are probably the most diverse human disorders at every level: clinical, biochemical, and genetic. Some affect only the nervous system but most affect many body systems, including the brain, heart, liver, skeletal muscles, kidney, and the endocrine and respiratory systems. Although mitochondrial disorders vary in severity, they are usually progressive, and often crippling. They can cause paralysis, seizures, mental retardation, dementia, hearing loss, blindness, weakness and premature death. Because of the range of symptoms and the frequent involvement of multiple body systems, mitochondrial diseases can be a great challenge to diagnose. Even when accurately diagnosed, they pose an even more formidable challenge to treat, as there are very few therapies and most are only partially effective. About this study: The first objective of this study is to establish a clinical registry of patients with suspected or confirmed mitochondrial diseases. We are collecting medical and family history, diagnostic test results, and prospective medical information for these patients and, using agreed procedures developed by the leading research clinicians in the field, providing, for the first time, standardized diagnoses of these complex disorders for the patients.
The clinical information we collect from the participants will be used to learn about the spectrum of mitochondrial disorders and their prevalence. We will also develop studies which allow us to better understand how these diseases progress, which we do not understand well enough. When we begin clinical trials for mitochondrial diseases, patients enrolled in the registry who are identified as potentially eligible will be offered enrollment. Patients will only be included in studies if they give their consent in advance. The second objective of this study is to establish a biorepository for specimens and DNA from patients with mitochondrial diseases, in order to make materials easily available to consortium researchers.
Study
phs001538
Palindromic Amplification of The ERBB2 Oncogene in HER2-Positive Breast Cancer
This project, "DNA Inverted Repeats as an At-risk Motif for Palindromic Gene Amplification", concerns oncogene amplification that is configured as a series of inverted duplications (palindromic gene amplification). There are several recurrently amplified oncogenes throughout the human genome. However, it remains unclear whether this recurrent amplification is solely a manifestation of increased fitness resulting from random amplification mechanisms, or if a genomic locus-specific amplification mechanism plays a role. In this study, we show that the ERBB2 oncogene at 17q12 is susceptible to palindromic gene amplification in HER2-positive breast tumors. We investigated eight tumors in this study, of which five tumors were HER2-positive, and three tumors were HER2-negative. HER2 status was determined by clinical FISH tests. We applied three genomic approaches to investigate the amplification mechanism: (1) copy number analysis by array-CGH on the Affymetrix SNP6.0 platform (8 files), (2) sequencing of DNA libraries enriched with tumor-derived palindromic DNA (Genome-wide Analysis of Palindrome Formation, GAPF-seq) (8 files) and (3) unbiased whole genome sequencing (WGS) (1 file). These molecular data are made available in dbGaP. Genomic studies using tumor DNA were approved under the Internal Institutional Review Board at the Cleveland Clinic (IRB07-136: EXEMPT: Chromosome Breakage and DNA Palindrome Formation). Specimens were obtained and methods were carried out under the auspices of IRB 7881 (Evaluation of Genetic and Molecular Markers in Patients with Breast Cancer). All patients consented to allow their cancer specimens to be used by researchers in an anonymized fashion. The consent form indicates that publication will take place without identifiers to protect the identity of any specific individual. We observed significant enrichment of palindromic DNA within amplified ERBB2 genomic segments in four out of five HER2-positive tumors.
None of three HER2-negative tumors showed such enrichment. Palindromic DNA was particularly enriched at amplification peaks and boundaries between amplified and normal copy-number regions. Thus, palindromic gene amplification shaped the amplified ERBB2 locus. The moderate enrichment of palindromic DNA throughout the amplified segments leads us to propose that the ERBB2 locus is amplified through a mechanism that repeatedly generates palindromic DNA, such as Breakage-Fusion-Bridge cycles. Our results reveal a potential interaction between local genomic environments and gene amplification mechanisms. This study is published under the title "Palindromic amplification of the ERBB2 oncogene in primary HER2-positive breast tumors" (PMID:28211519).
Study
phs001261
Genome-wide association study in multiple human prion diseases suggests genetic risk factors additional to PRNP
We conducted GWAS of sporadic CJD, variant CJD, iatrogenic CJD, inherited prion disease, kuru and resistance to kuru despite attendance at mortuary feasts. After quality control we analysed 2000 samples and 6015 control individuals (provided by the Wellcome Trust Case Control Consortium and KORA-gen), for 491032-511862 SNPs in the European study. Association studies were done in each geographical and aetiological group followed by several combined analyses.
Study
EGAS00000000097
Pharmacogenetics of Efavirenz Discontinuation for Reported Central Nervous System Symptoms Appears to Differ by Race
This is an evaluation of genetic associations with efavirenz (EFV) discontinuation for central nervous system (CNS) symptoms within 12 months of treatment. Patients were treated at an HIV primary care clinic in Nashville, TN from 1998 to 2012. Previously known SNPs in CYP2B6 and CYP2A6 were used to define metabolizer genotypes (extensive, intermediate, slow metabolizer). Over 500,000 SNPs from genome-wide genotyping were used to define MDS (Multidimensional Scaling) coordinates to account for population stratification. Patients who discontinued EFV for CNS symptoms within 12 months were defined as cases; patients who did not stop treatment were defined as controls. Among 563 evaluable participants, the hazard ratio for EFV discontinuation for CNS symptoms was 4.9 (95% CI: 1.9 to 12.4; p = 0.001) in slow metabolizers compared to extensive metabolizers. This association was significant in Whites (hazard ratio 6.5; 95% CI: 2.3 to 18.8; p = 0.001), but not in Blacks (hazard ratio 2.6; 95% CI: 0.5 to 14.1; p = 0.27). The reason for this difference by race is not clear and warrants further investigation.
Study
phs001253
Comparison of capture-based method for transcriptome profiling of formalin-fixed paraffin embedded tumor samples
Background: The need for fresh frozen (FF) tissue limits implementing RNA sequencing (RNA-seq) in the clinic. The majority of clinical samples are stored as formalin-fixed, paraffin-embedded (FFPE) tissues. Exome capture platforms have been developed for RNA-seq from FFPE samples. However, these methods have not been systematically compared.
Methods: Transcriptomic analysis of 32 FFPE tumor samples from 11 patients was performed using three exome capture-based methods: Agilent SureSelect V6, TWIST NGS Exome, and IDT XGen Exome Research Panel. We compared these methods to TruSeq RNA-seq of fresh frozen (FF-TruSeq) tumor samples from the same patients. We assessed the recovery of clinically relevant biological features.
Results: The Spearman's correlation coefficients between the global expression profiles of the three capture-based methods from FFPE and matched FF-TruSeq were high (rho = 0.72-0.9, p < 0.05). The expression of key immune genes was also significantly correlated between each capture-based method and FF-TruSeq (rho = 0.76-0.88, p < 0.05). All exome capture-based methods reliably detected outlier expression of actionable genes, including ERBB2, MET, NTRK1, and PPARG. In urothelial cancer samples, the Agilent assay was associated with the highest molecular subtype concordance with FF-TruSeq (Cohen's kappa = 0.7, p < 0.01). Both Agilent and IDT detected all the clinically relevant fusions that were initially identified in FF-TruSeq.
Conclusion: All exome capture-based methods had comparable performance and concordance with FF-TruSeq. By allowing the interrogation of FFPE tumor samples, our findings will enable the implementation of RNA-seq in the clinic to guide precision oncology approaches.
Study
EGAS00001005255
Roma Sequencing Study
Present-day Roma genomes were shaped by extensive inbreeding and admixture during their diaspora. Here, we shed light on the Roma demographic history by analyzing the whole-genome sequences of 46 Roma individuals pertaining to four migrant groups in six European countries. The strong reduction in effective population size (~44%), which occurred around 2 kya, was not masked by the subsequent high admixture in Middle Eastern and European countries.
Study
EGAS00001004287
RNA sequencing of CAR-T cells with CD38-CD73-Tim-3-HLA-DR+ phenotype and others in infusion products of tisagenlecleucel for B-cell precursor acute lymphoblastic leukemia
We enrolled 19 patients with BCP-ALL who received tisagenlecleucel; infusion products and patient samples were obtained before and after CAR-T infusion. Single-cell analysis by CyTOF revealed that central memory CAR+ T cells increased in long-term responders, whereas PD-1highCD38highCXCR3+ effector CAR+ T cells were enriched in relapsed patients after CAR-T infusion. Furthermore, in long-term responders, CAR+ T cells obtained from infusion products were enriched for the CD38-CD73-Tim-3-HLA-DR+ phenotype, were characterized by a decreased ability to produce adenosine, and showed more memory-like transcriptomic characteristics. We elucidated the characteristics of tisagenlecleucel related to a durable response, which could contribute to improving clinical outcomes.
Study
JGAS000760
Transcriptome analysis of iPSC-derived hepatocytes from Wilson's Disease patients and healthy controls
To identify differences in global gene expression patterns in WD-specific hepatocytes, we performed RNA-seq analysis. In total, 6 WD-specific hepatocytes and 6 healthy-donor (HD) hepatocytes were compared. The results suggested that our RNA-seq data were partially validated in the context of ATP7B deficiency and that the overlapping genes might be commonly regulated by ATP7B deficiency across mammalian species.
Study
JGAS000382
SHANK2 mutations associated with Autism Spectrum Disorder cause hyperconnectivity of human neurons
Heterozygous loss-of-function mutations in the synaptic scaffolding gene SHANK2 are strongly associated with autism spectrum disorder (ASD). To investigate their effect on synaptic connectivity, we generated cortical neurons from induced pluripotent stem cells (iPSC) derived from neurotypic and ASD-affected donors. We developed Sparse coculture for Connectivity (SparCon) assays where SHANK2 and control neurons were differentially labeled and sparsely seeded together on a lawn of unlabeled control neurons. We observed striking increases in dendrite length, dendrite complexity, total synapse number, and frequency of spontaneous excitatory postsynaptic currents. These findings were phenocopied in gene-edited homozygous SHANK2 knockout cells and rescued by gene correction of an ASD SHANK2 mutation, supporting a role for SHANK2 as a regulator of connectivity in developing human neurons. Dendrite length increases were exacerbated by IGF1, TG003, or BDNF, and suppressed by DHPG treatment. The transcriptome in these isogenic SHANK2 neurons was deeply perturbed in synapse, plasticity, and neuronal morphogenesis gene sets and ASD gene modules, and activity-dependent dendrite extension was impaired. Our unexpected findings provide evidence for hyperconnectivity and profoundly altered transcriptome in SHANK2 neurons derived from ASD subjects.
Study
EGAS00001003436
Celiac disease meta-analysis
Celiac disease (CeD) is a common immune-mediated disease of the small intestine that is triggered by exposure to dietary gluten. While the HLA locus plays a major role in disease susceptibility, 39 non-HLA loci were also identified in a study of 24,269 individuals. We now build on this earlier study by adding 4,125 additional Caucasian samples including an Argentinian cohort genotyped on the Immunochip. In doing so, we not only confirm the previous associations, we also identify two novel independent genome-wide significant associations.
Study
EGAS00001003805
L1-Seq and Genome-Wide SNP Genotyping in a Multiethnic Asian Population
Insertions of the human-specific subfamily of LINE-1 (L1) retrotransposon are highly polymorphic across individuals and can critically influence the human transcriptome. We hypothesized that L1 insertions could represent genetic variants determining important human phenotypic traits, and performed an integrated analysis of L1 elements and single nucleotide polymorphisms (SNPs) in several human populations. We found that a large fraction of L1s were in high linkage disequilibrium (LD) with their surrounding genomic regions and that they were well-tagged by SNPs. However, L1 variants were only partially captured by SNPs on standard SNP arrays, so that their potential phenotypic impact would be frequently missed by SNP array-based genome-wide association studies. We next identified potential phenotypic effects of L1s by looking for signatures of natural selection linked to L1 insertions; significant extended haplotype homozygosity (EHH) was detected around several L1 insertions. This suggests that some of these L1 insertions may have been the target of recent positive selection.
Study
phs000732
Comprehensive Deep Sequencing Atlas in HCC tumors
Establishment of a comprehensive deep sequencing atlas focused on Hepatocellular Carcinoma (HCC) tumors, employing Whole Genome Sequencing (WGS), RNA Sequencing (RNAseq), and Whole Exome Sequencing (WES). The atlas aims to unravel the intricate genetic landscape of HCC, providing a detailed characterization of genomic alterations, transcriptomic profiles, and key mutations associated with this liver cancer. The integration of WGS, RNAseq, and WES data offers a holistic perspective, facilitating a deeper understanding of the molecular mechanisms driving HCC pathogenesis. The resulting atlas serves as a valuable resource for researchers, clinicians, and the broader scientific community, contributing to advancements in HCC diagnostics, prognostics, and therapeutic interventions.
Study
EGAS00001007694
RNA-seq data for study 'Smoking-dependent expression alterations in nasal epithelium reveal immune impairment linked to germline variation and lung cancer risk.'
RNA-seq data from nasal and bronchial tissues in 649 subjects, many with lung cancer.
Lung cancer is the leading cause of cancer-related death in the world. In contrast to many other cancers, a direct connection to lifestyle risk in the form of cigarette smoke has long been established. More than 50% of all smoking-related lung cancers occur in former smokers, often many years after smoking cessation. Despite extensive research, the molecular processes for persistent lung cancer risk are unclear. CT screening of current and former smokers has been shown to reduce lung cancer mortality by up to 26%. To examine whether clinical risk stratification can be improved upon by the addition of genetic data, and to explore the mechanisms of the persisting risk in former smokers, we have analyzed transcriptomic data from accessible airway tissues of 487 subjects. We developed a model to assess smoking associated gene expression changes and their reversibility after smoking is stopped, in both healthy subjects and clinic patients. We find persistent smoking associated immune alterations to be a hallmark of the clinic patients. Integrating previous GWAS data using a transcriptional network approach, we demonstrate that the same immune and interferon related pathways are strongly enriched for genes linked to known genetic risk factors, demonstrating a causal relationship between immune alteration and lung cancer risk. Finally, we used accessible airway transcriptomic data to derive a non-invasive lung cancer risk classifier. Our results provide initial evidence for germline-mediated personalised smoke injury response and risk in the general population, with potential implications for managing long-term lung cancer incidence and mortality.
Dataset
EGAD50000000333
Natural Genetic Variation in the Human Genome
In this study, we used existing whole genome sequencing data from the TOPMed Amish and Jackson Heart Study, as well as the GTEx study, to identify mobile element insertions in the genomes of these individuals. A total of 1112 TOPMed Amish, 3331 TOPMed Jackson Heart Study, and 698 GTEx individuals were examined. We used the Mobile Element Locator Tool (MELT) developed previously by our lab, along with a cloud-based version of MELT (CloudMELT), to discover non-reference mobile element insertions (MEIs) in these genomes. We discovered Alu, L1, SVA, and HERV-K MEIs and produced genotypes in individual genomes. Build hg19 BAMs were used for the TOPMed samples, and build GRCh37 BAMs were used for the GTEx samples in this study.
Study
phs002463
MicroRNAs, Hypertension and End Organ Damage in Humans
Study 1) Profiling of microRNAs in plasma of patients with hypertension complicated or not by metabolic syndrome or chronic kidney disease
Hypertension (HTN), or high blood pressure, is associated with subclinical target organ damage such as cardiac, vascular and kidney injury, which may lead to complications and death. In this study, we investigated circulating microRNAs as biomarkers of target organ damage in HTN patients. We profiled circulating microRNAs by RNA sequencing in platelet-poor plasma of normotensive subjects and patients with HTN complicated or not by metabolic syndrome (MetS) or chronic kidney disease (CKD) (n=15 each group). Differentially expressed microRNAs were identified with a false discovery rate threshold of <0.1. MicroRNAs were uniquely differentially expressed in association with HTN (8), MetS (1) or CKD (13), and 8 were similarly differentially expressed across groups. This study identified the association of differentially expressed circulating microRNAs with target organ damage in HTN patients, which could have pathophysiological and therapeutic implications.
Study 2) Profiling of microRNAs in gluteal subcutaneous small arteries of patients with hypertension complicated or not by chronic kidney disease
Hypertension (HTN) is associated with vascular damage characterized by endothelial dysfunction and vascular remodeling and stiffening, which contributes to kidney injury leading to chronic kidney disease (CKD). MicroRNAs are short non-coding RNAs which repress/degrade target mRNAs. The role of microRNAs in vascular injury in HTN remains unclear. In this study, we aimed to identify differentially expressed microRNAs in small arteries of patients with HTN associated or not with CKD, in order to shed light on the pathophysiological molecular mechanisms. Normotensive subjects and HTN patients with or without CKD grades 3-4 were studied (n=15-16). Small arteries were isolated from subcutaneous gluteal biopsies, RNAs were extracted, and small and total RNA sequencing was performed on an Illumina HiSeq-2500. Differentially expressed genes were identified with P<0.05 and fold change (FC) >1.3. MicroRNAs and mRNAs were uniquely differentially expressed in association with HTN (microRNAs: 10, mRNAs: 68), CKD (microRNAs: 68, mRNAs: 395), and both groups (microRNAs: 2, mRNAs: 32). This study identified differentially expressed microRNAs and mRNAs in small arteries with target organ damage in HTN, which could have pathophysiological and therapeutic implications.
Study
phs002389
RNA sequencing of genetically modified human iPSCs modeling patients with autism spectrum disorders (ASD)
We generated two genetically modified iPSC lines. In each line, a single promoter variant identified in a patient with autism spectrum disorder (ASD) was introduced by using the CRISPR/Cas9 system. These mutant lines were analyzed by RNA-seq together with control lines to assess the impact of the introduced variant.
Study
JGAS000651
Study of complex rearrangements and mutational signatures in neuroblastoma heterogeneous risk groups.
Neuroblastoma is a pediatric solid tumor characterized by strong clinical heterogeneity. Although certain complex genomic alterations, such as extrachromosomal DNA amplifications (ecDNA), are associated with adverse outcome and have been recurrently detected in neuroblastomas, the mutational processes involved in their generation remain largely unclear. By examining the topography of complex rearrangements along with mutational signatures derived from all variant classes, we identify previously unrecognized co-occurring mutational footprints, which we termed mutational scenarios. We demonstrate that clinical neuroblastoma heterogeneity is linked to differences in the processes driving these mutational scenarios. Whereas high-risk MYCN-amplified neuroblastoma genomes were characterized by signs of replication slippage and stress, homologous recombination-associated signatures defined high-risk non-MYCN-amplified patients. Non-high-risk neuroblastomas, on the other hand, were marked by footprints of chromosome missegregation and TOP1 mutational activity. This analysis provides a systematic perspective on the repertoire of mutational patterns that contribute to clinical neuroblastoma heterogeneity.
Study
EGAS00001006983
The mutation burden of narrowband ultraviolet B phototherapy in human skin - Nanoseq
Background: Ultraviolet radiation (UV) is used as a treatment for psoriasis, but UV can also induce mutations which may lead to development of skin cancer. Information on the mutagenicity of narrowband UVB (NBUVB) would help inform clinicians and patients who are concerned about the potential risks of this treatment.
Dataset
EGAD00001015249
The mutation burden of narrowband ultraviolet B phototherapy in human skin - WGS
Background: Ultraviolet radiation (UV) is used as a treatment for psoriasis, but UV can also induce mutations which may lead to development of skin cancer. Information on the mutagenicity of narrowband UVB (NBUVB) would help inform clinicians and patients who are concerned about the potential risks of this treatment.
Dataset
EGAD00001015250
Expression profiling of Gorlin iPSCs in the osteoblast induction culture
In this study, we utilized human induced pluripotent stem cell (iPSC)-based models of two diseases, Gorlin syndrome and McCune-Albright syndrome, to understand the roles of Hh signaling in osteogenesis, especially in mineralization. To examine which signaling pathways and genetic networks were affected by the Gorlin syndrome-associated PTCH1 mutation in human osteoblastogenesis, we performed RNA sequencing (RNA-seq) analysis in the osteoblast induction culture of WT and Gorlin iPSCs.
Study
JGAS000218
Transcriptional Response to Hypoxia in iPSC-Derived Endothelial Cells from a High Altitude Adapted Population
We established lymphoblastoid cell lines from 10 adult individuals of Tibetan ancestry living in the Chicago area. These lines were reprogrammed to induced pluripotent stem cells (iPSC). We validated genetic ancestry by genome-wide SNP array genotyping and population genetic analyses. After being subjected to standard quality control testing, iPSCs were differentiated into endothelial cells. Following differentiation, endothelial cells were isolated via a pull-down protocol using beads coated with an antibody for a canonical cell-surface marker of the vascular endothelium (CD144). Purified vascular endothelial cells were cultured in parallel in normoxia (20% O2) and hypoxia (1% O2) for 48 hours prior to harvesting and processing for bulk RNA-sequencing. We provide imputed genotype data and bulk RNA-sequencing data in normoxia and in hypoxia for each of the 10 individuals. We also provide multiplexed scRNA-sequencing data for the 10 Tibetan individuals pooled with 10 CHB individuals from the 1000 Genomes Project. Additionally, we provide scRNAseq data for the cardiomyocytes derived from the same iPSC lines from the Tibetan individuals pooled together with cardiomyocytes derived from iPSC lines of 10 CHB individuals from the 1000 Genomes Project. Lymphoblastoid cell lines from the CHB were reprogrammed to iPSCs and differentiated to cardiomyocytes using the same protocols as used for the Tibetans. The batches of 4 lines were balanced by sex and population. These iPSC-derived cardiomyocytes were cultured in hypoxia (1% O2) for 48 hours and subjected to scRNAseq in pooled batches of 3 or 4 lines. The CHB individuals included in this analysis were: NA18528, NA18531, NA18557, NA18596, NA18606, NA18608, NA18614, NA18619, NA18633, and NA18748.
Study
phs003758
Targeted sequencing about core genes involved in telomere biology in colorectal cancer patients
We sequenced the coding exons of core genes involved in telomere maintenance using peripheral blood DNA from 192 CRC patients. The primary sequencing data were generated using the Ion Torrent Personal Genome Machine® (PGM™) platform (Life Technologies, Carlsbad, CA, USA).
Study
EGAS00001002977