Neurological conditions account for over 6% of the global disease burden. There are more than 600 neurological disorders, and cognitive dysfunction, also referred to as intellectual disability (ID), occupies a prominent position among them. Cognitive dysfunction arises from the failure of neuronal cells to organize into a complex network and to remodel this network in response to learning and experience. It manifests as deficits in adaptive behaviors, including everyday social and practical skills. Because of its high prevalence and the lifetime cost of care per individual, in the range of $1-2 million in the United States (CDC), ID presents a significant health burden. Genetic and functional studies of the genes and proteins underlying ID have helped to elucidate the molecular pathways of human brain development in health and disease. However, the identity of a large number of essential molecular and cellular components still remains elusive. The objectives of our study are to expand the repertoire of genes known to cause ID and to characterize their role in neuronal structure and cognitive function. The rationale is that identifying the causative gene variants that lead to cognitive dysfunction, and elucidating their function, will be essential for understanding brain function and for developing improved diagnostic tools and effective preventive and therapeutic agents for neurological disorders, including ID.
Our comprehensive research program has three aims: 1) ascertain and clinically phenotype members of extended families segregating autosomal recessive intellectual disability (ARID); 2) identify new ARID genes and gene products; and 3) determine the functions of prioritized novel ARID genes using multifaceted approaches, including analysis of spatiotemporal expression patterns in mouse brain, targeting in cultured rat hippocampal neurons, and assessment of effects on cell morphology, synapse abundance, synaptic transmission and plasticity in neuronal cells by electrophysiology and live-cell imaging assays. The project will combine human clinical assessment with genetic and functional analyses relevant to brain development and function. Impact: Execution of the proposed studies will generate new, clinically relevant knowledge with high potential to improve the molecular diagnosis and prognosis of ID, to identify novel therapeutic targets that could slow progression or delay onset, and possibly to enable precision medicine approaches for ID.
Ewing sarcoma (EWS) is a deadly bone cancer that occurs in children and adolescents. Mounting evidence suggests that a genetic predisposition exists for this pediatric cancer, although the specific genetic contribution has yet to be identified. EWS has never been linked to a specific cancer predisposition syndrome, although several published case reports describe siblings and cousins with EWS. Furthermore, neuroectodermal tumors appear to occur more commonly in families with EWS. The two consistent epidemiological findings in EWS are a very strong Caucasian predilection and increased rates of hernia in EWS patients and their family members. Finally, the role of GGAA microsatellite repeats in EWS tumorigenesis has recently been described, and these microsatellites are polymorphic in repeat size and location across the genome. The study goals of this Kids First project are (1) to identify cancer predisposition genes that increase disease risk in EWS trios, (2) to identify genome-wide GGAA microsatellite repeats that increase disease risk in EWS trios, and (3) to identify de novo mutation and structural variant rates in EWS trios that reflect underlying DNA repair defects increasing disease risk. As part of the Kids First Common Fund initiative, this study will further elucidate the genetic contribution to pediatric cancer development. The EWS trios have been collected as part of the Children's Oncology Group's AEPI10N5 Study ("Genetic Epidemiology of Ewing Sarcoma"), and each trio has associated phenotypic data, including a detailed family history. Around 375 of these trios were selected for whole-genome sequencing as part of the Gabriella Miller Kids First fund. We will interrogate the sequence data using our genomic analysis pipeline at the University of Utah and the Utah Science Technology and Research initiative's (USTAR) Center for Genetic Discovery.
We will look for the genetic contribution to EWS, and the sequence data will be shared in a repository designated by the Kids First Common Fund. Whole-genome sequencing (WGS) of these ~375 EWS trios will help us understand the genetic origins of a deadly childhood cancer and may lead to novel strategies for prevention and treatment.
This project aims to identify genetic modifiers of congenital heart disease (CHD) in patients with 22q11.2 deletion syndrome (22q11.2DS; also known as DiGeorge syndrome/velo-cardio-facial syndrome). Whereas the prevalence of CHD in the general population is ~0.8%, about 60%-70% of 22q11.2DS patients have CHD, most of it conotruncal heart defects, making 22q11.2DS an ideal model for studying the genetic underpinnings of CHD. The chromosome 22q11.2 region is susceptible to meiotic chromosome rearrangements leading to hemizygous deletions, because large blocks of low copy repeats, termed LCR22s, are dispersed in the region. Non-allelic homologous recombination between these repeats during meiosis can lead to deletions of specific sizes. Approximately 90% of patients have a typical 3 million base pair (3 Mb) hemizygous deletion between LCR22-A and -D, while some have nested deletions between LCR22-A and -B or LCR22-A and -C. Our goal is to perform a case-control association study in which all subjects have 22q11.2DS: cases have CHD, while controls have normal heart structures. All individuals are unrelated and were collected through collaborators in the US, Canada and Europe. All subjects received a clinical diagnosis of 22q11.2DS and signed informed consent to allow a blood sample to be collected for research. The DNA samples were de-identified in two steps to protect the subjects' privacy. We then determined deletion type using multiplex ligation-dependent probe amplification (MLPA); this information is provided for the great majority of subjects. We also obtained cardiac phenotype information from echocardiogram reports, as well as reported race and ethnicity (Hispanic vs. non-Hispanic), which are required for the analysis. We will obtain 30X whole-genome sequence data from NIH CIDR.
We will first annotate the sequence variants identified to uncover rare ( For genotype data, we will provide the raw FASTQ files from whole-genome sequencing as well as the called variants in a VCF file. For phenotype data, we will provide the sample relationship files (PED files), CHD affection status, deletion type, and race.
The Epi4K project began in 2011 as an international, multi-center study that seeks to identify and characterize the genetic bases of complex epilepsies. The Center without Walls Epi4K project includes three cores and four scientific projects, as well as a steering committee composed of the primary study investigators and representatives from the National Institute of Neurological Disorders and Stroke (NINDS). The three cores are: (1) the Administrative Core, which handles the overall coordination of Epi4K activities; (2) the Sequencing, Biostatistics and Bioinformatics Core, which is responsible for generating the next-generation sequence data, inferring the genetic variation in each study participant, and performing the primary analyses to identify epilepsy genes; and (3) the Phenotyping and Clinical Informatics Core, which verifies and archives all the phenotypic data from each study participant. The proposed number of patients to be sequenced and analyzed in the scientific projects is a minimum of 4,000; hence the Center was named "Epi4K: Gene Discovery in 4,000 Genomes". Project 1 addresses the genetics of rare and severe childhood epilepsies, including epileptic encephalopathies (infantile spasms and Lennox-Gastaut syndrome) and malformations of cortical development (periventricular nodular heterotopia and polymicrogyria). Exome and genome sequence data generated from patients' DNA will be screened for mutations (single nucleotide substitutions, small insertion-deletions, and copy number variations) that cause or contribute to the diseases. Project 2 focuses on genetic discovery in multiplex families. This study will use next-generation sequencing to identify genomic variation that influences risk for common subtypes of epilepsy, including idiopathic generalized epilepsy and nonlesional focal epilepsy. Project 3 seeks to identify genetic determinants of prognosis in patients with a range of epilepsy disorders.
This project will examine established epilepsy cohorts with well-characterized data on seizure outcome to look for relationships between genetic variation and pharmacological control of seizures. Project 4 will use next-generation sequencing data (exome and genome) to screen for epilepsy-associated copy number variation across all Epi4K projects using novel computational algorithms.
Prostate cancer is the most frequent malignant tumor in males and the second most frequent cause of cancer-related death. In Germany, more than 60,000 prostate cancers are currently diagnosed every year. Although most of these patients are treated with curative intent, more than 10,000 German men die from prostate cancer annually. Owing to demographic changes in our society, prostate cancer incidence is expected to double again over the next 20 years. Prostate cancer is generally considered a tumor of elderly men. However, a fraction of prostate cancers are diagnosed at the age of 55 years or younger. For several reasons, these “early onset prostate cancers” may represent a key entity for understanding prostate cancer biology. First, early onset prostate cancers likely represent a distinct molecular subgroup of prostate cancer (PCa), potentially characterized by relatively small numbers of genetic changes, some of which may be particularly strong driver mutations for PCa development. Second, a fraction of prostate cancers in young individuals could be classical prostate cancers detected at a very early stage, which might therefore carry mainly the early molecular changes and mutations that are most instrumental for early prostate cancer detection. Third, PCa with a hereditary background is likely to be enriched in the age group below 55 years. Through comparison with other sample sets (e.g., from other ICGC consortia), a systematic genomic analysis of young men with PCa could therefore reveal mechanisms of hereditary PCa. Fourth, a better understanding of these tumors is particularly relevant because finding optimal treatment regimens is most critical in young cancer patients.
We are analyzing the entire genomic DNA sequences of at least 200 PCas (and matched non-tumorous DNA) from young men (≤ 55 years) at a minimum of 30-fold coverage, and are integrating single-nucleotide variants and genomic structural variations with differential methylation, mRNA and miRNA expression data. Tissues were collected at the University Medical Center Hamburg-Eppendorf. Sequencing was performed at DKFZ and NCT (Heidelberg), EMBL (Heidelberg), and MPIMG (Berlin). Data management and bioinformatic data analyses are being conducted at DKFZ.
Cancer in children is uncommon, and the overall prognosis for most pediatric cancers is good. However, while combined survival rates have improved over the last decades, certain childhood malignancies, such as high-grade gliomas or metastatic sarcomas, remain incurable in most patients. New strategies for targeting these devastating diseases are imperative, and patient genomic data can become a key asset in the process [1].

The Pediatric Cancer Genome Project’s datasets

At EGA, we store over 500 datasets with sequencing data from pediatric cancer patients. A notable example is the collection of datasets from the Pediatric Cancer Genome Project (PCGP) of St. Jude Children’s Research Hospital–Washington University (Table 1 - EGAS list). PCGP is an ambitious effort to identify the mutations that drive childhood cancer and to find new cures. These datasets include 600 patients with complete tumor and normal genomes from 15 different tumor types. PCGP datasets have been an important resource for studies that pooled genomic data from public and in-house datasets to perform extensive genomic characterizations and publish comprehensive pan-cancer pediatric analyses [2],[3]. Other pediatric cancer datasets of interest come from the Pediatric Brain Tumor Consortium, SickKids and the ICGC PedBrain project.

How data reuse impacts drug development for pediatric cancers

In October 2024, Nature Communications published a study led by the University of Michigan in which the authors used EGA data (PCGP; see the table below) as part of their strategy to identify tumor vulnerabilities that could become novel therapeutic targets in diffuse midline glioma. Diffuse midline gliomas (DMG) are treatment-resistant and uniformly fatal pediatric brain tumors. The prognosis of this brainstem tumor is dismal, with a median overall survival of 9–12 months from diagnosis.
In this study, the authors first developed a laboratory model of the disease and reanalyzed data from the EGA to identify genes potentially linked to tumor severity. This approach revealed the involvement of specific metabolic pathways, suggesting new possibilities for therapeutic intervention. In mice, the study shows that statins can improve survival. In brief, diffuse midline gliomas show intratumor heterogeneity, with subpopulations of less-differentiated oligodendrocyte precursors and more-differentiated astrocytes. The authors established in vitro models that recapitulate both phenotypes and used them to identify the metabolic programs of each subpopulation. To determine clinical relevance, the authors reused gene expression data from 76 DMG patients, identifying a gene signature that predicts decreased overall survival. After extensive metabolic characterization of the subpopulations, the authors defined strategies to target specific metabolic vulnerabilities. Pre-clinical experiments in mice using OXPHOS inhibitors and statins showed reduced tumor burden and increased overall survival [4]. In this case, reusing high-quality patient sequencing data proved an advantageous strategy for supporting the clinical relevance of in vitro and pre-clinical findings. More broadly, in rare conditions such as childhood cancer, the availability of public datasets and the possibility of pooling data from different sources can be the only way to obtain impactful results that lead to improvements for patients.

References
[1] Davidoff, A. M. Pediatric oncology. Seminars in Pediatric Surgery 19, 225–233 (2010).
[2] Gröbner, S. N. et al. The landscape of genomic alterations across childhood cancers. Nature 555, 321–327 (2018).
[3] Thatikonda, V. et al. Comprehensive analysis of mutational signatures reveals distinct patterns and molecular processes across 27 pediatric cancers. Nature Cancer 4, 276–289 (2023).
[4] Mbah, N. E. et al. Therapeutic targeting of differentiation-state dependent metabolic vulnerabilities in diffuse midline glioma. Nature Communications 15 (2024).

Datasets and Studies
ID | Title | Access Policy | Year
EGAD00001000134 | Driver mutations in histone H3.3 and chromatin remodelling genes in paediatric glioblastoma | McGill-DKFZ Pediatric Brain Tumour Consortium | 2016
EGAS0000100192 | Somatic Histone H3 Mutations in Diffuse Intrinsic Pontine Gliomas and Non-Brainstem Paediatric Glioblastomas | St. Jude Children's Research Hospital - Washington University Pediatric Cancer Genome Project | 2017
EGAS00001000575 | Whole genome sequencing and whole exome sequencing of DIPG tumors and matched normal tissue | The hospital for sick children ("SickKids") | 2016
EGAD00001000792 | Exome sequencing reads of paediatric glioblastoma | McGill-DKFZ Pediatric Brain Tumour Consortium | 2016
EGAD00001002006 | Whole genome sequencing of paediatric glioblastoma in the ICGC PedBrain project | ICGC PedBrain project | 2016
T cell activation is known to require a metabolic shift towards glycolysis, triggered by TCR engagement. To substantiate the role of glycolysis in transcriptional reprogramming during antigen-driven T cell activation, we used a well-established ex vivo model that recapitulates antigen stimulation without the need for additional antigen-presenting cells. We initially explored and validated the conditions for both metabolic and epigenetic reprogramming. Antigen-mediated CD4+ T cell activation initiates context-specific gene-expression programs that drive effector functions and cell fates through changes in the epigenetic landscape. To evaluate global changes in histone modifications, we isolated CD4+ T cells from peripheral blood (PB) of healthy control (HC) human donors and either maintained them in resting conditions or stimulated them with anti-CD3/CD28 for 24h. Rapid changes in the CD4+ epigenome and transcriptome were observed within 24h, as measured by RNA sequencing (RNA-seq) and H3K27ac chromatin immunoprecipitation followed by DNA sequencing (ChIP-seq). To explore the requirement for metabolic reprogramming in regulating transcriptional responses after initial TCR engagement, CD4+ T cells from PB of healthy donors were isolated and activated in the presence or absence of oligomycin (an inhibitor of ATP synthase that blocks ATP production by OXPHOS) or 2-deoxyglucose (2DG; which competitively inhibits the production of glucose-6-phosphate from glucose), or with glucose replaced by galactose. Galactose is metabolized through glycolysis via the Leloir pathway, which is energy neutral; this forces cells to use all of the pyruvate they produce to sustain the TCA cycle for OXPHOS and ATP synthesis, at the expense of lactate production. As measured by ChIP-seq and RNA-seq, inhibition of OXPHOS-mediated ATP production by oligomycin had only a limited effect on H3K27 acetylation or activation-induced transcription during the first 24h of activation.
In contrast, inhibition of glycolysis by 2DG treatment compromised both H3K27 acetylation and transcriptional changes. However, under conditions of galactose metabolism, in which lactate production is abrogated, activated CD4+ T cells were still able to globally remodel H3K27 acetylation and reprogram the transcriptome.
In the UK10K project we propose a series of complementary genetic approaches to find new low-frequency/rare variants contributing to disease phenotypes. These will be based on obtaining the genome-wide sequence of 4000 samples from the TwinsUK and ALSPAC cohorts (at 6x sequence coverage), and the exome sequence (protein-coding regions and related conserved sequence) of 6000 samples selected for extreme phenotypes. Our studies will focus primarily on cardiovascular-related quantitative traits, obesity and related metabolic traits, neurodevelopmental disorders and a limited number of extreme clinical phenotypes that will provide proof of concept for future familial trait sequencing. We will directly analyse quantitative traits in the cohorts and the selected traits in the extreme samples, and will also use imputation down to 0.1% allele frequency to extend the analyses to further sample sets with genome-wide genotype data. In each case we will investigate indels and larger structural variants as well as SNPs, and use statistical methods that combine rare variants in a locus or pathway as well as single-variant approaches.

This sample set consists of DNA from multiply affected schizophrenia families. The families have been diagnosed using the SADS-L clinical instrument, which gives diagnoses at the probable level of the Research Diagnostic Criteria (RDC). In addition, all diagnoses are available using DSM-III-R criteria. These criteria are widely accepted as valid and reliable for the diagnosis of schizophrenia. All families have been collected to ensure that they are unilineal for transmission of schizophrenia, i.e. only one parent is affected with schizophrenia, or schizophrenia occurs among the relatives of only one parent (the transmitting or obligate-carrier parent). Families with bilineal transmission of schizophrenia (i.e. with both parents affected) were not sampled for this study. All families have multiple cases of schizophrenia and related disorders.
The families have been selected to ensure that there are no cases of bipolar disorder within them or among relatives on either side of the family. For further information on this cohort, please contact Hugh Gurling (h.gurling@ucl.ac.uk).
Cancer genomics has illuminated a wide spectrum of genes and core molecular processes contributing to human malignancy. Still, the genetic and molecular basis of many cancers remains only partially explained. Genetic predisposition accounts for 5-10% of cancer diagnoses, and genetic events cooperating with known somatic driver events are poorly understood. Analyzing established cancer predisposition genes in medulloblastoma (MB), a malignant childhood brain tumor, we recently identified pathogenic germline variants that account for 5% of all MB patients. Here, by extending our previous analysis to include all protein-coding genes, we discovered and replicated rare germline loss-of-function (LoF) variants across Elongator Complex Protein 1 (ELP1) on 9q31.3 in 15% of pediatric MBSHH cases, implicating ELP1 as the most common MB predisposition gene and raising the proportion of pediatric MBSHH cases with genetic predisposition to 40%. Inheritance was verified by parent-offspring and pedigree analysis, which identified two families with a history of pediatric MB. ELP1-associated MBs were restricted to the molecular SHHa subtype and were characterized by universal biallelic inactivation of ELP1 due to somatic loss of chromosome 9q. The majority of ELP1-associated MBs exhibited co-occurring somatic PTCH1 (9q22.32) alterations, suggesting that ELP1 deficiency predisposes to tumor development in combination with constitutive activation of SHH signaling. ELP1 is an essential subunit of the evolutionarily conserved Elongator complex, whose primary function is to enable efficient translational elongation through tRNA modifications at the wobble (U34) position.
Biochemical, transcriptional, and proteomic analyses revealed that ELP1-associated MBSHH are characterized by a destabilized core Elongator complex, loss of Elongator-dependent tRNA modifications, codon-dependent translational reprogramming, and induction of the unfolded protein response (UPR), consistent with deregulation of protein homeostasis due to Elongator-deficiency in model systems. Our findings suggest that genetic predisposition to proteome instability is a previously underappreciated determinant in the pathogenesis of pediatric brain cancer. These results provide strong rationale for further investigating the role of protein homeostasis in other pediatric and adult cancer types and potential opportunities for novel therapeutic interference.
Metadata Distribution

Welcome to the realm of Metadata Distribution within the EGA ecosystem! Our Metadata REST API empowers you to effortlessly retrieve metadata from the expansive landscape of EGA. Using this API, you gain access to publicly available metadata across various EGA object types, including studies, samples, experiments, runs, analyses, policies, DACs, and datasets. The API also supports cross-referencing of objects, so you can gather, for example, all the datasets associated with a specific DAC. In addition, the metadata API can now query private data: if you have the necessary permissions, you can access behind-the-login private metadata for a specified list of datasets.

Identifiers

At the core of EGA's organisational structure are unique accessions that serve as essential tags for our diverse objects. Here's a quick overview of the accessions and their corresponding object types:

EGA Accession ID | EGA Object description
EGAS | EGA Study Accession ID
EGAC | EGA DAC Accession ID
EGAP | EGA Policy Accession ID
EGAN | EGA Sample Accession ID
EGAR | EGA Run Accession ID
EGAX | EGA Experiment ID
EGAZ | EGA Analysis Accession ID
EGAD | EGA Dataset Accession ID
EGAB | EGA Submission ID
EGAF | EGA File Unique Accession ID

For further information, check our metadata schema documentation.

Dataset Mappings

For authorised datasets, comprehensive mappings reveal meaningful connections:
Sample_file: This file presents information about the linkage between samples and files available in the dataset.
Study_experiment_run_sample: This file presents information about the linkage between studies, experiments, runs, and samples within the dataset.
Study_analysis_sample: This file presents information about the linkage between studies, analyses, and samples contained within the dataset.
Run_sample: This file presents information about the linkage between runs and samples within the dataset.
Analysis_sample: This file presents information about the linkage between analyses and samples within the dataset.
An empty file indicates the absence of corresponding information.

Website Download

Our website serves as your gateway to downloading metadata. Simply navigate to the dataset page, where you'll find a blue Metadata button. Once authenticated, you can click this button for authorised datasets. If you lack permissions for a particular dataset, request access by clicking the 'Request Access' button. For authorised datasets, choose your preferred metadata format: CSV, TSV, or JSON.

Metadata API - Private

Leverage the power of programmatic metadata downloads! Start by authenticating with your credentials to obtain an access token. With this token, you can programmatically query private information. Queries mirror the structure of the Public Metadata API; behind the login, however, you can also retrieve specific mapping information (as described above under Dataset Mappings) alongside object-level exploration.

Authentication

An active session is required to work with the API. Each time you log in with your credentials, a new session is started, identified by an access_token. Below is an example of how to obtain one using curl:

    curl https://idp.ega-archive.org/realms/EGA/protocol/openid-connect/token \
      -d 'client_id=metadata-api' \
      -d 'username=...' \
      --data-urlencode 'password=...' \
      -d 'grant_type=password'

All responses from the API are in JSON format. A successful response should include a new token to be used for the session:

    {"access_token":"eyJhbGciOiJSUzI1NiIsInR5cCIgOiA...TNw",
     "expires_in":300,
     "refresh_expires_in":1800,
     "refresh_token":"eyJhbGciOiJIUzI1NiIsInR5cCIgOiA...pTX10",
     "token_type":"Bearer",
     ...
    }

Save the access_token value and include it in the API call headers. In this example the access token expires after 300 seconds (expires_in), so a new one must be obtained once it lapses.
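The token request above can be wrapped in a small script so that the access_token is captured in a shell variable rather than copied by hand. This is a minimal sketch, assuming a POSIX shell with curl and sed; the environment variable names EGA_USERNAME and EGA_PASSWORD and the function names are hypothetical, and the sed-based extraction assumes the compact one-line JSON layout shown in the example response.

```shell
#!/bin/sh
# Sketch: obtain an EGA access token using the same request as the curl
# example above, and print only the access_token value.
# EGA_USERNAME / EGA_PASSWORD are hypothetical environment variable names.

TOKEN_URL="https://idp.ega-archive.org/realms/EGA/protocol/openid-connect/token"

# Read a JSON response on stdin and print the "access_token" string.
# Assumes the key and its value sit on one line, as in the example response;
# a real JSON parser (e.g. jq) is more robust where available.
extract_access_token() {
  sed -n 's/.*"access_token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# POST the password grant to the identity provider and print the token.
# --data-urlencode is used for both credentials so special characters survive.
ega_get_token() {
  curl -s "$TOKEN_URL" \
    -d 'client_id=metadata-api' \
    --data-urlencode "username=${EGA_USERNAME}" \
    --data-urlencode "password=${EGA_PASSWORD}" \
    -d 'grant_type=password' |
    extract_access_token
}
```

Usage: export the two variables, run TOKEN=$(ega_get_token), and pass the result as -H "Authorization: Bearer $TOKEN". Double quotes let the shell expand $TOKEN, unlike the single-quoted placeholder in the examples. Because the example token expires after 300 seconds, it is simplest to request it immediately before a batch of queries.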
Example query usage

Below are some example queries available behind authentication and authorisation. In each, replace {datasetID} with a dataset accession (EGAD prefix) and access_token with the token obtained during authentication.

Querying study-experiment-run-sample mappings:

    curl https://metadata.ega-archive.org/datasets/{datasetID}/mappings/study_experiment_run_sample \
      -H 'Authorization: Bearer access_token'

Querying run-sample mappings:

    curl https://metadata.ega-archive.org/datasets/{datasetID}/mappings/run_sample \
      -H 'Authorization: Bearer access_token'

Querying study-analysis-sample mappings:

    curl https://metadata.ega-archive.org/datasets/{datasetID}/mappings/study_analysis_sample \
      -H 'Authorization: Bearer access_token'

Querying analysis-sample mappings:

    curl https://metadata.ega-archive.org/datasets/{datasetID}/mappings/analysis_sample \
      -H 'Authorization: Bearer access_token'

Querying sample-file mappings:

    curl https://metadata.ega-archive.org/datasets/{datasetID}/mappings/sample_file \
      -H 'Authorization: Bearer access_token'

You can get a different output format by adding one of these options to the curl command:

    -H 'Accept: text/tsv'
    -H 'Accept: application/json'
    -H 'Accept: text/csv'

For more detailed information, refer to the Metadata API Specification.
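The mapping queries above differ only in the dataset accession and the mapping name, so they can be parameterised. This is a minimal sketch, assuming a POSIX shell with curl: the function names are hypothetical, the accession check simply reads the four-letter prefix from the identifiers table, and defaulting the Accept header to application/json is an assumption based on the statement that API responses are JSON.

```shell
#!/bin/sh
# Sketch of helpers for the private mapping endpoints shown above.
# Function names are hypothetical; the URL pattern, the five mapping names
# and the Accept values are taken from the documented examples.

BASE_URL="https://metadata.ega-archive.org"

# Map an accession to its object type via its four-letter prefix
# (per the identifiers table). Prints "Unknown" for other prefixes.
ega_object_type() {
  case "$1" in
    EGAS*) echo "Study" ;;
    EGAC*) echo "DAC" ;;
    EGAP*) echo "Policy" ;;
    EGAN*) echo "Sample" ;;
    EGAR*) echo "Run" ;;
    EGAX*) echo "Experiment" ;;
    EGAZ*) echo "Analysis" ;;
    EGAD*) echo "Dataset" ;;
    EGAB*) echo "Submission" ;;
    EGAF*) echo "File" ;;
    *)     echo "Unknown" ;;
  esac
}

# Print the mapping URL for a dataset; fail (status 1) for a non-dataset
# accession or a mapping name outside the five documented ones.
ega_mapping_url() {
  if [ "$(ega_object_type "$1")" != "Dataset" ]; then
    echo "not a dataset accession: $1" >&2
    return 1
  fi
  case "$2" in
    study_experiment_run_sample|run_sample|study_analysis_sample|analysis_sample|sample_file)
      printf '%s/datasets/%s/mappings/%s\n' "$BASE_URL" "$1" "$2" ;;
    *)
      echo "unknown mapping: $2" >&2
      return 1 ;;
  esac
}

# Download one mapping.
# Usage: ega_fetch_mapping DATASET_ID MAPPING TOKEN [ACCEPT]
# ACCEPT defaults to application/json; text/tsv and text/csv are the
# other formats listed above.
ega_fetch_mapping() {
  url=$(ega_mapping_url "$1" "$2") || return 1
  curl -s "$url" \
    -H "Authorization: Bearer $3" \
    -H "Accept: ${4:-application/json}"
}
```

For example, with a token stored in TOKEN, ega_fetch_mapping EGAD00001000134 sample_file "$TOKEN" text/tsv > sample_file.tsv would request the sample-file mapping as TSV, provided you are authorised for that dataset. Validating the accession and mapping name locally catches typos before a request is sent.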
Background

Monocytes are key mediators of innate immunity to infection, undergoing profound and dynamic changes in epigenetic state and immune function that are broadly protective but may be dysregulated in disease. Here, we aimed to advance understanding of epigenetic regulation following innate immune activation, both acutely and in endotoxin-tolerant states.

Methods

We exposed primary human monocytes from healthy donors (n=6) to interferon-γ or to differing regimens of endotoxin (lipopolysaccharide): an acute response (2hr) and two models of endotoxin tolerance, namely repeated stimulation (6+6hr) and prolonged exposure to endotoxin (24hr). Another subset of monocytes was left untreated (naïve). We identified context-specific regulatory elements based on epigenetic signatures for chromatin accessibility (ATAC-seq) and on regulatory non-coding RNAs from total RNA sequencing.

Results

We present an atlas of differential gene expression for endotoxin and interferon responses, identifying widespread context-specific changes. Across assayed states, only 24-29% of genes showing differential exon usage were also differential at the gene level. Overall, 19.9% (6884 of 34616) of repeatedly observed ATAC peaks were differential in at least one condition; most were upregulated on stimulation and located in distal regions (64.1% of differential vs 45.9% of non-differential peaks), and their sequences were less conserved than those of non-differential peaks. We identified enhancer-derived RNA signatures specific to different monocyte states that correlated with chromatin accessibility changes. The endotoxin tolerance models showed distinct chromatin accessibility and transcriptomic signatures, and integrated analysis identified genes and pathways involved in the inflammatory response, detoxification, metabolism and wound healing.
We leveraged eQTL mapping for the same monocyte activation states to link potential enhancers with specific genes, identifying 1,946 unique differential ATAC peaks with 1,340 expression-associated genes. We further used this to inform understanding of reported GWAS associations, for example involving FCHO1 and coronary artery disease.

Conclusion

This study reports context-specific regulatory elements, based on transcriptomic profiling and epigenetic signatures for enhancer-derived RNAs and chromatin accessibility, in immune-tolerant monocyte states, and demonstrates the value of linking such elements with eQTL to inform future mechanistic studies aimed at defining therapeutic targets in immunosuppression and disease.
The Genetics and Epidemiology of Colorectal Cancer Consortium (GECCO) is a collaborative effort composed of a coordinating center and scientific researchers from well-characterized cohort and case-control studies conducted in North America and Europe. This international consortium aims to accelerate the discovery of common and rare genetic risk variants for colorectal cancer by conducting large-scale meta-analyses of existing and newly generated genome-wide association study (GWAS) data, replicating and fine-mapping GWAS discoveries, and investigating how genetic risk variants are modified by environmental risk factors. To expand these efforts, we assembled case-control sets or nested case-control sets from 20 different North American or European studies. Summary descriptions and participant inclusion/exclusion criteria for each of these studies are detailed below.

The Black Women's Health Study (BWHS): The BWHS is the largest follow-up study of the health of African-American women (Cozier et al., 2004; Rosenberg et al., 1995) [PMID: 15018884; PMID: 7722208]. Its purpose is to identify and evaluate causes and preventives of cancers and other serious illnesses in African-American women. Among the diseases being studied are breast cancer, colorectal cancer, type 2 diabetes, uterine fibroids, systemic lupus erythematosus, and cardiovascular disease. The study began in 1995, when 59,000 black women from all parts of the United States enrolled through postal questionnaires. The women provided demographic and health data on the 1995 baseline questionnaire, including information on weight, height, smoking, drinking, contraceptive use, use of other selected medications, illnesses, reproductive history, physical activity, diet, use of health care, and other factors. The participants are followed through biennial questionnaires to determine the occurrence of cancers and other illnesses and to update information on risk factors.
Self-reports of cancer are confirmed through medical records and state cancer registry records. Mouthwash-swish samples, as a source of DNA, were obtained from ~26,000 BWHS participants in 2002-2007. DNA was isolated from the mouthwash-swish samples at the Boston University Molecular Core Genetics Laboratory using the QIAamp DNA Mini Kit (Qiagen). All incident colorectal cancer cases with a DNA sample were included in the present analysis. Two controls per case, selected from among BWHS participants free of colorectal cancer at the end of follow-up, were matched to cases on year of birth (±2 years) and geographical region of residence (Northeast, South, Midwest, and West). A total of 209 colorectal cancer cases and 423 controls were sent for genotyping.
Campaign Against Cancer and Heart Disease (CLUE II): CLUE II is a prospective cohort designed to identify biomarkers and other factors associated with risk of cancer, heart disease, and other conditions (Kakourou et al., 2015) [PMID: 26220152]. A total of 32,894 participants were recruited from May through October 1989 from Washington County, Maryland, and surrounding communities. Colorectal cancer cases (n = 297) and matched controls (n = 296) were identified between 1989 and 2000 among participants in the CLUE II cohort of Washington County, Maryland.
Colorectal Cancer Study of Austria (CORSA): In the ongoing CORSA study, more than 13,000 Caucasian participants have been recruited since 2003 within the province-wide screening project "Burgenland Prevention Trial of Colorectal Disease with Immunological Testing" (B-PREDICT) (Hofer et al., 2011) [PMID: 21422235]. All inhabitants of the Austrian province of Burgenland aged 40 to 80 years are invited annually to participate in fecal immunochemical testing, and haemoccult-positive screening participants are invited for colonoscopy.
CORSA includes genomic DNA and plasma from colorectal cancer cases, low-risk and high-risk adenoma cases, and colonoscopy-negative controls. Controls received a complete colonoscopy and were free of colorectal cancer or polyps. CORSA participants have been recruited in the four KRAGES hospitals in Burgenland, Austria, and additionally at the Medical University of Vienna (Department of Surgery), the Viennese hospitals "Rudolfstiftung" and "Sozialmedizinisches Zentrum Süd", and the Medical University of Graz (Department of Internal Medicine). A total of 1,403 colorectal cancer and advanced colorectal adenoma cases and 1,404 matched controls were selected for the study. The distributions of sex and age (in 5-year strata) were balanced between cases and controls.
Cancer Prevention Study II (CPS II): The CPS II Nutrition Cohort is a prospective study of cancer incidence and mortality in the United States, established in 1992 and described in detail elsewhere (Calle et al., 2002; Campbell et al., 2014) [PMID: 12015775; PMID: 25472679]. At enrollment, participants completed a mailed, self-administered questionnaire including information on demographic, medical, diet, and lifestyle factors. Follow-up questionnaires to update exposure information and to ascertain newly diagnosed cancers were sent biennially starting in 1997. Reported cancers were verified through medical records, state cancer registry linkage, or death certificates. The Emory University Institutional Review Board approves all aspects of the CPS II Nutrition Cohort. A total of 360 cases and 359 controls were selected for this study.
Czech Republic Colorectal Cancer Study (Czech Republic CCS): Cases with colonoscopy findings positive for malignancy, confirmed by histology as colon or rectal carcinoma, were recruited between September 2003 and May 2012 in several oncology departments in the Czech Republic (Prague, Pilsen, Benesov, Brno, Liberec, Ples, Pribram, Usti nad Labem, and Zlin).
Two control groups, sampled during the same period as case recruitment, were included in the study. The first group consisted of hospital-based individuals with a colonoscopy negative for malignancy or idiopathic bowel disease. The reasons for colonoscopy were: i) a positive fecal occult blood test, ii) hemorrhoids, iii) abdominal pain of unknown origin, and iv) macroscopic bleeding. The second control group consisted of healthy blood donor volunteers from a blood donor center in Prague. All individuals underwent standard examinations to verify their health status for blood donation and were cancer-free at the time of sampling. Details of the colorectal cancer (CRC) cases and controls have been reported previously (Vymetalkova et al., 2014; Naccarati et al., 2016; Vymetalkova et al., 2016) [PMID: 24755277; PMID: 26735576; PMID: 27803053]. All subjects were informed about the study and provided written consent to participate, including approval of the use of their biological samples for genetic analyses, in accordance with the Declaration of Helsinki. The design of the study was approved by the Ethics Committee of the Institute of Experimental Medicine, Prague, Czech Republic. All subjects included in the study were Caucasian; the set comprised 1,792 cases and 1,764 matched controls. Controls were matched to CRC cases in a 1:1 ratio on age (±5 years) and sex (exact). For cases without a matched control under these criteria, matching was done on sex only.
Early Detection Research Network (EDRN): The aim of the EDRN initiative is to develop and sustain a biorepository to support translational research (Amin et al., 2010) [PMID: 21031013]. High-quality biospecimens were accrued and annotated with pertinent clinical, epidemiologic, molecular and genomic information. User-friendly annotation and query tools were developed for this purpose.
Components of this annotation tool include common data elements (CDEs) developed from the College of American Pathologists (CAP) Cancer Checklists and North American Association of Central Cancer Registries (NAACCR) standards. The CDEs provide semantic and syntactic interoperability of the data sets by describing them in the form of metadata or data descriptors. A total of 352 colorectal case samples and 399 controls were selected for this study. Controls were matched to CRC cases on age and sex.
The EPICOLON Consortium (EPICOLON): The EPICOLON Consortium comprises a prospective, multicentre, population-based epidemiological survey of the incidence and features of CRC in the Spanish population (Fernandez-Rozadilla et al., 2013) [PMID: 23350875]. Cases were patients with a de novo, histologically confirmed diagnosis of colorectal adenocarcinoma. Patients with familial adenomatous polyposis, Lynch syndrome, or inflammatory bowel disease-related CRC, and cases in which the patient or family refused to participate, were excluded. Hospital-based controls were recruited through the blood collection unit of each hospital, together with cases. All controls were confirmed to have no history of cancer or other neoplasm and no reported family history of CRC. Controls were randomly selected and matched with cases on hospital, sex and age (±5 years). A total of 370 cases and 370 controls were selected for genotyping.
Hawaii Adenoma Study: For this adenoma study, participants on Oahu, Hawaii, were initially recruited through two flexible-sigmoidoscopy screening clinics. Adenoma cases were identified either from the baseline examination at the Hawaii site of the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial during 1996-2000 or at the Kaiser Permanente Hawaii Gastroenterology Screening Clinic during 1995-2007.
In addition, from 2002 through 2007, we approached for recruitment all eligible patients who underwent a colonoscopy in the Kaiser Permanente Hawaii Gastroenterology Department. Cases were patients with histologically confirmed first-time adenoma(s) of the colorectum and were of Japanese, Caucasian or Hawaiian race/ethnicity. Controls were selected among patients with a normal colorectum and were individually matched to the cases on age at exam, sex, race/ethnicity, screening date (±3 months), clinic, and type of examination (colonoscopy or flexible sigmoidoscopy). We recruited 1,016 adenoma cases (67.8% of all eligible) and 1,355 controls (69.2% of all eligible); 889 cases and 1,169 controls agreed to provide a blood sample, and 29 cases and 34 controls a mouthwash sample. A total of 989 cases and 1,185 controls were genotyped for this study.
Columbus-area HNPCC Study (HNPCC, OSUMC): Patients with colorectal adenocarcinoma diagnosed at six participating hospitals were eligible for this study, regardless of age at diagnosis or family history of cancer. Patients with a clinical diagnosis of familial adenomatous polyposis were not eligible. These six hospitals perform the vast majority of all operations for CRC in the Columbus metropolitan area (population 1.7 million). The institutional review boards at all participating hospitals approved the research protocol and consent form in accordance with assurances filed with and approved by the United States Department of Health and Human Services. From January 1999 through August 2004, 1,566 eligible patients with CRC were accrued to the study (Hampel et al., 2008) [PMID: 18809606]. A total of 1,472 colorectal cancer samples had enough blood DNA remaining to be sent for genotyping. Control samples were provided by the Ohio State University Medical Center's (OSUMC) Human Genetics Sample Bank.
The Columbus Area Controls Sample Bank is a collection of control samples for use in human genetics research that includes both donors' anonymized biological specimens and linked phenotypic data. The data and samples are collected under the protocol "Collection and Storage of Controls for Genetics Research Studies", which is approved by the Biomedical Sciences Institutional Review Board at OSUMC. Recruitment takes place in OSUMC primary care and internal medicine clinics. Individuals who agree to participate provide written informed consent, complete a questionnaire that includes demographic, medical and family history information, and donate a blood sample. Blood (4-7 ml) is drawn into each of three ACD Solution A tubes and is used for genomic DNA extraction and for the establishment of an EBV-transformed lymphoblastoid cell culture, a cell pellet in Trizol, and plasma. Controls were matched to CRC cases in a 1:1 ratio on age at reference time (age_ref, ±5 years), race (exact), and sex (exact). For cases without a matched control under these criteria, matching was done on sex and race only, again in a 1:1 ratio. Because there were fewer controls than cases, each control could be matched to at most two cases.
Health Professionals Follow-up Study (HPFS): The HPFS is a prospective cohort study of men conducted in parallel with the Nurses' Health Study (NHS). The HPFS cohort comprised 51,529 men aged 40-75 years who, in 1986, responded to a mailed questionnaire (Rimm et al., 1990) [PMID: 2090285]. Participants provided information on health-related exposures, including current and past smoking history, age, weight, height, diet, physical activity, aspirin use, and family history of colorectal cancer. Colorectal cancer and other outcomes were reported by participants or next of kin and were followed up by physicians through review of medical and pathology records. Overall, more than 97% of self-reported colorectal cancers were confirmed by medical record review.
Information was abstracted on histology and primary location. Follow-up has been excellent, with 94% of the men responding to date. Colorectal cancer cases were ascertained through January 1, 2008. In 1993-1995, 18,825 men in the HPFS mailed blood samples by overnight courier; buffy-coat aliquots were prepared from these samples and stored in liquid nitrogen. In 2001-2004, 13,956 men in the HPFS who had not previously provided a blood sample mailed in a swish-and-spit sample of buccal cells. Incident cases were defined as those occurring after the subject provided a blood or buccal sample. Prevalent cases were defined as those occurring after enrollment in the study in 1986 but before the subject provided either a blood or buccal sample. Case-control sets were previously constructed after excluding participants with a history of cancer (except nonmelanoma skin cancer), ulcerative colitis, or familial polyposis. In addition to colorectal cancer cases and controls, a set of adenoma cases and matched controls with DNA available from buffy coat was selected for genotyping. Over the follow-up period, data were collected on endoscopic screening practices, and for individuals diagnosed with a polyp, the polyps were confirmed to be adenomatous by medical record review. Adenoma cases were ascertained through January 1, 2008. A separate case-control set was constructed of participants diagnosed with advanced adenoma, matched to control participants who underwent a lower endoscopy in the same time period and did not have an adenoma. Advanced adenoma was defined as an adenoma 1 cm or larger in diameter and/or with tubulovillous, villous, or high-grade dysplasia/carcinoma-in-situ histology.
Matching criteria included year of birth (within 1 year), month/year of blood sampling (within 6 months), the reason for the lower endoscopy (screening, family history, or symptoms), and the time period of any prior endoscopy (within 2 years). Controls matched to cases with a distal adenoma had either a negative sigmoidoscopy or a negative colonoscopy, and controls matched to cases with a proximal adenoma all had a negative colonoscopy. In total, 159 advanced adenoma cases and 109 controls were selected for genotyping.
Leeds Colorectal Cancer Study (LCCS): Following local ethical approval, colorectal cancer cases were recruited through surgical clinics in Leeds, UK, from 1997 until 2012. Initial funding was provided by the UK Ministry of Agriculture, Fisheries and Food (subsequently the Food Standards Agency) and the Imperial Cancer Research Fund (subsequently Cancer Research UK). Recruitment also took place in Dundee, Perth and York between 1997 and 2001 using the same protocol, and the data and samples were combined. Pathologically confirmed cases were consented at outpatient clinics, provided information on known and postulated risk factors for colorectal cancer (diet, lifestyle and family history), and provided a blood sample for DNA. Exclusion criteria included pre-existing diverticular disease and an inability to complete the questionnaire. Controls were identified through the cases' General Practitioners (GPs; all UK residents have a nominated GP to whom initial medical queries are referred), who were asked to send letters to other persons on their patient list of the same sex and born within 5 years of the case. Subsequently, to increase the number of controls, we systematically invited patients from selected GP practices. Diet was assessed in cases and controls using an extensive dietary and lifestyle questionnaire modified from that developed by the European Prospective Investigation into Cancer and Nutrition (EPIC).
The frequency with which each specific food item was eaten was recorded, and average fruit and vegetable consumption was also obtained as a cross-check. In total, 1,591 cases and 739 controls provided a DNA sample.
The North Carolina Colon Cancer Studies (NCCCS I/II): The NCCCS I (colon) and NCCCS II (rectum) were population-based case-control studies conducted in 33 counties of North Carolina. Cases were identified using the rapid case ascertainment system of the North Carolina Central Cancer Registry. Patients with a first diagnosis of histologically confirmed invasive adenocarcinoma of the colon (cecum through sigmoid colon) between October 1996 and September 2000 were classified as potential cases in the NCCCS I. The NCCCS II included patients with a first diagnosis of histologically confirmed invasive adenocarcinoma of the sigmoid colon, rectosigmoid, or rectum (hereafter collectively referred to as rectal cancer) between May 2001 and September 2006. Additional eligibility requirements were: age 40-80 years, residence in one of the 33 counties, ability to give informed consent and complete an interview, possession of a driver's license or identification card issued by the North Carolina Department of Motor Vehicles (if under age 65), and no objection from the primary physician to contacting the individual. Controls, identified and sampled during the respective study periods, were selected from two sources: North Carolina Department of Motor Vehicles records for potential controls under age 65, and Centers for Medicare and Medicaid Services records for those aged 65 and older. Controls were matched to cases using randomized recruitment, with recruitment probabilities defined within strata of 5-year age group, sex, and race.
Dietary information was collected using a modified version of the semiquantitative food frequency questionnaire developed at the National Cancer Institute. Participants were also asked about vitamin and mineral supplementation, special diets, restaurant eating, sodium use, and fats used in cooking. In NCCCS I, 515 colorectal cases and 687 matched controls were sent for genotyping; in NCCCS II, 796 colorectal cases and 823 controls were sent. Controls were matched to CRC cases in a 1:1 ratio on age (±5 years), race (exact), and sex (exact). For cases without a matched control under these criteria, matching was done on sex and race only.
Nurses' Health Study (NHS): The NHS cohort began in 1976, when 121,700 married female registered nurses aged 30-55 years returned an initial questionnaire that ascertained a variety of important health-related exposures (Belanger et al., 1978) [PMID: 248266]. Since 1976, follow-up questionnaires have been mailed every 2 years. Colorectal cancer and other outcomes were reported by participants or next of kin and were followed up by physicians through review of medical and pathology records. Overall, more than 97% of self-reported colorectal cancers were confirmed by medical record review. Information was abstracted on histology and primary location. Follow-up has been high: more than 92% of the total possible follow-up time has been captured. Colorectal cancer cases were ascertained through June 1, 2008. In 1989-1990, 32,826 women in NHS I mailed blood samples by overnight courier; buffy-coat aliquots were prepared from these samples and stored in liquid nitrogen. In 2001-2004, 29,684 women in NHS I who had not previously provided a blood sample mailed in a swish-and-spit sample of buccal cells. Incident cases were defined as those occurring after the subject provided a blood or buccal sample.
Prevalent cases were defined as those occurring after enrollment in the study in 1976 but before the subject provided either a blood or buccal sample. After excluding participants with a history of cancer (except nonmelanoma skin cancer), ulcerative colitis, or familial polyposis, case-control sets were previously constructed, and DNA was isolated from either buffy coat or buccal cells for genotyping. In addition to colorectal cancer cases and controls, a set of advanced adenoma cases and matched controls with DNA available from buffy coat was selected for genotyping. Over the follow-up period, data were collected on endoscopic screening practices, and for individuals diagnosed with a polyp, the polyps were confirmed to be adenomatous by medical record review. Adenoma cases were ascertained through June 1, 2011. A separate case-control set was constructed of participants diagnosed with advanced adenoma, matched to control participants who underwent a lower endoscopy in the same time period and did not have an adenoma. Advanced adenoma was defined as an adenoma more than 1 cm in diameter and/or with tubulovillous, villous, or high-grade dysplasia/carcinoma-in-situ histology. Matching criteria included year of birth (within 1 year), month/year of blood sampling (within 6 months), the reason for the lower endoscopy (screening, family history, or symptoms), and the time period of any prior endoscopy (within 2 years). Controls matched to cases with a distal adenoma had either a negative sigmoidoscopy or a negative colonoscopy, and controls matched to cases with a proximal adenoma all had a negative colonoscopy. A total of 272 cases and 236 matched controls were sent to the Center for Inherited Disease Research (CIDR) for the advanced adenoma case-control set.
Northern Swedish Health and Disease Study (NSHDS): The NSHDS comprises more than 110,000 participants from three population-based cohorts, approximately one third of whom have repeated sampling occasions (Dahlin et al., 2010; Myte et al., 2016) [PMID: 20197478; PMID: 27367522]. The largest cohort is the ongoing Vasterbotten Intervention Programme, in which all residents of Vasterbotten County are invited to a health examination at ages 40, 50 and 60 (and, in some years, 30). Extensive measured and self-reported health and lifestyle data, as well as blood samples for central biobanking in Umea, Sweden, are collected at the health examination. Leucocyte DNA samples from 1:1-matched CRC case-control sets in the NSHDS have been selected for genotyping; 878 of these samples are included in this study, in addition to 354 NSHDS samples previously analyzed as part of the multicenter EPIC cohort. Cancer-specific and overall survival data are available for all patients. For at least 425 patients, archival tumor tissue has been analyzed for the BRAF V600E mutation, for KRAS mutations by sequencing codons 12 and 13, for microsatellite instability (MSI) screening status by immunohistochemistry, and for an eight-gene CpG island methylator phenotype (CIMP) panel by quantitative real-time PCR (MethyLight).
Ohio Colorectal Cancer Prevention Initiative (OCCPI, OSUMC): OCCPI (ClinicalTrials.gov identifier: NCT01850654) is a population-based study of colorectal cancer patients diagnosed in one of 51 hospitals throughout the state of Ohio from January 1, 2013 through December 31, 2016. The OCCPI was created to decrease CRC incidence in Ohio by identifying patients with hereditary predisposition (through statewide universal tumor screening of newly diagnosed CRC patients), increasing colonoscopy compliance among first-degree relatives of CRC patients, and encouraging future research through the creation of a biorepository.
The 51 Ohio hospitals participating in the OCCPI were selected to represent a cross-section of clinical centers in the state based on high reported volume of CRC patients, affiliation with a high-volume hospital, or interest in participation. Institutional Review Board (IRB) approval was obtained by the individual hospitals or Community Oncology Programs, or by ceding review to the OSU IRB. Written informed consent was obtained. Patients were eligible for this study if they were aged 18 or older at enrollment and had a surgical resection (or biopsy, if unresectable) in the state of Ohio demonstrating an adenocarcinoma of the colorectum between January 1, 2013 and December 31, 2016. A total of 2,139 colorectal cancer cases were genotyped. Matched control samples were selected from the Ohio State University Medical Center's (OSUMC) Human Genetics Sample Bank in the same way as for the Columbus-area HNPCC Study (see the description of that study).
Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial (PLCO): PLCO enrolled 154,934 participants (men and women aged 55 to 74 years) at ten centers into a large, randomized, two-arm trial to determine the effectiveness of screening in reducing cancer mortality. Sequential blood samples were collected from participants assigned to the screening arm; participation at the baseline blood draw was 93%. In the observational (control) arm, buccal cells were collected by mail using the "swish-and-spit" protocol, with a participation rate of 65%. Details of this study have been previously described (Huang et al., 2016) [PMID: 27673363] and are available online (http://dcp.cancer.gov/plco). For this study, 1,651 advanced adenoma cases and 1,392 controls were selected for genotyping.
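The OSUMC control matching described for the Columbus-area HNPCC Study, and reused for OCCPI, follows a pattern seen in several of these descriptions: a primary pass requiring exact agreement on some factors plus an age caliper, a fallback pass without the age caliper, and, for OSUMC, a cap of two cases per control. The descriptions do not state which algorithm was used, so the following is only a minimal greedy sketch under stated assumptions: cases are processed in input order, the eligible control closest in age is chosen, and the names `Subject` and `match_cases` are hypothetical. Under the same assumptions, the Czech Republic CCS scheme (exact sex, age ±5 years, sex-only fallback, one case per control) corresponds to `exact=("sex",)` and `max_uses=1`.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Subject:
    pid: str
    sex: str
    race: str
    age_ref: int  # age at reference time


def match_cases(cases, controls, exact=("sex", "race"), age_tol=5, max_uses=2):
    """Greedy case-control matching (hypothetical sketch).

    Pass 1: exact agreement on every attribute in `exact` and
    |age_ref difference| <= age_tol. Pass 2 (fallback for cases still
    unmatched): exact agreement on `exact` only, no age caliper.
    Each control may serve at most `max_uses` cases; among eligible
    controls, the one closest in age_ref is chosen.
    """
    uses = Counter()  # how many cases each control already serves
    pairs = {}        # case pid -> control pid
    for require_age in (True, False):
        for case in cases:
            if case.pid in pairs:
                continue
            candidates = [
                ctl for ctl in controls
                if uses[ctl.pid] < max_uses
                and all(getattr(ctl, k) == getattr(case, k) for k in exact)
                and (not require_age or abs(ctl.age_ref - case.age_ref) <= age_tol)
            ]
            if candidates:
                best = min(candidates, key=lambda ctl: abs(ctl.age_ref - case.age_ref))
                pairs[case.pid] = best.pid
                uses[best.pid] += 1
    return pairs


cases = [Subject("A1", "F", "White", 60), Subject("A2", "F", "White", 62),
         Subject("A3", "F", "White", 61), Subject("A4", "M", "Black", 50)]
controls = [Subject("B1", "F", "White", 61), Subject("B2", "M", "Black", 70)]
print(match_cases(cases, controls))
# {'A1': 'B1', 'A2': 'B1', 'A4': 'B2'}
```

In this toy example, B1 reaches its two-case cap, so A3 stays unmatched, and A4 is matched to B2 only in the fallback pass because their ages differ by 20 years. Greedy matching is order-dependent: a different case ordering can produce different pairs, which an optimal (for example, minimum-cost bipartite) matching would avoid at higher computational cost.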
Selenium and Vitamin E Cancer Prevention Trial (SELECT): SELECT was a double-blind, placebo-controlled clinical trial that explored the use of selenium and vitamin E, alone and in combination, to prevent prostate cancer in healthy men (Lippman et al., 2009) [PMID: 19066370]. Secondary endpoints included the prevention of colorectal and lung cancers. SELECT was conducted at 427 sites and centers in the United States, Canada and Puerto Rico; 35,533 men aged 55 years and older (50 or older if African American) were randomized beginning August 22, 2001. Supplementation was discontinued on October 23, 2008, due to futility. A total of 308 colorectal cancer cases and 308 matched controls were selected from the SELECT population and sent for genotyping.
Screening Markers For Colorectal Disease Study and Colonoscopy and Health Study (SMS-REACH): Details on this study population were previously reported (Burnett-Hartman et al., 2014) [PMID: 24875374]. Participants were enrollees in an integrated health-care delivery system in western Washington State (Group Health Cooperative, Seattle, Washington), aged 24-79 years, who underwent an index colonoscopy for any indication between 1998 and 2007 and donated a buccal-cell or blood sample for genotyping analysis. Study recruitment took place in two phases: phase 1 in 1998-2003 and phase 2 in 2004-2007. Persons who had undergone a colonoscopy less than 1 year before the index colonoscopy, persons with inadequate bowel preparation for the index colonoscopy, and persons with a prior or new diagnosis of colorectal cancer, a familial colorectal cancer syndrome (such as familial adenomatous polyposis), or another colorectal disease were ineligible. Patients diagnosed with adenomas or serrated polyps and persons who were polyp-free at the index colonoscopy (controls) were systematically recruited during both phases.
Approximately 75% agreed to participate and provided written informed consent. Based on medical records, persons who agreed to participate and those who refused were similar with respect to age, sex, and colorectal polyp status. Study protocols were approved by the institutional review boards of the Group Health Cooperative and the Fred Hutchinson Cancer Research Center (Seattle, Washington). A total of 575 cases and 508 matched controls were selected for the study. Controls were matched to cases in a 1:1 ratio on age_ref (±5 years), race, and sex.
The Women's Health Initiative (WHI): The WHI is a long-term national health study that has focused on strategies for preventing heart disease, breast and colorectal cancer, and osteoporotic fractures in postmenopausal women. The original WHI study included 161,808 postmenopausal women enrolled between 1993 and 1998. The Fred Hutchinson Cancer Research Center in Seattle, WA, serves as the WHI Clinical Coordinating Center for data collection, management, and analysis. The WHI has two major parts, a partial factorial randomized Clinical Trial (CT) and an Observational Study (OS), both conducted at 40 Clinical Centers nationwide. The CT enrolled 68,132 postmenopausal women aged 50-79 into trials testing three prevention strategies. If eligible, women could choose to enroll in one, two, or all three of the trial components, which are as follows.
Hormone Therapy Trials (HT): This double-blind component examined the effects of combined hormones or estrogen alone on the prevention of coronary heart disease and osteoporotic fractures, and the associated risk of breast cancer. Women in this component with an intact uterus were randomized to estrogen plus progestin (conjugated equine estrogens [CEE], 0.625 mg/d, plus medroxyprogesterone acetate [MPA], 2.5 mg/d) or a matching placebo.
Women with prior hysterectomy were randomized to CEE or placebo. The two trials were stopped early, in July 2002 and March 2004, respectively, because of adverse effects. All HT participants continued to be followed without intervention until close-out.
Dietary Modification Trial (DM): This component evaluated the effect of a low-fat diet high in fruits, vegetables and grains on the prevention of breast and colorectal cancers and coronary heart disease. Participants were randomized to either their usual eating pattern or a low-fat dietary pattern.
Calcium/Vitamin D Trial (CaD): This double-blind component began 1 to 2 years after a woman joined one or both of the other clinical trial components. It evaluated the effect of calcium and vitamin D supplementation on the prevention of osteoporotic fractures and colorectal cancer. Women in this component were randomized to calcium (1000 mg/d) and vitamin D (400 IU/d) supplements or a matching placebo.
The Observational Study (OS) examines the relationship between lifestyle, environmental, medical and molecular risk factors and specific measures of health or disease outcomes. This component involves tracking the medical history and health habits of 93,676 women not participating in the CT. Recruitment for the OS was completed in 1998, and participants were followed annually for 8 to 12 years. All centrally confirmed cases of invasive colorectal cancer, or deaths from colorectal cancer, in the September 30, 2015 database were selected as potential cases. Controls were participants free of colorectal cancer (invasive or in situ) as of September 30, 2015. Potential cases and controls were excluded if they (1) were non-White; (2) had a history of colorectal cancer at baseline; (3) were lost to follow-up after enrollment; (4) were dbGaP-ineligible; (5) had <1.25 µg of DNA; (6) had been selected for WHI study M26 Phase I or II; or (7) had been selected for WHI study AS224 and were also included in the imputation project.
A total of 578 cases and 104,429 controls met the eligibility criteria. Each case was matched to one control (1:1) meeting the following criteria: age (±5 years), randomization center (exact; 40 centers), WHI date (±3 years), CaD date (±3 years), OS flag (exact), HRT assignment (exact), DM assignment (exact), and CaD assignment (exact). Control selection was done in a time-forward manner, selecting one control for each case from the risk set at the time of the case's event. Among eligible controls, the matching algorithm selected the closest match by minimizing an overall distance measure (Bergstralh EJ, Kosanke JL. Computerized matching of cases to controls. Technical Report #56, Department of Health Sciences Research, Mayo Clinic, Rochester MN. April 1995), with each matching factor given the same weight. When exact matches could not be found, the matching criteria were gradually relaxed among unmatched cases and controls until all cases had matched controls. Using the criteria above, 559 of the 578 eligible cases found exact matches. The criteria were then relaxed to age ±5 years, randomization center, WHI date ±3 years, CaD date ±3 years, OS flag, HRT flag, DM flag, and CaD flag, and 17 of the remaining 19 unmatched cases found matched controls. Matching on age ±5 years, randomization center, WHI date ±3 years, CaD date ±3 years, OS flag, and HRT flag then yielded matches for the last 2 unmatched cases.
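The WHI control selection combines three ideas: time-forward risk-set sampling, a distance-minimizing choice among eligible controls, and stepwise relaxation of the matching criteria (exact arm assignments, then participation flags, then the HRT flag only). The following is a minimal greedy sketch of that logic, not a reimplementation of the Mayo Clinic program cited above. The record layout (`Participant`), the arm coding (`"active"`, `"placebo"`, `"none"`), the decimal-year dates, and the distance (an unweighted sum of absolute differences in age and the two dates) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    pid: str
    age: float
    center: int       # randomization center (1-40)
    whi_date: float   # WHI date in decimal years (assumed encoding)
    cad_date: float   # CaD date in decimal years (assumed encoding)
    os_flag: bool
    hrt_arm: str      # hypothetical coding: "active", "placebo" or "none"
    dm_arm: str
    cad_arm: str
    time: float       # case: event time; control: end of event-free follow-up


ARMS = ("hrt_arm", "dm_arm", "cad_arm")


def flag(arm):
    """Participation flag derived from an arm assignment."""
    return arm != "none"


def base_ok(k, c):
    """Criteria kept at every tier: age, center, both dates, OS flag."""
    return (abs(k.age - c.age) <= 5 and k.center == c.center
            and abs(k.whi_date - c.whi_date) <= 3
            and abs(k.cad_date - c.cad_date) <= 3
            and k.os_flag == c.os_flag)


# Tier 1: exact arm assignments; tier 2: participation flags only;
# tier 3: HRT flag only (DM and CaD flags dropped).
TIERS = (
    lambda k, c: base_ok(k, c) and all(getattr(k, a) == getattr(c, a) for a in ARMS),
    lambda k, c: base_ok(k, c) and all(flag(getattr(k, a)) == flag(getattr(c, a)) for a in ARMS),
    lambda k, c: base_ok(k, c) and flag(k.hrt_arm) == flag(c.hrt_arm),
)


def distance(k, c):
    """Equal-weight sum of absolute differences on the caliper factors."""
    return abs(k.age - c.age) + abs(k.whi_date - c.whi_date) + abs(k.cad_date - c.cad_date)


def tiered_match(cases, controls):
    """Greedy time-forward 1:1 matching with stepwise relaxation."""
    pool = {c.pid: c for c in controls}
    pairs, tier_of = {}, {}
    for level, ok in enumerate(TIERS, start=1):
        for k in sorted(cases, key=lambda p: p.time):
            if k.pid in pairs:
                continue
            # Risk set: controls still event-free at the case's event time.
            candidates = [c for c in pool.values() if c.time >= k.time and ok(k, c)]
            if candidates:
                best = min(candidates, key=lambda c: distance(k, c))
                pairs[k.pid] = best.pid
                tier_of[k.pid] = level
                del pool[best.pid]  # each control is used at most once
    return pairs, tier_of
```

Cases are handled in order of event time so that earlier cases draw from the risk set first, and each selected control is removed from the pool so that it cannot be reused. The returned `tier_of` mapping records the relaxation level at which each case was matched, mirroring the 559/17/2 breakdown reported above.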