We performed whole-exome sequencing (WES) of tumor DNA and non-tumoral DNA (extracted from PBMC) from 4 patients with non-muscle-invasive bladder cancer who, after tumor resection, went on to receive BCG treatment because they had high-risk disease. Tumor and non-tumoral DNA from each patient were sequenced as a pair in order to identify tumor-specific mutations that could give rise to patient-specific neo-epitopes. For patient UC2, 3 different tumors were resected at the same time, and DNA from each of these 3 tumors was sent separately for WES.
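The paired design can be illustrated with a minimal sketch of the underlying set logic: a variant counts as tumor-specific if it is called in the tumor but not in the matched non-tumoral sample. The names `Variant`, `tumor_specific`, and `shared_across_tumors` are hypothetical and are not taken from the study; a real analysis would rely on a dedicated somatic variant caller that also weighs read depth and allele fraction, which this sketch ignores.

```python
from typing import Iterable, List, NamedTuple, Set


class Variant(NamedTuple):
    """A variant call identified by position and alleles (hypothetical record type)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def tumor_specific(tumor_calls: Iterable[Variant],
                   normal_calls: Iterable[Variant]) -> List[Variant]:
    """Return tumor calls that are absent from the matched non-tumoral sample."""
    normal = set(normal_calls)
    return [v for v in tumor_calls if v not in normal]


def shared_across_tumors(per_tumor_calls: List[Iterable[Variant]]) -> Set[Variant]:
    """Return variants present in every tumor of one patient (e.g. several resected tumors)."""
    sets = [set(calls) for calls in per_tumor_calls]
    return set.intersection(*sets) if sets else set()


# Illustrative, made-up calls (not data from the study).
tumor = [Variant("chr1", 100, "A", "T"), Variant("chr2", 200, "G", "C")]
normal = [Variant("chr2", 200, "G", "C")]
print(tumor_specific(tumor, normal))
```

For a patient with several tumors sequenced separately, `tumor_specific` would be applied to each tumor against the same non-tumoral sample, and `shared_across_tumors` could then show which candidate mutations the tumors have in common.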
Natural killer/T-cell lymphoma (NKTL) is an aggressive malignancy with a predilection for Asian, Mexican and South American populations. With the exception of Japan, it is the most common mature T-cell lymphoma in Asia. NKTL presents as extranodal disease and mostly affects the upper aerodigestive tract. Neoplastic cells are invariably infected by the Epstein-Barr virus (EBV) and characterized by a cytotoxic phenotype. The genetic landscape of NKTL has recently been unraveled by discoveries describing recurring mutations altering the JAK-STAT pathway, epigenetic modifiers, the DDX3X gene and genetic predisposition in the HLA-DPB1 gene. The genomic landscape of NKTL has been interrogated by whole-exome sequencing, targeted sequencing and single nucleotide polymorphism arrays, but has yet to be studied with whole-genome sequencing (WGS). In this study, we will use a combination of WGS, targeted-capture sequencing (TCS) and RNA sequencing data to explore the genetic landscape and its relevance to immunotherapy in NKTL.
The Gabriella Miller Kids First Pediatric Research Program (Kids First) is a trans-NIH effort initiated in response to the 2014 Gabriella Miller Kids First Research Act and supported by the NIH Common Fund. This program focuses on gene discovery in pediatric cancers and structural birth defects and the development of the Gabriella Miller Kids First Pediatric Data Resource (Kids First Data Resource). All of the WGS and phenotypic data from this study are accessible through dbGaP and kidsfirstdrc.org, where other Kids First datasets can also be accessed. The focus of this study is to identify novel risk variants for orofacial clefts (OFC) in African and Asian OFC case-parent triads through analysis of whole-genome sequencing data.
Projects

Jointly managed by the European Bioinformatics Institute (EMBL-EBI) in Cambridge (UK) and the Centre for Genomic Regulation (CRG) in Barcelona, the EGA provides an invaluable service to the worldwide biomedical research community. The teams leading the EGA are involved in several international partnerships and consortia in numerous scientific fields, where they contribute to ambitious projects. In addition to the projects listed below, the EGA is in a long-standing partnership with the Global Alliance for Genomics and Health (GA4GH), as described on the dedicated page.

On-going projects

Project | Description | Duration | Domain | Funder | Tags
CANDLE | The CANDLE project aims to conceptualise and advance the development of National Cancer Data Nodes (NCDNs) in European countries. These NCDNs will boost the reuse of cancer data for research, innovation and policy making, in order to improve diagnostics and treatment for cancer patients, as well as prevention and early detection. | 2025-2028 | Cancer | Horizon Europe | DOCUMENTATION
EASIGEN-DS | The EASIGEN-DS project aims to conduct a design study to establish a new European Research Infrastructure on Advanced Genomics Technologies, EASIGEN. To develop an excellent scientific, technological and operational design, we will conduct landscape studies, stakeholder consultations, and community surveying. | 2025-2028 | Genomic and health data | Horizon Europe | DATA MANAGEMENT, DOCUMENTATION, INFRASTRUCTURE
Go-IMPaCT | Go-IMPaCT will contribute sequenced genomes and provide infrastructure as part of IMPaCT-Cohort, one of the three fundamental pillars of the Precision Medicine Infrastructure associated with Science and Technology (IMPaCT) program in Spain. Along with the Genome of Europe (GoE) project, around 18,000 people will have their genomes sequenced, also contributing to Spain's commitments in 1+MG. Go-IMPaCT will fund the development of an EGA node to manage and share this genomic and phenoclinic data, laying the foundations for regional and ethnic genomic variability in Spain to be available for research purposes. The IMPaCT cohort is created with the spirit of being an open research tool, compatible with the rest of the health research ecosystem, and other international initiatives. | 2025-2027 | Large-scale genomics and health data; personalised medicine | Instituto de Salud Carlos III | ACCESS, DISCOVERY, INFRASTRUCTURE, METADATA STANDARDS
FAIR-FEGA | This project seeks to accelerate data depositions into FEGA, significantly increasing the data flow in and from FEGA nodes. It will build capacity within the FEGA nodes and increase awareness in a wide range of stakeholders, thus altogether achieving the ultimate goal of enhancing data reuse. The project will be carried out by a strategic consortium comprising seven ELIXIR nodes and two ELIXIR communities. | 2025-2026 | Not applicable | ELIXIR | ACCESS, DISCOVERY, DOCUMENTATION, INFRASTRUCTURE, METADATA STANDARDS
FEGA-Connect | A consortium of six ELIXIR nodes plus the Polish FEGA node (in-kind contribution) joining forces to build a solid base to develop solutions for effective multi-omic sensitive data integration between FEGA nodes and other infrastructures and specialised data repositories. We aim to promote a more coherent data deposition, discoverability and retrieval of multi-omics datasets, providing FAIRer data and consequently accelerating research. | 2025-2026 | Multi-omics data | ELIXIR | ACCESS, DATA MANAGEMENT, DISCOVERY, INFRASTRUCTURE, METADATA STANDARDS
IMPaCT-Data 2 | IMPaCT-Data 2 will develop a digital platform for the integration and modelling of biomedical data associated with IMPaCT (Precision Medicine Infrastructure associated with Science and Technology) projects in Spain. It will deploy a sustainable infrastructure that facilitates the integration, standardisation, interoperability and analysis of clinical, genomic, molecular and medical imaging data. This platform will be aligned with European projects such as Genome of Europe (GoE), the first project to make use of the European Genomic Data Infrastructure (GDI), and EUCAIM. IMPaCT-Data 2 will benefit from advanced Artificial Intelligence and High Computing Capacity Systems capabilities, offering robust and accessible tools for researchers from the National Health System in Spain. | 2025-2026 | Large-scale genomics and health data; personalised medicine | Instituto de Salud Carlos III | ACCESS, DISCOVERY, INFRASTRUCTURE, METADATA STANDARDS
SenSec | This project aims to establish a mechanism for orchestrating secure access to sensitive data hosted by the EGA, whether in Central EGA or any Federated Node, from Galaxy, a popular open-source, community-driven VRE (Virtual Research Environment) for bioinformatics analysis. Building on a previous prototype that enabled Galaxy users within Trusted Research Environments (TREs) to decrypt sensitive data for workflow execution without sharing private encryption keys, SenSec will expand this prototype into a comprehensive solution for secure data analysis in Galaxy, facilitating encrypted data access and transfer from FEGA/EGA repositories to designated TREs. | 2025-2026 | Genomic and health data; trusted research environment | ELIXIR | ACCESS, DATA ANALYSIS
ERDERA | The European Rare Disease Research Alliance (ERDERA) takes over from EJPRD to deliver concrete health benefits to rare disease patients in the next decade by advancing prevention, diagnosis and treatment research. To leave no one behind, over 170 organisations championed by the European Union and member states are working hand in hand to make Europe a world leader in rare diseases research and innovation. | 2024-2034 | Rare diseases | Horizon Europe; "La Caixa" Foundation cofunds CRG's contribution | ACCESS, DATA ANALYSIS, DISCOVERY, INFRASTRUCTURE
SYNTHIA | The aim of SYNTHIA is to deliver validated, reliable tools and methods for synthetic data generation (SDG). The tools will cover multiple data types including lab results, clinical notes, genomics, imaging and m-health data. SYNTHIA also hopes to make possible the generation of longitudinal data. | 2024-2029 | Genomic and health data; multi-omics; AI solutions | Innovative Health Initiative (IHI) | DATA ANALYSIS, DATA MANAGEMENT, INFRASTRUCTURE
GoE | The Genome of Europe initiative aims to build a European network of national genomic reference cohorts of at least 500,000 citizens. These reference cohorts will be selected to be representative of the European population. | 2024-2028 | Large-scale genomic and health data | Horizon Europe | ACCESS, DISCOVERY, INFRASTRUCTURE, METADATA STANDARDS
HEREDITARY | HEREDITARY aims to transform the way we approach disease detection, prepare treatment response, and explore medical knowledge by building a robust, interoperable, trustworthy, and secure framework that integrates multimodal health data (including genetic data) while ensuring compliance with cross-national privacy-preserving policies. | 2024-2027 | Neurodegenerative disorders, gut-brain interplay | Horizon Europe | DATA MANAGEMENT, DATA ANALYSIS
EOSC-ENTRUST | The mission of EOSC-ENTRUST is to create a European network of trusted research environments for sensitive data and to drive European interoperability by joint development of a common blueprint for federated data access and analysis. | 2024-2026 | Trusted Research Environment | Horizon Europe | INFRASTRUCTURE
EBV-MS | "Targeting Epstein-Barr Virus Infection for Treatment and Prevention of Multiple Sclerosis". The ambitious goals of the project are to answer the question of why only a few EBV-infected persons develop MS, to define the underlying mechanism of this process, and to clarify whether targeting the EBV infection can prevent MS or improve the disease course. | 2023-2028 | Viral-host genetics; immune response; disease modelling; disease prevention; AI/ML solutions | Horizon Europe | DATA MANAGEMENT, DATA ANALYSIS
WISDOM | WISDOM (Well-being improvement through the integration of healthcare and research data and models without border for chronic immune-mediated diseases) aims to deploy novel approaches for data processing, harmonisation, management, and secure data sharing and federated access for diseases like multiple sclerosis. Using an end-user guided approach, it will facilitate responsible and critical assessment of the use of AI in healthcare. | 2023-2028 | Chronic immune-mediated diseases | Horizon Europe | DATA MANAGEMENT, INFRASTRUCTURE
EUCAIM | EUropean Federation for CAncer IMages is a project that will build a highly secure, federated and large-scale European cancer imaging platform, with capabilities that will greatly enhance the potential of Artificial Intelligence in oncology. | 2023-2027 | Cancer | Digital Europe Programme (DIGITAL) | DISCOVERY
CONTAGIO | CONTAGIO (COhorts Network To be Activated Globally In Outbreaks) aims to create coordination mechanisms to rapidly react to infectious disease (re-)emergence in low- and middle-income countries (LMICs). | 2023-2026 | Infectious Diseases | European Commission - Horizon Europe | ACCESS, DATA MANAGEMENT, DISCOVERY
Youth-GEMS | Youth-GEMS (Gene Environment Interactions in Mental Health TrajectorieS of Youth) will conduct research into the genetic and environmental factors of mental health in young European people. | 2022-2027 | Mental health | European Commission - Horizon Europe | DATA MANAGEMENT, DISCOVERY
GDI | The European Genomics Data Infrastructure project is enabling access to genomic and related phenotypic and clinical data across Europe. It is doing this by establishing a federated, sustainable and secure infrastructure to access the data. | 2022-2026 | Genomic and health data | European Commission - Horizon Europe; "La Caixa" Foundation cofunds CRG's contribution | DISCOVERY, DOCUMENTATION, INFRASTRUCTURE
IMPaCT-T2D | The IMPaCT-T2D project aims at studying the complete genomes of a large cohort of patients with Type 2 Diabetes mellitus (T2D), using modern sequencing technologies and artificial intelligence (AI) in order to improve the stratification and pharmacological treatment in the context of precision medicine. | 2022-2025 | Cardiovascular and Complex Diseases | Spanish Ministry of Science and Innovation; Instituto de Salud Carlos III | ACCESS, DATA MANAGEMENT, DISCOVERY, INFRASTRUCTURE

Completed projects

Project | Description | Duration | Domain | Funder | Tags
EOSC4Cancer | EOSC4Cancer builds on existing projects, research outcomes and established community solutions to create the federated FAIR data, analysis and services infrastructure needed for European Cancer research programmes. | 2022-2025 | Cancer | European Commission - Horizon Europe | DISCOVERY
EuCanImage | A European Cancer Image Platform Linked to Biological and Health Data for Next-Generation Artificial Intelligence and Precision Medicine in Oncology. | 2020-2025 | AI Solutions in Oncology | European Commission - H2020 Programme; "La Caixa" Foundation cofunds CRG's contribution | DATA MANAGEMENT, METADATA STANDARDS
GenoMed4ALL | A consortium built to empower personalised medicine in the field of haematological diseases through the use of AI and the pooling of genomic and clinical data. | 2020-2025 | Hematological diseases | European Commission - H2020 Programme | DISCOVERY, METADATA STANDARDS
BY-COVID | The BeYond-COVID project aims to make COVID-19 data accessible to scientists in laboratories but also to anyone who can use it, such as medical staff in hospitals or government officials. Going beyond SARS-CoV-2 data, the project will provide a framework for making data from other infectious diseases open and accessible to everyone. | 2021-2024 | Infectious diseases | European Commission - H2020 Programme | ACCESS, DATA MANAGEMENT, DISCOVERY, INFRASTRUCTURE
IMPaCT-Data | IMPaCT-Data aims to create the infrastructure for secondary use of data from Spanish healthcare systems (electronic health records, medical imaging and genomic repositories) and to contribute the knowledge and methodology produced to the healthcare system. | 2021-2024 | Large-scale genomics and health data | Spanish Ministry of Science and Innovation; Instituto de Salud Carlos III | ACCESS, DATA MANAGEMENT, DISCOVERY, INFRASTRUCTURE
LaMarato | A project aimed at creating and developing a Catalan inter-hospital network to interrogate genetic variants from thousands of genetic tests carried out in patients with rare diseases from the main Catalan hospitals. | 2021-2024 | Genomic and health data | Fundacio La Marato de TV3 (Catalan foundation) | DISCOVERY
HealthyCloud | This consortium will contribute a Strategic Agenda towards the European Health Research and Innovation Cloud. The project will work in collaboration with a broad range of stakeholders to ensure that all voices are included and that the results are technically and ethically sound. | 2021-2023 | Not applicable | European Commission - H2020 Programme | DOCUMENTATION
B1MG | Beyond 1 Million Genomes aims to create a network of genetic and clinical data across Europe. The project provides coordination and support to the 1+ Million Genomes Initiative (1+MG). This initiative is a commitment of 24 EU countries, the UK and Norway to give cross-border access to one million sequenced genomes by 2022. | 2020-2023 | Not applicable | European Commission - Horizon Europe | DATA MANAGEMENT, INFRASTRUCTURE, METADATA STANDARDS
ELIXIR-CONVERGE | An alliance with the goal of connecting and aligning ELIXIR Nodes to deliver sustainable FAIR life-science data management services. | 2020-2023 | Data Management and Infectious Diseases | European Commission - H2020 Programme | DATA MANAGEMENT, INFRASTRUCTURE, METADATA STANDARDS
IHCC | The International HundredK+ Cohorts Consortium aims to create a global platform for translational research, informing the biological and genetic basis for disease and improving clinical care and population health. | 2020-2022 | Translational research | NIH; The Wellcome Trust; CZI | INFRASTRUCTURE, METADATA STANDARDS
PPCG | The Pan Prostate Cancer Group aims to harmonise and interrogate Whole Genome DNA Sequence data generated around the world from over 2000 men with prostate cancer, with associated transcriptome and methylome data, to include men from different clinical categories and ethnicities. This project is about providing breakthrough advances through analysis of a very large series of Whole Genome DNA data from prostate cancer contributed by many of the leading scientists and clinicians working in prostate cancer genomics. | 2019-2024 | Cancer | Cancer Research UK | DATA MANAGEMENT
CINECA | A consortium providing a federated solution that makes population-scale genomic and biomolecular data accessible across international borders, accelerating research and improving the health of individuals resident across continents. | 2019-2023 | Large-scale Genomics and Health Data | European Commission - H2020 Programme | ACCESS, DATA MANAGEMENT, DISCOVERY, INFRASTRUCTURE
EASI-Genomics | A project designed to provide easy access to cutting-edge DNA sequencing technologies to researchers from academia and industry, within a framework that ensures compliance with ethical and legal requirements, as well as FAIR and secure data management. | 2019-2023 | Next Generation Sequencing | European Commission - H2020 Programme | ACCESS
EJP-RD | A European consortium built to create a comprehensive, sustainable ecosystem allowing a virtuous circle between research, care, and medical innovation. | 2019-2023 | Rare diseases | European Commission - H2020 Programme | ACCESS, DATA MANAGEMENT, DOCUMENTATION, METADATA STANDARDS
EOSC-Life | EOSC-Life brings together the 13 Life Science research infrastructures (LS RIs) to create an open, digital and collaborative space for biological and medical research. The project will publish 'FAIR' data and a catalogue of services provided by participating RIs for the management, storage and reuse of data in the European Open Science Cloud (EOSC). | 2019-2023 | Not applicable | European Commission - H2020 Programme | DOCUMENTATION
EUCANCan | A federated network aiming at implementing a cultural, technological and legal integrated framework across Europe and Canada, to enable and facilitate the efficient sharing of cancer genomic data. | 2019-2023 | Cancer | European Commission - H2020 Programme | DATA MANAGEMENT, METADATA STANDARDS
The Federated EGA framework: supporting sensitive data management across the ELIXIR Nodes | This project is a direct continuation of the FHD IS with the goal to position the FEGA framework as the core infrastructure driver to support human data sharing for research. | 2019-2023 | Human genomic data | ELIXIR | INFRASTRUCTURE
UK Biobank | UK Biobank is a large-scale biomedical database and research resource, containing in-depth genetic and health information from half a million UK participants. This project is to archive whole genome sequencing and other genetic data for UK Biobank participants. | 2019-2023 | Large-scale Genomics and Health Data | The Wellcome Trust; UKRI; Amgen; AstraZeneca; GSK; Johnson & Johnson | DATA MANAGEMENT, INFRASTRUCTURE
VEIS | The core mission of VEIS is to create an open ecosystem of technologies that will address and adapt to the requirements of the systems used to analyse and interpret -omics and clinical data in research and application environments in biomedicine. The aim of the project is to leverage the value of the EGA for both industry and society. | 2019-2022 | Oncology and Rare diseases | Generalitat de Catalunya and European Regional Development Fund (ERDF) | ACCESS, DISCOVERY
ELIXIR BEACON IS | This study follows on from a number of earlier activities that have established the ELIXIR Beacon Project. The main aim is to extend the Beacon protocol, developed at EGA, to become the reference ELIXIR Data Discovery product. | 2019-2021 | Not applicable | ELIXIR | DISCOVERY
ELIXIR FHD IS | This project coordinates the delivery of FAIR compliant metadata standards, interfaces, and reference implementation to support the federated ELIXIR network of human data resources. | 2019-2021 | Human genomic data | ELIXIR | INFRASTRUCTURE
ELIXIR Rare Disease | The Rare Disease Community extends and generalises the system of access authorisation and high volume secure data transfer developed within the EGA. The goal of the Community is to create a federated infrastructure that will enable researchers to discover, access and analyse different rare disease repositories across Europe. It is doing this in partnership with other European infrastructure projects, namely RD-CONNECT, BBMRI-ERIC and E-Rare. | 2019-2021 | Rare diseases | ELIXIR | INFRASTRUCTURE
Solve-RD | Solve-RD - solving the unsolved rare diseases - is a research project funded by the European Commission. It echoes the ambitious goals set out by the International Rare Diseases Research Consortium (IRDiRC) to deliver diagnostic tests for most rare diseases by 2020. The current diagnostic and subsequent therapeutic management of rare diseases is still highly unsatisfactory for a large proportion of rare disease patients - the unsolved RD cases. For these unsolved rare diseases, we are unable to explain the etiology responsible for the disease phenotype, predict the individual disease risk and/or rate of disease progression, and/or quantitate the risk of relatives to develop the same disorder. | 2018-2024 | Rare diseases | European Commission - H2020 Programme | ACCESS, DATA MANAGEMENT, METADATA STANDARDS
EuCanShare | An EU-Canada joint infrastructure for next-generation multi-Study Heart research. | 2018-2022 | Cardiovascular Diseases | European Commission - H2020 Programme | ACCESS, METADATA STANDARDS
Late infantile ceroid lipofuscinosis (LINCL) is a rare, rapidly progressing lysosomal storage disease resulting from mutations in the CLN2 gene that lead to deficiency in the lysosomal protease tripeptidyl peptidase I (TPP-I). The symptoms are largely neurological, with onset at 2-4 years of age and progression to death at 8-12 years of age. The rarity of the disease and the likelihood of non-uniform progression depending on genotype together limit the data available regarding the natural history of disease progression. Current neurological rating scales are used to give an overall quantitative description of the disease, but the categories are generally broad and imprecise. With that background, the primary focus of this study is to use clinical rating scales and magnetic resonance imaging methods to define the natural history of LINCL and to provide objective and sensitive surrogates for neurological status and the impact of experimental treatments in children with LINCL. Together these parameters will be applicable to future clinical studies of novel therapies for LINCL and should be transferable to other neurological lysosomal storage diseases. Subjects will be assessed at 2 visits separated by ≥1 year. Brain morphometry, water self-diffusion coefficients and spectroscopic data will be obtained via magnetic resonance. Resulting MRI biomarkers will be compared with neurological assessment of disease severity using the Weill-Cornell LINCL rating scale.
This dbGaP Collection contains all authorized individual-level genomic datasets currently in dbGaP that are approved for General Research Use (GRU) and have no further limitations beyond those outlined in the model Data Use Certification Agreement. Access to this study will include any additional authorized individual-level GRU datasets that become available. Renewal of this study is required annually.

Data Included in this Study

Individual-level genomic data designated for general research use with no further use limitations or restrictions, that is:
- Data use does not require approval by an Institutional Review Board for secondary analyses
- Data have no publication embargo
- Data have no other use limitations (e.g., requirement for collaboration or publication; restricted for use by academic or not-for-profit organizations; restricted to health, medical, and/or biomedical research)

In response to requests from the scientific community, the NIH implemented a change in the procedures for accessing individual-level GRU genomic data. Under the modified procedures, interested investigators can request GRU individual-level genomic datasets with a single application. However, please note that these data have not been harmonized. Additionally, to help expedite the processing of requests for individual-level GRU genomic datasets, the requests will be reviewed by a single, central Data Access Committee. The process for requesting these datasets is identical to the request process for non-GRU individual-level, controlled-access data, including the expectation that investigators will abide by the NIH Policy for Sharing of Data Obtained in NIH Supported or Conducted Genome-Wide Association Studies and the Data Use Certification Agreement.

Effective April 24, 2018, the Data Use Limitations (DUL) for the Genes and Blood Clotting Study (GABC) (phs000304) were updated to Health/Medical/Biomedical, Not-for-profit Use Only (HMB-NPU).
Additionally, the submitting institutions have requested the removal of the National Heart, Lung, and Blood Institute (NHLBI) "Grand Opportunity" Exome Sequencing Project (GO-ESP) Lung Cohorts Exome Sequencing Project: Genetic modifiers of Pseudomonas aeruginosa (Pa) lung infection acquisition in cystic fibrosis (phs000254) from the dbGaP Compilation of Individual-Level Genomic Data for General Research Use (dbGaP Compilation) (phs000688). As such, data from studies phs000304 and phs000254 are no longer available as part of the dbGaP Compilation (phs000688). Investigators must submit separate Data Access Requests (DARs) for the two studies and be approved by the NHLBI Data Access Committee.
The Sjögren's International Collaborative Clinical Alliance (SICCA) is a multisite observational cohort study that recruited a large cohort of geographically diverse participants. Enrollment of participants began in late 2004 at five (one domestic and four international) sites, in which all groups used the same protocol-directed methods to provide uniform evaluations; collect oral, ocular, and rheumatologic data; and collect specimens. The sites were located at the University of Buenos Aires, Argentina; Peking Union Medical College, Beijing, China; Rigshospitalet (formerly at Glostrup Hospital), Copenhagen, Denmark; Kanazawa Medical University, Ishikawa, Japan; King's College, London, UK (joined in 2007); and the University of California, San Francisco (UCSF). In 2009, Aravind Eye Hospital, Madurai, India; Johns Hopkins University, Baltimore, MD; and University of Pennsylvania, Philadelphia, PA were added as additional SICCA sites. UCSF is the coordinating center for SICCA. All specimens and data collected for SICCA are housed at UCSF.

To facilitate research focused on understanding the genetics of Sjögren's syndrome, high-density SNP genotype data and SICCA clinical information are being made available to the research community. This includes participants recruited from all SICCA research groups (RG). DNA for the GWAS was extracted at the UCSF DNA Bank from whole blood or saliva (Oragene) samples from these participants (3,382), blood relatives (439), and unrelated healthy controls (25). Eighty-eight percent of the participants are women, most are of European or Asian ancestry, and the median age is 55.

To assess potential batch effects when doing case-control comparisons using planned external controls, 30 DNA samples from each of three studies were genotyped with SICCA DNA samples:
- The Genetic Architecture of Smoking and Smoking Cessation - Collaborative Genetic Study of Nicotine Dependence (COGEND) - PI: Laura Bierut
- NEI Age-Related Eye Disease Study (AREDS) - Genetic Variation in Refractive Error Substudy - PI: Dwight Stambolian
- Controls from the National Institute of Mental Health's (NIMH's) Human Genetics Initiative: Genome-Wide Association Study of Schizophrenia - PI: Pablo V. Gejman; Molecular Genetics of Schizophrenia - nonGAIN Sample (MGS_nonGAIN) - PI: Pablo V. Gejman

Genotyping was performed at the Johns Hopkins University Center for Inherited Disease Research (CIDR). Data quality control, dbGaP preparation and posting, and imputation to 1000 Genomes to increase SNP density were performed by the Center for Biomedical Statistics at the University of Washington. Data analysis was performed at UCSF using the PLINK, EIGENSOFT, and SNPTEST software.
The Genetics and Epidemiology of Colorectal Cancer Consortium (GECCO) is a collaborative effort comprising a coordinating center and scientific researchers from well-characterized cohort and case-control studies conducted in North America and Europe. This international consortium aims to accelerate the discovery of common and rare genetic risk variants for colorectal cancer by conducting large-scale meta-analyses of existing and newly generated genome-wide association study (GWAS) data, replicating and fine-mapping GWAS discoveries, and investigating how genetic risk variants are modified by environmental risk factors. To expand these efforts, we assembled case-control sets or nested case-control sets from 20 different North American or European studies. Summary descriptions and study participant inclusion/exclusion criteria for each of these studies are detailed below.

The Black Women's Health Study (BWHS): The BWHS is the largest follow-up study of the health of African-American women (Cozier et al., 2004; Rosenberg et al., 1995) [PMID: 15018884; PMID: 7722208]. The purpose is to identify and evaluate causes and preventives of cancers and other serious illnesses in African-American women. Among the diseases being studied are breast cancer, colorectal cancer, type 2 diabetes, uterine fibroids, systemic lupus erythematosus, and cardiovascular disease. The study began in 1995, when 59,000 black women from all parts of the United States enrolled through postal questionnaires. The women provided demographic and health data on the 1995 baseline questionnaire, including information on weight, height, smoking, drinking, contraceptive use, use of other selected medications, illnesses, reproductive history, physical activity, diet, use of health care, and other factors. The participants are followed through biennial questionnaires to determine the occurrence of cancers and other illnesses and to update information on risk factors.
Self-reports of cancer are confirmed through medical records and state cancer registry records. Mouthwash-swish samples, as a source of DNA, were obtained from ~26,000 BWHS participants in 2002-2007. DNA was isolated from the mouthwash-swish samples at the Boston University Molecular Core Genetics Laboratory using the QIAAMP DNA Mini Kit (Qiagen). All incident colorectal cancer cases with a DNA sample were included in the present analysis. Two controls per case, selected from among BWHS participants free of colorectal cancer at end of follow-up, were matched to cases on year of birth (±2 years) and geographical region of residence (Northeast, South, Midwest, and West). A total of 209 colorectal cancer cases and 423 controls were sent for genotyping.

Campaign Against Cancer and Heart Disease (CLUE II): The Campaign Against Cancer and Heart Disease is a prospective cohort designed to identify biomarkers and other factors associated with risk of cancer, heart disease, and other conditions (Kakourou et al., 2015) [PMID: 26220152]. A total of 32,894 participants were recruited from May through October 1989 from Washington County, Maryland and surrounding communities. Colorectal cancer cases (n = 297) and matched controls (n = 296) were identified between 1989 and 2000 among participants in the CLUE II cohort of Washington County, Maryland.

Colorectal Cancer Study of Austria (CORSA): In the ongoing colorectal cancer study of Austria (CORSA), more than 13,000 Caucasian participants have been recruited within the province-wide screening project "Burgenland Prevention Trial of Colorectal Disease with Immunological Testing" (B-PREDICT) since 2003 (Hofer et al., 2011) [PMID: 21422235]. All inhabitants of the Austrian province Burgenland aged between 40 and 80 years are invited annually to participate in fecal immunochemical testing, and haemoccult-positive screening participants are invited for colonoscopy.
CORSA includes genomic DNA and plasma of colorectal cancer cases, low-risk and high-risk adenomas, and colonoscopy-negative controls. Controls received a complete colonoscopy and were free of colorectal cancer or polyps. CORSA participants have been recruited in the four KRAGES hospitals in Burgenland, Austria, and additionally, at the Medical University of Vienna (Department of Surgery), the Viennese hospitals "Rudolfstiftung" and the "Sozialmedizinisches Zentrum Sud", and at the Medical University of Graz (Department of Internal Medicine). 1403 colorectal cancer and advanced colorectal adenoma cases, and 1404 matched controls were selected for the study. Distribution of factors sex and age (5 year strata) were evenly matched between cases and controls. Cancer Prevention Study II (CPS II): The CPS II Nutrition cohort is a prospective study of cancer incidence and mortality in the United States, established in 1992 and described in detail elsewhere (Calle et al., 2002; Campbell et al., 2014) [PMID: 12015775; PMID: 25472679]. At enrollment, participants completed a mailed self-administered questionnaire including information on demographic, medical, diet, and lifestyle factors. Follow-up questionnaires to update exposure information and to ascertain newly diagnosed cancers were sent biennially starting in 1997. Reported cancers were verified through medical records, state cancer registry linkage, or death certificates. The Emory University Institutional Review Board approves all aspects of the CPS II Nutrition Cohort. A total of 360 cases and 359 controls were selected for this study. Czech Republic Colorectal Cancer Study (Czech Republic CCS): Cases with positive colonoscopy results for malignancy, confirmed by histology as colon or rectal carcinomas, were recruited between September 2003 and May 2012 in several oncological departments in the Czech Republic (Prague, Pilsen, Benesov, Brno, Liberec, Ples, Pribram, Usti and Labem, and Zlin). 
Two control groups, sampled during the same period as case recruitment, were included in the study. The first group consisted of hospital-based individuals with a negative colonoscopy result for malignancy or idiopathic bowel diseases. The reasons for the colonoscopy were: i) positive fecal occult blood test, ii) hemorrhoids, iii) abdominal pain of unknown origin, and iv) macroscopic bleeding. The second control group consisted of healthy blood donor volunteers from a blood donor center in Prague. All individuals were subjected to standard examinations to verify the health status for blood donation and were cancer-free at the time of the sampling. Details of CRC cases and controls have been reported previously (Vymetalkova et al., 2014; Naccarati et al., 2016; Vymetalkova et al., 2016) [PMID: 24755277; PMID: 26735576; PMID: 27803053]. All subjects were informed and provided written consent to participate in the study. They approved the use of their biological samples for genetic analyses, according to the Declaration of Helsinki. The design of the study was approved by the Ethics Committee of the Institute of Experimental Medicine, Prague, Czech Republic. All subjects included in the study were Caucasians and comprised 1792 cases and 1764 matched controls. Controls were matched to CRC cases in a 1:1 ratio. Matching was done on age and sex. Age was matched within +-5 years, whereas sex was matched exactly. For the cases without matched controls, matching was done only on sex. Early Detection Research Network (EDRN): The aim of the EDRN initiative is to develop and sustain a biorepository for support of translational research (Amin et al., 2010) [PMID: 21031013]. High-quality biospecimens were accrued and annotated with pertinent clinical, epidemiologic, molecular and genomic information. A user-friendly annotation tool and query tool were developed for this purpose. 
Among the components of this annotation tool are CDEs, which were developed from the College of American Pathologists (CAP) Cancer Checklists and North American Association of Central Cancer Registries (NAACCR) standards. The CDEs provide semantic and syntactic interoperability of the data sets by describing them in the form of metadata or data descriptors. A total of 352 colorectal case samples and 399 controls were selected for this study. Controls were matched to CRC cases based on age and sex. The EPICOLON Consortium (EPICOLON): The EPICOLON Consortium comprises a prospective, multicentre and population-based epidemiology survey of the incidence and features of CRC in the Spanish population (Fernandez-Rozadilla et al., 2013) [PMID: 23350875]. Cases were selected as patients with de novo histologically confirmed diagnosis of colorectal adenocarcinoma. Patients with familial adenomatous polyposis, Lynch syndrome or inflammatory bowel disease-related CRC, and cases where patients or family refused to participate in the study were excluded. Hospital-based controls were recruited through the blood collection unit of each hospital, together with cases. All of the controls were confirmed to have no history of cancer or other neoplasm and no reported family history of CRC. Controls were randomly selected and matched with cases for hospital, sex and age (+- 5 years). A total of 370 cases and 370 controls were selected for genotyping. Hawaii Adenoma Study: For this adenoma study, two flexible-sigmoidoscopy screening clinics were first used to recruit participants on Oahu, Hawaii. Adenoma cases were identified either from the baseline examination at the Hawaii site of the Prostate Lung Colorectal and Ovarian cancer screening trial during 1996-2000 or at the Kaiser Permanente Hawaii's Gastroenterology Screening Clinic during 1995-2007. 
In addition, starting in 2002 and up to 2007, we also approached for recruitment all eligible patients who underwent a colonoscopy in the Kaiser Permanente Hawaii Gastroenterology Department. Cases were patients with histologically confirmed first-time adenoma(s) of the colorectum and were of Japanese, Caucasian or Hawaiian race/ethnicity. Controls were selected among patients with a normal colorectum and were individually matched to the cases on age at exam, sex, race/ethnicity, screening date (+-3 months) and clinic and type of examination (colonoscopy or flexible sigmoidoscopy). We recruited 1016 adenoma cases (67.8% of all eligible) and 1355 controls (69.2% of all eligible); 889 cases and 1169 controls agreed to give a blood sample, and 29 cases and 34 controls a mouthwash sample. A total of 989 cases and 1185 controls were genotyped for this study. Columbus-area HNPCC Study (HNPCC, OSUMC): Patients with colorectal adenocarcinoma diagnosed at six participating hospitals were eligible for this study, regardless of age at diagnosis or family history of cancer. Patients with a clinical diagnosis of familial adenomatous polyposis were not eligible for this study. These six hospitals perform the vast majority of all operations for CRC in the Columbus metropolitan area (population 1.7 million). The institutional review board at all participating hospitals approved the research protocol and consent form in accordance with assurances filed with and approved by the United States Department of Health and Human Services. Briefly, during the period of January 1999 through August 2004, 1,566 eligible patients with CRC were accrued to the study (Hampel et al., 2008) [PMID: 18809606]. A total of 1472 colorectal cancer samples had enough blood DNA remaining to be sent for genotyping. Control samples were provided by the Ohio State University Medical Center's (OSUMC) Human Genetics Sample Bank. 
The Columbus Area Controls Sample Bank is a collection of control samples for use in human genetics research that includes both donors' anonymized biological specimens and linked phenotypic data. The data and samples are collected under the protocol "Collection and Storage of Controls for Genetics Research Studies", which is approved by the Biomedical Sciences Institutional Review Board at OSUMC. Recruitment takes place in OSUMC primary care and internal medicine clinics. If individuals agree to participate, they provide written informed consent, complete a questionnaire that includes demographic, medical and family history information, and donate a blood sample. 4-7 ml of blood is drawn into each of 3 ACD Solution A tubes and is used for genomic DNA extraction and the establishment of an EBV-transformed lymphoblastoid cell culture, cell pellet in Trizol, and plasma. Controls were matched to CRC cases in a 1:1 ratio. Matching was done on age at reference time (age_ref), race, and sex. Age_ref was matched within +-5 years. Sex and race were matched exactly. For the cases without matched controls, matching was done only on sex and race, in a 1:1 ratio. Since controls are fewer than cases, one control is matched to at most 2 cases. Health Professionals Follow-up Study (HPFS): A parallel prospective study to the NHS (Nurses' Health Study). The HPFS cohort comprised 51,529 men aged 40-75 who, in 1986, responded to a mailed questionnaire (Rimm et al., 1990) [PMID: 2090285]. Participants provided information on health related exposures, including current and past smoking history, age, weight, height, diet, physical activity, aspirin use, and family history of colorectal cancer. Colorectal cancer and other outcomes were reported by participants or next-of-kin and were followed up through review of the medical and pathology record by physicians. Overall, more than 97% of self-reported colorectal cancers were confirmed by medical record review. 
Information was abstracted on histology and primary location. Incident cases were defined as those occurring after the subject provided the blood sample. Prevalent cases were defined as those occurring after enrollment in the study but before the subject provided the blood sample. Follow-up evaluation has been excellent, with 94% of the men responding to date. Colorectal cancer cases were ascertained through January 1, 2008. In 1993-1995, 18,825 men in the HPFS mailed blood samples by overnight courier, which were aliquoted into buffy coat and stored in liquid nitrogen. In 2001-2004, 13,956 men in the HPFS who had not provided a blood sample previously mailed in a swish-and-spit sample of buccal cells. Incident cases were defined as those occurring after the subject provided a blood or buccal sample. Prevalent cases were defined as those occurring after enrollment in the study in 1986, but before the subject provided either a blood or buccal sample. After excluding participants with histories of cancer (except nonmelanoma skin cancer), ulcerative colitis, or familial polyposis, case-control sets were previously constructed. In addition to colorectal cancer cases and controls, a set of adenoma cases and matched controls with available DNA from buffy coat were selected for genotyping. Over the follow-up period, data were collected on endoscopic screening practices and, if individuals had been diagnosed with a polyp, the polyps were confirmed to be adenomatous by medical record review. Adenoma cases were ascertained through January 1, 2008. A separate case-control set was constructed of participants diagnosed with advanced adenoma matched to control participants who underwent a lower endoscopy in the same time period and did not have an adenoma. Advanced adenoma was defined as an adenoma 1 cm or larger in diameter and/or with tubulovillous, villous, or high-grade dysplasia/carcinoma-in-situ histology. 
Matching criteria included year of birth (within 1 year) and month/year of blood sampling (within 6 months), the reason for their lower endoscopy (screening, family history, or symptoms), and the time period of any prior endoscopy (within 2 years). Controls matched to cases with a distal adenoma either had a negative sigmoidoscopy or colonoscopy examination, and controls matched to cases with proximal adenoma all had a negative colonoscopy. In total, 159 advanced adenoma cases and 109 controls were selected for genotyping. Leeds Colorectal Cancer Study (LCCS): Following local ethical approval, colorectal cancer cases were recruited from 1997 until 2012 in Leeds, UK through surgical clinics. Initially, funding was provided by the UK Ministry of Agriculture, Farming and Fisheries (subsequently the Food Standards Agency) and Imperial Cancer Research Fund (subsequently Cancer Research UK). Recruitment also occurred similarly in Dundee, Perth and York between 1997 and 2001 using the same protocol, and the data and samples were combined. Pathologically confirmed cases were consented at outpatient clinics, providing information on known and postulated risk factors for colorectal cancer (diet, lifestyle and family history) as well as providing a blood sample for DNA. Exclusion criteria included pre-existing diverticular disease and an inability to complete the questionnaire. The General Practitioners (GPs) of cases were identified (all UK residents have a nominated General Practitioner to whom to refer initial medical queries), and these GPs were asked to send letters to other persons on their patient list of the same gender and born within 5 years of the case. Subsequently, to enhance the number of controls, we systematically invited patients from selected GP practices. Diet was assessed in cases and controls using an extensive dietary and lifestyle questionnaire modified from that produced by the European Prospective Investigation in Cancer (EPIC). 
The frequency with which each specific food item was eaten was recorded, and we also obtained average fruit and vegetable consumption as a cross-check. In total, 1591 cases and 739 controls provided a DNA sample. The North Carolina Colon Cancer Studies (NCCCS I/II): The North Carolina Colon Cancer Studies (NCCCS I-colon and NCCCS II-rectal) were population-based case-control studies conducted in 33 counties of North Carolina. Cases were identified using the rapid case ascertainment system of the North Carolina Central Cancer Registry. Patients with a first diagnosis of histologically confirmed invasive adenocarcinoma of the colon (cecum through sigmoid colon) between October 1996 and September 2000 were classified as potential cases in the NCCCS I. The NCCCS II included patients with a first diagnosis of histologically confirmed invasive adenocarcinoma of the sigmoid colon, rectosigmoid, or rectum (hereafter collectively referred to as rectal cancer) between May 2001 and September 2006. Additional eligibility requirements were: age 40-80 years, residence in one of the 33 counties, ability to give informed consent and complete an interview, possession of a driver's license or identification card issued by the North Carolina Department of Motor Vehicles (if under the age of 65), and no objections from the primary physician in regards to contacting the individual. Controls, identified and sampled during the respective study dates, were selected from two sources. Potential controls under the age of 65 were identified using the North Carolina Department of Motor Vehicles records. For those 65 years and older, records from the Center for Medicare and Medicaid Services were used. Controls were matched to cases using randomized recruitment strategies. Recruitment probabilities were defined using strata of 5-year age, sex, and race groups. 
Dietary information was collected using a modified version of the semiquantitative food frequency questionnaire developed at the National Cancer Institute. In addition, participants were asked about vitamin and mineral supplementation, special diets, restaurant eating, sodium use, and fats used in cooking. In NCCCS I, 515 colorectal cases and 687 matched controls were sent for genotyping. In NCCCS II, 796 colorectal cases and 823 controls were sent for genotyping. Controls were matched to CRC cases in a 1:1 ratio. Matching was done on age, race, and sex. Age was matched within +-5 years. Race and sex were matched exactly. For the cases without matched controls, matching was done only on sex and race. Nurses' Health Study (NHS): The NHS cohort began in 1976 when 121,700 married female registered nurses age 30-55 years returned the initial questionnaire that ascertained a variety of important health-related exposures (Belanger et al., 1978) [PMID: 248266]. Since 1976, follow-up questionnaires have been mailed every 2 years. Colorectal cancer and other outcomes were reported by participants or next-of-kin and followed up through review of the medical and pathology record by physicians. Overall, more than 97% of self-reported colorectal cancers were confirmed by medical-record review. Information was abstracted on histology and primary location. The rate of follow-up evaluation has been high: as a proportion of the total possible follow-up time, follow-up evaluation has been more than 92%. Colorectal cancer cases were ascertained through June 1, 2008. In 1989-1990, 32,826 women in NHS I mailed blood samples by overnight courier, which were aliquoted into buffy coat and stored in liquid nitrogen. In 2001-2004, 29,684 women in NHS I who did not previously provide a blood sample mailed a swish-and-spit sample of buccal cells. Incident cases were defined as those occurring after the subject provided a blood or buccal sample. 
Prevalent cases were defined as those occurring after enrollment in the study in 1976 but before the subject provided either a blood or buccal sample. After excluding participants with histories of cancer (except nonmelanoma skin cancer), ulcerative colitis, or familial polyposis, case-control sets were previously constructed from which DNA was isolated from either buffy coat or buccal cells for genotyping. In addition to colorectal cancer cases and controls, a set of advanced adenoma cases and matched controls with available DNA from buffy coat were selected for genotyping. Over the follow-up period, data were collected on endoscopic screening practices and, if individuals had been diagnosed with a polyp, the polyps were confirmed to be adenomatous by medical record review. Adenoma cases were ascertained through June 1, 2011. A separate case-control set was constructed of participants diagnosed with advanced adenoma matched to control participants who underwent a lower endoscopy in the same time period and did not have an adenoma. Advanced adenoma was defined as an adenoma more than 1 cm in diameter and/or with tubulovillous, villous, or high-grade dysplasia/carcinoma-in-situ histology. Matching criteria included year of birth (within 1 year) and month/year of blood sampling (within 6 months), the reason for their lower endoscopy (screening, family history, or symptoms), and the time period of any prior endoscopy (within 2 years). Controls matched to cases with a distal adenoma either had a negative sigmoidoscopy or colonoscopy examination, and controls matched to cases with proximal adenoma all had a negative colonoscopy. A total of 272 cases and 236 matched controls were sent to CIDR for the advanced adenoma case-control set. 
Northern Swedish Health and Disease Study (NSHDS): Comprises over 110,000 participants, including approximately one third with repeated sampling occasions, from three population-based cohorts (Dahlin et al., 2010; Myte et al., 2016) [PMID: 20197478; PMID: 27367522]. The largest is the ongoing Vasterbotten Intervention Programme, in which all residents of Vasterbotten County are invited to a health examination upon turning 30 (some years), 40, 50 and 60 years of age. Extensive measured and self-reported health and lifestyle data, as well as blood samples for central biobanking in Umea, Sweden, are collected at the health exam. Leucocyte DNA samples for 1:1-matched CRC case-control sets from the NSHDS, of which 878 samples are included in this study, have been selected for genotyping. This is in addition to 354 samples from the NSHDS previously analyzed as part of the multicenter EPIC cohort. Cancer-specific and overall survival data are available for all patients. For at least 425 patients, archival tumor tissue has been analyzed for the BRAF V600E mutation and by sequencing codon 12 and 13 for KRAS mutations, as well as for MSI screening status by immunohistochemistry and for an eight-gene CIMP panel using quantitative real-time PCR (MethyLight). Ohio Colorectal Cancer Prevention Initiative (OCCPI, OSUMC): OCCPI (ClinicalTrials.gov identifier: NCT01850654) is a population-based study of colorectal cancer patients diagnosed in one of 51 hospitals throughout the state of Ohio from January 1, 2013 through December 31, 2016. The OCCPI was created to decrease CRC incidence in Ohio by identifying patients with hereditary predisposition (statewide universal tumor screening for newly diagnosed CRC patients), increase colonoscopy compliance for first-degree relatives of CRC patients, and encourage future research through the creation of a biorepository. 
The 51 Ohio hospitals participating in the OCCPI were selected to represent a cross-section of clinical centers in the state based on high reported volume of CRC patients, affiliation with a high volume hospital, or interest in participation. Institutional Review Board (IRB) approval was obtained by the individual hospitals, Community Oncology Programs, or by ceding review to the OSU IRB. Written informed consent was obtained. A total of 2139 colorectal cases were genotyped. Patients were considered eligible for this study if they were age 18 or older at the time of enrollment and had a surgical resection (or biopsy if unresectable) in the state of Ohio demonstrating an adenocarcinoma of the colorectum from 1/1/13 to 12/31/16. Matched control samples were selected from the Ohio State University Medical Center's (OSUMC) Human Genetics Sample Bank in an identical way to the selection for the Columbus-area HNPCC Study (please refer to the description for the Columbus-area HNPCC Study). Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial (PLCO): PLCO enrolled 154,934 participants (men and women, aged between 55 and 74 years) at ten centers into a large, randomized, two-arm trial to determine the effectiveness of screening to reduce cancer mortality. Sequential blood samples were collected from participants assigned to the screening arm. Participation was 93% at the baseline blood draw. In the observational (control) arm, buccal cells were collected via mail using the "swish-and-spit" protocol and participation rate was 65%. Details of this study have been previously described (Huang et al., 2016) [PMID: 27673363] and are available online (http://dcp.cancer.gov/plco). For this study 1651 advanced adenoma cases and 1392 controls were selected for genotyping. 
Selenium and Vitamin E Prevention Trial (SELECT): The Selenium and Vitamin E Cancer Prevention Trial (SELECT) was a double-blind, placebo controlled clinical trial which explored using selenium and vitamin E alone and in combination to prevent prostate cancer in healthy men (Lippman et al., 2009) [PMID: 19066370]. Secondary endpoints included the prevention of colorectal and lung cancers. SELECT was conducted at 427 sites and centers in the United States, Canada and Puerto Rico; 35,533 men 55 years and older (50 or older if African American) were randomized beginning August 22, 2001. Supplementation was discontinued on October 23, 2008 due to futility. 308 colorectal cancer cases and 308 matched controls were selected from the SELECT population and sent for genotyping. Screening Markers For Colorectal Disease Study and Colonoscopy and Health Study (SMS-REACH): Details on this study population were previously reported (Burnett-Hartman et al., 2014) [PMID: 24875374]. Participants were enrollees in an integrated health-care delivery system in western Washington State (Group Health Cooperative, Seattle, Washington) aged 24-79 years who underwent an index colonoscopy for any indication between 1998 and 2007 and donated a buccal-cell or blood sample for genotyping analysis. Study recruitment took place in 2 phases, with phase 1 occurring in 1998-2003 and phase 2 occurring in 2004-2007. Persons who had undergone a colonoscopy less than 1 year prior to the index colonoscopy, persons with inadequate bowel preparation for the index colonoscopy, and persons with a prior or new diagnosis of colorectal cancer, a familial colorectal cancer syndrome (such as familial adenomatous polyposis), or another colorectal disease were ineligible. Patients diagnosed with adenomas or serrated polyps and persons who were polyp-free at the index colonoscopy (controls) were systematically recruited during both phases of recruitment. 
Approximately 75% agreed to participate and provided written informed consent. Based on medical records, persons who agreed to participate and those who refused study participation were similar with respect to age, sex, and colorectal polyp status. Study protocols were approved by the institutional review boards of the Group Health Cooperative and the Fred Hutchinson Cancer Research Center (Seattle, Washington). A total of 575 cases and 508 matched controls were selected for the study. Controls were matched to CRC cases in a 1:1 ratio. Matching was done on age_ref, race, and sex. Age_ref was matched within +-5 years. The Women's Health Initiative (WHI): WHI is a long-term national health study that has focused on strategies for preventing heart disease, breast and colorectal cancer, and osteoporotic fractures in postmenopausal women. The original WHI study included 161,808 postmenopausal women enrolled between 1993 and 1998. The Fred Hutchinson Cancer Research Center in Seattle, WA serves as the WHI Clinical Coordinating Center for data collection, management, and analysis of the WHI. The WHI has two major parts: a partial factorial randomized Clinical Trial (CT) and an Observational Study (OS); both were conducted at 40 Clinical Centers nationwide. The CT enrolled 68,132 postmenopausal women between the ages of 50-79 into trials testing three prevention strategies. If eligible, women could choose to enroll in one, two, or all three of the trial components. The components are: Hormone Therapy Trials (HT): This double-blind component examined the effects of combined hormones or estrogen alone on the prevention of coronary heart disease and osteoporotic fractures, and associated risk for breast cancer. Women participating in this component with an intact uterus were randomized to estrogen plus progestin (conjugated equine estrogens [CEE], 0.625 mg/d plus medroxyprogesterone acetate [MPA] 2.5 mg/d) or a matching placebo. 
Women with prior hysterectomy were randomized to CEE or placebo. Both trials were stopped early, in July 2002 and March 2004, respectively, based on adverse effects. All HT participants continued to be followed without intervention until close-out. Dietary Modification Trial (DM): The Dietary Modification component evaluated the effect of a low-fat and high fruit, vegetable and grain diet on the prevention of breast and colorectal cancers and coronary heart disease. Study participants were randomized to either their usual eating pattern or a low-fat dietary pattern. Calcium/Vitamin D Trial (CaD): This double-blind component began 1 to 2 years after a woman joined one or both of the other clinical trial components. It evaluated the effect of calcium and vitamin D supplementation on the prevention of osteoporotic fractures and colorectal cancer. Women in this component were randomized to calcium (1000 mg/d) and vitamin D (400 IU/d) supplements or a matching placebo. The Observational Study (OS) examines the relationship between lifestyle, environmental, medical and molecular risk factors and specific measures of health or disease outcomes. This component involves tracking the medical history and health habits of 93,676 women not participating in the CT. Recruitment for the observational study was completed in 1998 and participants were followed annually for 8 to 12 years. All centrally confirmed cases of invasive colorectal cancer, or deaths from colorectal cancer, were selected as potential cases from the September 30, 2015 database. Controls were participants free of colorectal cancer (invasive or in situ) as of September 30, 2015. Potential cases and controls were excluded if they (1) were non-White; (2) had a history of colorectal cancer at baseline; (3) were lost to follow-up after enrollment; (4) were dbGaP ineligible; (5) had <1.25 ug of DNA; (6) were selected for WHI study M26 Phase I or II; or (7) were selected for WHI study AS224 and also included in the imputation project. 
A total of 578 cases and 104,429 controls met the eligibility criteria. Each case was matched with 1 control (1:1) that exactly met the following matching criteria: age (+-5 years), 40 randomization centers (exact), WHI date (+-3 years), CaD date (+-3 years), OS flag (exact), HRT assignments (exact), DM assignments (exact), and CaD assignments (exact). Control selection was done in a time-forward manner, selecting one control for each case from the risk set at the time of the case's event. The matching algorithm was allowed to select the closest match based on a criterion that minimizes an overall distance measure (Bergstralh EJ, Kosanke JL. Computerized matching of cases to controls. Technical Report #56, Department of Health Sciences Research, Mayo Clinic, Rochester MN. April 1995). Each matching factor was given the same weight. When exact matches could not be found, the matching criteria were gradually relaxed among unmatched cases and controls until all cases had found matched controls. Using the matching criteria specified above, 559 of the 578 eligible cases found exact matches. The matching criteria were then relaxed to: Age+-5, randomization centers, WHI date +- 3 years, CaD date +- 3 years, OS flag, HRT flag, DM flag, CaD flag. 17 of the remaining 19 unmatched cases found matched controls. By matching on Age+-5, randomization centers, WHI date +- 3 years, CaD date +- 3 years, OS flag, HRT flag, the remaining 2 unmatched cases found their matches.
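The tiered matching described above for the WHI case-control set (calipers on continuous factors, exact agreement on categorical factors, equal weights in the distance measure, and progressive relaxation for cases left unmatched) can be illustrated with a minimal Python sketch. This is not the Bergstralh and Kosanke program and not the study's actual code: the field names (age, enroll_year, center, arm), the two tiers, and the greedy nearest-control rule are hypothetical simplifications, and the time-forward risk-set sampling is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subject:
    sid: str
    age: float
    enroll_year: float
    center: str
    arm: str  # hypothetical stand-in for the trial-assignment factors


# Calipers for continuous factors (age +-5 years, date +-3 years).
CALIPERS = {"age": 5.0, "enroll_year": 3.0}

# Exact-match fields per tier, strictest first (illustrative tiers only).
TIERS = [("center", "arm"), ("center",)]


def eligible(case, control, exact_fields):
    """True if the control agrees on all exact fields and lies within every caliper."""
    if any(getattr(case, f) != getattr(control, f) for f in exact_fields):
        return False
    return all(abs(getattr(case, f) - getattr(control, f)) <= w
               for f, w in CALIPERS.items())


def distance(case, control):
    """Equal-weight distance: each difference is scaled by its caliper."""
    return sum(abs(getattr(case, f) - getattr(control, f)) / w
               for f, w in CALIPERS.items())


def match_cases(cases, controls):
    """Greedy 1:1 matching without replacement, relaxing criteria tier by tier."""
    unused = {c.sid: c for c in controls}
    matches = {}
    unmatched = list(cases)
    for tier, exact_fields in enumerate(TIERS):
        still_unmatched = []
        for case in unmatched:
            pool = [c for c in unused.values() if eligible(case, c, exact_fields)]
            if not pool:
                still_unmatched.append(case)
                continue
            best = min(pool, key=lambda c: (distance(case, c), c.sid))
            matches[case.sid] = (best.sid, tier)
            del unused[best.sid]
        unmatched = still_unmatched
    return matches, [c.sid for c in unmatched]


# Toy data: case B has no control in the same arm within the age caliper,
# so it is matched only after the arm requirement is dropped (tier 1).
cases = [Subject("A", 60, 1995, "X", "HT"), Subject("B", 70, 1996, "Y", "DM")]
controls = [
    Subject("c1", 62, 1995, "X", "HT"),
    Subject("c2", 61, 1996, "X", "DM"),
    Subject("c3", 72, 1997, "Y", "HT"),
    Subject("c4", 90, 1996, "Y", "DM"),
]
matches, unmatched = match_cases(cases, controls)
print(matches, unmatched)
```

A greedy pass is order-dependent, so a different case ordering can yield different pairs; an optimal assignment would avoid that at the cost of more computation.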
The acquisition of somatic mutations is an emerging field of investigation in familial leukemia. Currently, genetic profiles in familial MDS/AML are considered analogous to sporadic disease, although the patterns of clonal evolution within families are poorly defined. We performed whole exome profiling of tumour samples from a novel RUNX1 mutated family, to determine the stepwise evolution of MDS/AML across 4 young siblings. Three siblings developed monocytic AML/RAEB2 at 5 years of age, with hepatosplenomegaly and somatic mutations upregulating JAK-STAT signalling, the latter are typically detected in <5% of sporadic MDS/AML. Two siblings acquired the canonical JAK2 V617F mutation, while another acquired a unique missense mutation of SH2B3, a negative regulator of JAK2. Notably, 2/3 siblings demonstrated dosage amplification of these mutations due to acquired uniparental disomy of chromosomes 9p and 12q (encompassing JAK2 and SH2B3, respectively). All 4 siblings were heterozygous for the 46/1 JAK2 haplotype associated with predisposition to sporadic V617F myeloproliferative disorders, which likely influenced the acquisition of JAK2 mutations. Our findings provide further evidence that relatives with shared germline mutations may acquire somatic mutations in a non-random manner leading to convergent patterns of disease evolution.
Odors are first detected by olfactory sensory neurons (OSNs) in the olfactory epithelium (OE) of the nose. These neurons then project directly to the olfactory bulb in the brain. Olfaction depends on cellular regeneration of the OE, olfactory bulb and hippocampus, and on their continual re-wiring. The olfactory neural pathway includes regions of the frontal, temporal and limbic brain, which in turn overlap with brain areas involved in brain disorders. OSNs are the only aspect of the human brain exposed to the external environment. This not only makes them vulnerable to environmental changes, but also accessible for biomedical studies. We have already sequenced and developed a protocol for analyzing the transcriptome of mouse main olfactory epithelium and single OSNs. We propose here to perform a similar study for samples from the human olfactory epithelium. We have developed a minimally invasive method for obtaining human OSNs, among other cells from the nasal epithelium. In this experiment, we have obtained cell samples from the olfactory epithelium, including OSNs, from healthy volunteers. We would like to further characterize them by RNA sequencing. This will give us valuable insight into human olfaction. It will also provide a first step into a new avenue to study, and find biomarkers for, brain diseases through the analysis of these easily available neurons. This data is part of a pre-publication release. For information on the proper use of pre-publication data shared by the Wellcome Trust Sanger Institute (including details of any publication moratoria), please see http://www.sanger.ac.uk/datasharing/