Background

Valley Fever is typically an infection of the lungs caused by the fungi Coccidioides immitis and Coccidioides posadasii. The incidence of coccidioidomycosis (CM), or infection with Coccidioides, has increased dramatically over the last 20 years. This is particularly true in the Southwest of the United States, where people often breathe in fungal spores that arise from the soil. Reasons for the increased infection rates are thought to include population growth and construction in these endemic regions; an increase in the number of people whose immune systems are compromised by infection or by the drugs required after organ transplants; climate change; and improved testing practices and greater physician awareness. Mild CM most commonly presents with flu-like symptoms and rashes, which can last weeks to months. Individuals with compromised immune systems, specifically those with substantial suppression of the immune cells known as T cells, can develop severe pulmonary and disseminated disease. Infection that remains localized to the lungs is referred to as pulmonary disease. When the infection spreads out of the lungs into other parts of the body, it represents a more serious condition referred to as disseminated disease, or disseminated CM.

In nature, Coccidioides spp. exist as a mold and live in dust and soil. When contaminated soil or dust is disturbed by human activity, animals, or weather, the Coccidioides spores are released into the air. Airborne spores are inhaled and settle in the lungs. Once in the moist, warm environment of the lung, spores transform into spherules, which divide and become filled with smaller spores, called endospores. When the spherules grow large enough, they rupture and release these endospores, which can spread to surrounding tissue. The cycle then repeats as these endospores develop into new spherules [3].
Ethnic groups have been reported to differ in their susceptibility to developing disseminated CM after initial infection with Coccidioides. For example, evidence suggests that African-American and Filipino patients develop disseminated disease at a greater rate than other ethnicities. Whether race plays a role in the clinical expression of the disease is still debated in the scientific community, and any genetic mechanisms responsible for these differences have yet to be fully elucidated. If our genetic makeup influences our ability to limit the spread of infection, finding which DNA differences cause this variation could provide clues to how the body successfully fights infection, and provide opportunities to boost the body’s ability to do so. Further, if we are able to identify the specific genetic risk factors that correlate with the development of disseminated infection, physicians could perform genetic screening to identify high-risk patients and provide them with preemptive antifungal therapy before disseminated disease develops.

The genome, made up of DNA, contains all of the information needed for humans to develop and grow. Genome-wide association studies (GWAS) allow us to look for inherited differences that are more common among people who share a particular trait, for example height or susceptibility to certain diseases, than among those who do not share the trait. Although some traits and diseases are controlled by a single gene, the majority are influenced by contributions from several, or even many, different genes.
To find evidence of genes that contribute to specific traits, a GWAS typically compares genome information from large numbers of people who have a particular disease (referred to as “cases”) with genome information from large numbers of people who lack the trait but are otherwise as much like the cases as possible (referred to as “controls”). The DNA sequence data from each group, cases versus controls, are analyzed to see if there are specific genomic differences that tend to be associated with the disease.

Methods

Two separate GWAS approaches were taken to look for genetic differences that could be responsible for the observed differences between the patient populations we are studying. The first method, known as genotyping, scans for differences at a set of positions across the genome, which includes both the genes that encode our proteins and the larger amount of DNA that does not. The second method, known as exome sequencing, allows us to compare the entire sequence of the portion of the genome that codes for proteins. For this study, DNA from patients with either pulmonary or disseminated CM was genotyped and exome sequenced to look for DNA differences that are associated with one condition or the other. All patients were at least 18 years old, had no evidence of immunosuppression, and had proven or probable pulmonary coccidioidomycosis according to established diagnostic criteria. Of these patients, a subset demonstrated disseminated disease; that is, they showed evidence of coccidioidal infection outside of the thorax by biopsy/aspiration, had radiographic imaging, and showed positive coccidioidal serology.
Our criteria for including patients with pulmonary disease were that they must not require ongoing antifungal treatment or show evidence of active CM (in skin test positive patients), must show no evidence of extrapulmonary dissemination, and must have no evidence of ongoing pulmonary infection (pulmonary nodules are accepted) beyond six months from diagnosis.

Patient DNA was purified from blood or from sputum samples by the labs of our collaborators, Drs. George Thompson (UC Davis School of Medicine) and John Galgiani (University of Arizona Health Sciences). Genome-wide association analysis was carried out to look for candidate loci associated with pulmonary versus disseminated disease, taking into account the population structure of the samples. Single nucleotide or insertion/deletion variants were identified from whole-exome sequences (WES) using the Picard/BWA/GATK pipeline.

Results

Table 1. Pulmonary versus Disseminated Cases of Coccidioidomycosis for GWAS, Sorted by Ethnicity

Ethnicity                 Pulmonary Cases   Disseminated Cases
Asian                     8                 5
Black/African American    16                64
Caucasian/White           40                15
Filipino                  0                 3
Hispanic/Latino           34                14
Indian                    2                 1
Mexican American          103               9
Pacific Islander          0                 1
Samoan                    0                 3
Vietnamese                1                 0
Unknown                   169               17
More than one race        0                 2
Total                     373               134

Table 1 shows the number of samples analyzed from patients with pulmonary versus disseminated disease, and patient ethnicity, where known. In all, we worked with 507 samples, including 134 samples from patients with disseminated disease and 373 samples from patients with pulmonary disease. Of these, 505 samples were genotyped using the Multi-Ethnic Global Array from Illumina Inc. In addition, we were able to generate whole-exome sequence from 498 patient samples. No significant associations were detected that differed between samples from patients with pulmonary versus disseminated disease; that is, no particular DNA sequences were found to be significantly enriched in patients with disseminated disease compared to patients with pulmonary disease.
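As an illustrative check on Table 1 (not part of the original analysis), the short Python sketch below tallies the counts and confirms that they add up to the stated totals of 373 pulmonary and 134 disseminated samples. The per-group counts are read from Table 1, with each run of digits split so that both columns sum to those totals. The function names and the `min_samples` cutoff of 20 are assumptions made for this example. The resulting fractions only describe the samples collected for this study and should not be read as population rates of dissemination.

```python
# Counts read from Table 1: (pulmonary cases, disseminated cases).
TABLE_1 = {
    "Asian": (8, 5),
    "Black/African American": (16, 64),
    "Caucasian/White": (40, 15),
    "Filipino": (0, 3),
    "Hispanic/Latino": (34, 14),
    "Indian": (2, 1),
    "Mexican American": (103, 9),
    "Pacific Islander": (0, 1),
    "Samoan": (0, 3),
    "Vietnamese": (1, 0),
    "Unknown": (169, 17),
    "More than one race": (0, 2),
}


def column_totals(table):
    """Return (total pulmonary, total disseminated) across all groups."""
    pulmonary = sum(p for p, _ in table.values())
    disseminated = sum(d for _, d in table.values())
    return pulmonary, disseminated


def disseminated_fraction(table, min_samples=20):
    """Fraction of samples with disseminated disease per group.

    Groups with fewer than `min_samples` samples are skipped because a
    fraction based on a handful of samples is not informative.
    The cutoff is an arbitrary choice for this illustration.
    """
    fractions = {}
    for group, (pulmonary, disseminated) in table.items():
        total = pulmonary + disseminated
        if total >= min_samples:
            fractions[group] = disseminated / total
    return fractions


if __name__ == "__main__":
    pulmonary, disseminated = column_totals(TABLE_1)
    print(f"pulmonary={pulmonary} disseminated={disseminated} all={pulmonary + disseminated}")
    for group, fraction in sorted(disseminated_fraction(TABLE_1).items()):
        print(f"{group}: {fraction:.0%} of samples disseminated")
```

Because patients were recruited into the pulmonary and disseminated groups separately, differences between these fractions mainly reflect how the sample set was assembled.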
The ability to detect genetic association between specific sequences and genetically determined traits is influenced by several factors, including how many patient samples are available to compare, how many different genes contribute to the trait, and how strong their contributions are. When the number of genes is small and the contribution of each gene is great, smaller numbers of patient samples are needed to detect an association. When more genes are involved, or the contribution from each gene is more modest, larger numbers of patient samples must be examined. While we were not able to detect any associations within this study, this does not mean that subsequent studies would not find such a connection. Our study suggests that significantly more samples should be analyzed in further studies.

Whole-exome sequences were generated from 498 samples and were aligned to reference sequences to identify positions where the sequences differed from the reference. These data are being analyzed to determine if any variants are associated with pulmonary, versus disseminated, disease.
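The relationship between sample size, strength of a gene's contribution, and the ability to detect an association can be made concrete with a rough calculation. The Python sketch below is not the analysis used in this study: it uses a textbook normal approximation for comparing the frequency of one DNA variant between two groups. The group sizes (134 disseminated, 373 pulmonary) come from Table 1, while the variant frequency of 0.3, the odds ratio of 1.5, and the significance threshold of 5e-8 (a common genome-wide convention) are assumptions chosen only for illustration.

```python
from statistics import NormalDist


def case_allele_frequency(control_freq, odds_ratio):
    """Allele frequency in cases implied by a per-allele odds ratio."""
    odds = odds_ratio * control_freq / (1.0 - control_freq)
    return odds / (1.0 + odds)


def approx_power(n_cases, n_controls, control_freq, odds_ratio, alpha=5e-8):
    """Approximate power of a two-sided allele-frequency comparison.

    Uses a normal approximation to the difference between two proportions.
    `alpha` defaults to 5e-8, an assumed genome-wide threshold.
    """
    p0 = control_freq
    p1 = case_allele_frequency(p0, odds_ratio)
    # Each person carries two copies of the genome, so 2 * n alleles per group.
    standard_error = (
        p1 * (1 - p1) / (2 * n_cases) + p0 * (1 - p0) / (2 * n_controls)
    ) ** 0.5
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    # Chance that the observed difference clears the threshold
    # in the direction of the true effect.
    return NormalDist().cdf(abs(p1 - p0) / standard_error - z_critical)


if __name__ == "__main__":
    # Hypothetical variant: frequency 0.3 in pulmonary cases, odds ratio 1.5.
    for scale in (1, 2, 5, 10):
        power = approx_power(134 * scale, 373 * scale, 0.3, 1.5)
        print(f"{134 * scale} disseminated / {373 * scale} pulmonary: power ~ {power:.3f}")
```

Running the loop with larger multiples of the study's sample size shows how the estimated power rises as more samples are added, which is the reasoning behind recommending larger follow-up studies.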
Projects

Jointly managed by the European Bioinformatics Institute (EMBL-EBI) in Cambridge (UK) and the Centre for Genomic Regulation (CRG) in Barcelona, the EGA provides an invaluable service to the worldwide biomedical research community. The teams leading the EGA are involved in several international partnerships and consortia in numerous scientific fields, where they contribute to ambitious projects. In addition to the projects listed below, the EGA is in a long-standing partnership with the Global Alliance for Genomics and Health (GA4GH), as described on the dedicated page.

On-going projects

EASIGEN-DS | The EASIGEN-DS project aims to conduct a design study to establish a new European Research Infrastructure on Advanced Genomics Technologies, EASIGEN. To develop an excellent scientific, technological and operational design, we will conduct landscape studies, stakeholder consultations, and community surveying.
Duration: 2025-2028
Domain: Genomic and health data
Funder: Horizon Europe
Tags: DATA MANAGEMENT, DOCUMENTATION, INFRASTRUCTURE

Go-IMPaCT | Go-IMPaCT will contribute sequenced genomes and provide infrastructure as part of IMPaCT-Cohort, one of the three fundamental pillars of the Precision Medicine Infrastructure associated with Science and Technology (IMPaCT) program in Spain. Along with the Genome of Europe (GoE) project, around 18,000 people will have their genomes sequenced, also contributing to Spain's commitments in 1+MG. Go-IMPaCT will fund the development of an EGA node to manage and share this genomic and phenoclinic data, laying the foundations for regional and ethnic genomic variability in Spain to be available for research purposes. The IMPaCT cohort is created with the spirit of being an open research tool, compatible with the rest of the health research ecosystem and with other international initiatives.
Duration: 2025-2027
Domain: Large-scale genomics and health data; personalised medicine
Funder: Instituto de Salud Carlos III
Tags: ACCESS, DISCOVERY, INFRASTRUCTURE, METADATA STANDARDS

FAIR-FEGA | This project seeks to accelerate data depositions into FEGA, significantly increasing the data flow in and from FEGA nodes. It will build capacity within the FEGA nodes and increase awareness in a wide range of stakeholders, thus altogether achieving the ultimate goal of enhancing data reuse. The project will be carried out by a strategic consortium comprising seven ELIXIR nodes and two ELIXIR communities.
Duration: 2025-2026
Domain: Not applicable
Funder: ELIXIR
Tags: ACCESS, DISCOVERY, DOCUMENTATION, INFRASTRUCTURE, METADATA STANDARDS

FEGA-Connect | A consortium of six ELIXIR nodes plus the Polish FEGA node (in-kind contribution) is joining forces to build a solid base for developing solutions for effective multi-omic sensitive data integration between FEGA nodes and other infrastructures and specialised data repositories. We aim to promote more coherent data deposition, discoverability and retrieval of multi-omics datasets, providing FAIRer data and consequently accelerating research.
Duration: 2025-2026
Domain: Multi-omics data
Funder: ELIXIR
Tags: ACCESS, DATA MANAGEMENT, DISCOVERY, INFRASTRUCTURE, METADATA STANDARDS

IMPaCT-Data 2 | IMPaCT-Data 2 will develop a digital platform for the integration and modelling of biomedical data associated with IMPaCT (Precision Medicine Infrastructure associated with Science and Technology) projects in Spain. It will deploy a sustainable infrastructure that facilitates the integration, standardisation, interoperability and analysis of clinical, genomic, molecular and medical imaging data. This platform will be aligned with European projects such as Genome of Europe (GoE), the first project to make use of the European Genomic Data Infrastructure (GDI), and EUCAIM.
IMPaCT-Data 2 will benefit from advanced Artificial Intelligence and High Computing Capacity Systems capabilities, offering robust and accessible tools for researchers from the National Health System in Spain.
Duration: 2025-2026
Domain: Large-scale genomics and health data; personalised medicine
Funder: Instituto de Salud Carlos III
Tags: ACCESS, DISCOVERY, INFRASTRUCTURE, METADATA STANDARDS

ERDERA | The European Rare Disease Research Alliance (ERDERA) takes over EJPRD to deliver concrete health benefits to rare disease patients in the next decade by advancing prevention, diagnosis and treatment research. To leave no one behind, over 170 organisations championed by the European Union and member states are working hand in hand to make Europe a world leader in rare diseases research and innovation.
Duration: 2024-2034
Domain: Rare diseases
Funder: Horizon Europe; "La Caixa" Foundation cofunds CRG's contribution
Tags: ACCESS, DATA ANALYSIS, DISCOVERY, INFRASTRUCTURE

GoE | The Genome of Europe initiative aims to build a European network of national genomic reference cohorts of at least 500,000 citizens. These reference cohorts will be selected to be representative of the European population.
Duration: 2024-2028
Domain: Large-scale genomic and health data
Funder: Horizon Europe
Tags: ACCESS, DISCOVERY, INFRASTRUCTURE, METADATA STANDARDS

HEREDITARY | HEREDITARY aims to transform the way we approach disease detection, prepare treatment response, and explore medical knowledge by building a robust, interoperable, trustworthy, and secure framework that integrates multimodal health data (including genetic data) while ensuring compliance with cross-national privacy-preserving policies.
Duration: 2024-2027
Domain: Neurodegenerative disorders, gut-brain interplay
Funder: Horizon Europe
Tags: DATA MANAGEMENT, DATA ANALYSIS

EOSC-ENTRUST | The mission of EOSC-ENTRUST is to create a European network of trusted research environments for sensitive data and to drive European interoperability by joint development of a common blueprint for federated data access and analysis.
Duration: 2024-2026
Domain: Trusted Research Environment
Funder: Horizon Europe
Tags: INFRASTRUCTURE

EBV-MS | "Targeting Epstein-Barr Virus Infection for Treatment and Prevention of Multiple Sclerosis". The ambitious goals of the project are to answer the question of why only a few EBV-infected persons develop MS, to define the underlying mechanism of this process, and to clarify whether targeting the EBV infection can prevent MS or improve the disease course.
Duration: 2023-2028
Domain: Viral-host genetics; immune response; disease modelling; Disease prevention
Funder: Horizon Europe
Tags: DATA MANAGEMENT, DATA ANALYSIS

WISDOM | WISDOM (Well-Being Improvement through the Integration of Healthcare and Research Data and Models without Border for Chronic Immune-Mediated Diseases) aims to deploy novel approaches for data processing, harmonisation, management, and secure data sharing and federated access for diseases like multiple sclerosis. Using an end-user guided approach, it will facilitate responsible and critical assessment of the use of AI in healthcare.
Duration: 2023-2028
Domain: Chronic immune-mediated diseases
Funder: Horizon Europe
Tags: DATA MANAGEMENT, INFRASTRUCTURE

EUCAIM | EUropean Federation for CAncer IMages is a project that will build a highly secure, federated and large-scale European cancer imaging platform, with capabilities that will greatly enhance the potential of Artificial Intelligence in oncology.
Duration: 2023-2027
Domain: Cancer
Funder: Digital Europe Programme (DIGITAL)
Tags: DISCOVERY

CONTAGIO | CONTAGIO (COhorts Network To be Activated Globally In Outbreaks) aims to create coordination mechanisms to rapidly react to infectious disease (re-)emergence in low- and middle-income countries (LMICs).
Duration: 2023-2026
Domain: Infectious Diseases
Funder: European Commission - Horizon Europe
Tags: ACCESS, DATA MANAGEMENT, DISCOVERY

Youth-GEMs | Youth-GEMS (Gene Environment Interactions in Mental Health TrajectorieS of Youth) will conduct research into the genetic and environmental factors of mental health in young European people.
Duration: 2022-2027
Domain: Mental health
Funder: European Commission - Horizon Europe
Tags: DATA MANAGEMENT, DISCOVERY

GDI | The European Genomics Data Infrastructure project is enabling access to genomic and related phenotypic and clinical data across Europe. It is doing this by establishing a federated, sustainable and secure infrastructure to access the data.
Duration: 2022-2026
Domain: Genomic and health data
Funder: European Commission - Horizon Europe; "La Caixa" Foundation cofunds CRG's contribution
Tags: DISCOVERY, DOCUMENTATION, INFRASTRUCTURE

EOSC4Cancer | EOSC4Cancer builds on existing projects, research outcomes and established community solutions to create the federated FAIR data, analysis and services infrastructure needed for European Cancer research programmes.
Duration: 2022-2025
Domain: Cancer
Funder: European Commission - Horizon Europe
Tags: DISCOVERY

IMPaCT-T2D | The IMPaCT-T2D project aims at studying the complete genomes of a large cohort of patients with Type 2 Diabetes mellitus (T2D), using modern sequencing technologies and artificial intelligence (AI) in order to improve the stratification and pharmacological treatment in the context of precision medicine.
Duration: 2022-2025
Domain: Cardiovascular and Complex Diseases
Funder: Spanish Ministry of Science and Innovation; Instituto de Salud Carlos III
Tags: ACCESS, DATA MANAGEMENT, DISCOVERY, INFRASTRUCTURE

EuCanImage | A European Cancer Image Platform Linked to Biological and Health Data for Next-Generation Artificial Intelligence and Precision Medicine in Oncology.
Duration: 2020-2025
Domain: AI Solutions in Oncology
Funder: European Commission - H2020 Programme; "La Caixa" Foundation cofunds CRG's contribution
Tags: DATA MANAGEMENT, METADATA STANDARDS

GenoMed4ALL | A consortium built to empower personalised medicine in the field of haematological diseases through the use of AI and the pooling of genomic and clinical data.
Duration: 2020-2025
Domain: Hematological diseases
Funder: European Commission - H2020 Programme
Tags: DISCOVERY, METADATA STANDARDS

Completed projects

BY-COVID | The BeYond-COVID project aims to make COVID-19 data accessible to scientists in laboratories but also to anyone who can use it, such as medical staff in hospitals or government officials. Going beyond SARS-CoV-2 data, the project will provide a framework for making data from other infectious diseases open and accessible to everyone.
Duration: 2021-2024
Domain: Infectious diseases
Funder: European Commission - H2020 Programme
Tags: ACCESS, DATA MANAGEMENT, DISCOVERY, INFRASTRUCTURE

IMPaCT-Data | IMPaCT-Data aims to create the infrastructure for secondary use of data from Spanish healthcare systems (electronic health records, medical imaging and genomic repositories) and to contribute the knowledge and methodology produced to the healthcare system.
Duration: 2021-2024
Domain: Large-scale genomics and health data
Funder: Spanish Ministry of Science and Innovation; Instituto de Salud Carlos III
Tags: ACCESS, DATA MANAGEMENT, DISCOVERY, INFRASTRUCTURE

LaMarató | A project aimed at creating and developing a Catalan inter-hospital network to interrogate genetic variants from thousands of genetic tests carried out in patients with rare diseases from the main Catalan hospitals.
Duration: 2021-2024
Domain: Genomic and health data
Funder: Fundació La Marató de TV3 (Catalan foundation)
Tags: DISCOVERY

HealthyCloud | This consortium will contribute a Strategic Agenda towards the European Health Research and Innovation Cloud. The project will work in collaboration with a broad range of stakeholders to ensure that all voices are included and that the results are technically and ethically sound.
Duration: 2021-2023
Domain: Not applicable
Funder: European Commission - H2020 Programme
Tags: DOCUMENTATION

B1MG | Beyond 1 Million Genomes aims to create a network of genetic and clinical data across Europe. The project provides coordination and support to the 1+ Million Genomes Initiative (1+MG).
This initiative is a commitment of 24 EU countries, the UK and Norway to give cross-border access to one million sequenced genomes by 2022.
Duration: 2020-2023
Domain: Not applicable
Funder: European Commission - Horizon Europe
Tags: DATA MANAGEMENT, INFRASTRUCTURE, METADATA STANDARDS

ELIXIR-CONVERGE | An alliance with the goal of connecting and aligning ELIXIR Nodes to deliver sustainable FAIR life-science data management services.
Duration: 2020-2023
Domain: Data Management and Infectious Diseases
Funder: European Commission - H2020 Programme
Tags: DATA MANAGEMENT, INFRASTRUCTURE, METADATA STANDARDS

IHCC | The International HundredK+ Cohorts Consortium aims to create a global platform for translational research, informing the biological and genetic basis for disease and improving clinical care and population health.
Duration: 2020-2022
Domain: Translational research
Funder: NIH; The Wellcome Trust; CZI
Tags: INFRASTRUCTURE, METADATA STANDARDS

PPCG | The Pan Prostate Cancer Group aims to harmonise and interrogate Whole Genome DNA Sequence data generated around the world from over 2000 men with prostate cancer, with associated transcriptome and methylome data, to include men from different clinical categories and ethnicities. This project is about providing breakthrough advances through analysis of a very large series of Whole Genome DNA data from prostate cancer contributed by many of the leading scientists and clinicians working in prostate cancer genomics.
Duration: 2019-2024
Domain: Cancer
Funder: Cancer Research UK
Tags: DATA MANAGEMENT

CINECA | Consortium providing a federated solution that makes population-scale genomic and biomolecular data accessible across international borders, accelerating research and improving the health of individuals resident across continents.
Duration: 2019-2023
Domain: Large-scale Genomics and Health Data
Funder: European Commission - H2020 Programme
Tags: ACCESS, DATA MANAGEMENT, DISCOVERY, INFRASTRUCTURE

EASI-Genomics | A project designed to provide easy access to cutting-edge DNA sequencing technologies to researchers from academia and industry, within a framework that ensures compliance with ethical and legal requirements, as well as FAIR and secure data management.
Duration: 2019-2023
Domain: Next Generation Sequencing
Funder: European Commission - H2020 Programme
Tags: ACCESS

EJP-RD | A European consortium built to create a comprehensive, sustainable ecosystem allowing a virtuous circle between research, care, and medical innovation.
Duration: 2019-2023
Domain: Rare diseases
Funder: European Commission - H2020 Programme
Tags: ACCESS, DATA MANAGEMENT, DOCUMENTATION, METADATA STANDARDS

EOSC-Life | EOSC-Life brings together the 13 Life Science research infrastructures (LS RIs) to create an open, digital and collaborative space for biological and medical research. The project will publish 'FAIR' data and a catalogue of services provided by participating RIs for the management, storage and reuse of data in the European Open Science Cloud (EOSC).
Duration: 2019-2023
Domain: Not applicable
Funder: European Commission - H2020 Programme
Tags: DOCUMENTATION

EUCANCan | A federated network aiming to implement an integrated cultural, technological and legal framework across Europe and Canada, to enable and facilitate the efficient sharing of cancer genomic data.
Duration: 2019-2023
Domain: Cancer
Funder: European Commission - H2020 Programme
Tags: DATA MANAGEMENT, METADATA STANDARDS

The Federated EGA framework: supporting sensitive data management across the ELIXIR Nodes | This project is a direct continuation of the FHD IS with the goal to position the FEGA framework as the core infrastructure driver to support human data sharing for research.
Duration: 2019-2023
Domain: Human genomic data
Funder: ELIXIR
Tags: INFRASTRUCTURE

UK Biobank | UK Biobank is a large-scale biomedical database and research resource, containing in-depth genetic and health information from half a million UK participants.
This project is to archive whole genome sequencing and other genetic data for UK Biobank participants.
Duration: 2019-2023
Domain: Large-scale Genomics and Health Data
Funder: The Wellcome Trust; UKRI; Amgen; AstraZeneca; GSK; Johnson & Johnson
Tags: DATA MANAGEMENT, INFRASTRUCTURE

VEIS | The core mission of VEIS is to create an open ecosystem of technologies that will address and adapt to the requirements of the systems used to analyse and interpret -omics and clinical data in research and application environments in biomedicine. The aim of the project is to leverage the value of the EGA for both industry and society.
Duration: 2019-2022
Domain: Oncology and Rare diseases
Funder: Generalitat de Catalunya and European Regional Development Fund (ERDF)
Tags: ACCESS, DISCOVERY

ELIXIR BEACON IS | This study follows on from a number of earlier activities that have established the ELIXIR Beacon Project. The main aim is to extend the Beacon protocol, developed at EGA, to become the reference ELIXIR Data Discovery product.
Duration: 2019-2021
Domain: Not applicable
Funder: ELIXIR
Tags: DISCOVERY

ELIXIR FHD IS | This project coordinates the delivery of FAIR compliant metadata standards, interfaces, and reference implementation to support the federated ELIXIR network of human data resources.
Duration: 2019-2021
Domain: Human genomic data
Funder: ELIXIR
Tags: INFRASTRUCTURE

ELIXIR Rare Disease | The Rare Disease Community extends and generalises the system of access authorisation and high volume secure data transfer developed within the EGA. The goal of the Community is to create a federated infrastructure that will enable researchers to discover, access and analyse different rare disease repositories across Europe. It is doing this in partnership with other European infrastructure projects, namely RD-CONNECT, BBMRI-ERIC and E-Rare.
Duration: 2019-2021
Domain: Rare diseases
Funder: ELIXIR
Tags: INFRASTRUCTURE

Solve-RD | Solve-RD - solving the unsolved rare diseases - is a research project funded by the European Commission.
It echoes the ambitious goals set out by the International Rare Diseases Research Consortium (IRDiRC) to deliver diagnostic tests for most rare diseases by 2020. The current diagnostic and subsequent therapeutic management of rare diseases is still highly unsatisfactory for a large proportion of rare disease patients - the unsolved RD cases. For these unsolved rare diseases, we are unable to explain the etiology responsible for the disease phenotype, predict the individual disease risk and/or rate of disease progression, and/or quantitate the risk of relatives to develop the same disorder.
Duration: 2018-2024
Domain: Rare diseases
Funder: European Commission - H2020 Programme
Tags: ACCESS, DATA MANAGEMENT, METADATA STANDARDS

EuCanShare | An EU-Canada joint infrastructure for next-generation multi-Study Heart research.
Duration: 2018-2022
Domain: Cardiovascular Diseases
Funder: European Commission - H2020 Programme
Tags: ACCESS, METADATA STANDARDS
Original description of the study: From ELLIPSE (linked to the PRACTICAL consortium), we contributed ~78,000 SNPs to the OncoArray. A large fraction of the content was derived from the GWAS meta-analyses in European ancestry populations (overall and aggressive disease; ~27K SNPs). We also selected just over 10,000 SNPs from the meta-analyses in the non-European populations, with a majority of these SNPs coming from the analysis of overall prostate cancer in African ancestry populations as well as from the multiethnic meta-analysis. A substantial fraction of SNPs (~28,000) were also selected for fine-mapping of 53 loci not included in the common fine-mapping regions (tagging at r² > 0.9 across ±500 kb regions). We also selected a few thousand SNPs related to PSA levels and/or disease survival, SNPs from candidate lists provided by study collaborators, and SNPs from meta-analyses of exome SNP chip data from the Multiethnic Cohort and UK studies.

The Contributing Studies:

Aarhus: Hospital-based, Retrospective, Observational.
Source of cases: Patients treated for prostate adenocarcinoma at Department of Urology, Aarhus University Hospital, Skejby (Aarhus, Denmark).
Source of controls: Age-matched males treated for myocardial infarction or undergoing coronary angioplasty, but with no prostate cancer diagnosis based on information retrieved from the Danish Cancer Register and the Danish Cause of Death Register.

AHS: Nested case-control study within prospective cohort.
Source of cases: Linkage to cancer registries in study states.
Source of controls: Matched controls from cohort.

ATBC: Prospective, nested case-control.
Source of cases: Finnish male smokers aged 50-69 years at baseline.
Source of controls: Finnish male smokers aged 50-69 years at baseline.

BioVU: Cases identified in a biobank linked to electronic health records.
Source of cases: A total of 214 cases were identified in the VUMC de-identified electronic health records database (the Synthetic Derivative) and shipped to USC for genotyping in April 2014. The following criteria were used to identify cases: age 18 or greater; male; African Americans (Black) only. Note that African ancestry is not self-identified; it is administratively or third-party assigned (which has been shown to be highly correlated with genetic ancestry for African Americans in BioVU; see references).
Source of controls: Controls were identified in the de-identified electronic health record. Unfortunately, they were not age matched to the cases, and therefore cannot be used for this study.

Canary PASS: Prospective, Multi-site, Observational Active Surveillance Study.
Source of cases: Clinic based, from Beth Israel Deaconess Medical Center, Eastern Virginia Medical School, University of California at San Francisco, University of Texas Health Sciences Center San Antonio, University of Washington, VA Puget Sound.
Source of controls: N/A

CCI: Case series, Hospital-based.
Source of cases: Cases identified through clinics at the Cross Cancer Institute.
Source of controls: N/A

CerePP French Prostate Cancer Case-Control Study (ProGene): Case-Control, Prospective, Observational, Hospital-based.
Source of cases: Patients, treated in French departments of Urology, who had histologically confirmed prostate cancer.
Source of controls: Controls were recruited from participants in a systematic health screening program who were found unaffected (normal digital rectal examination and total PSA < 4 ng/ml, or negative biopsy if PSA > 4 ng/ml).

COH: Hospital-based cases and controls from outside.
Source of cases: Consented prostate cancer cases at City of Hope.
Source of controls: Consented unaffected males who were part of other studies in which they consented to have their DNA used for other research studies.

COSM: Population-based cohort.
Source of cases: General population.
Source of controls: General population.

CPCS1: Case-control - Denmark.
Source of cases: Hospital referrals.
Source of controls: Copenhagen General Population Study.

CPCS2:
Source of cases: Hospital referrals.
Source of controls: Copenhagen General Population Study.

CPDR: Retrospective cohort.
Source of cases: Walter Reed National Military Medical Center.
Source of controls: Walter Reed National Military Medical Center.

ACS_CPS-II: Nested case-control derived from a prospective cohort study.
Source of cases: Identified through self-report on follow-up questionnaires and verified through medical records or cancer registries, or identified through cancer registries or the National Death Index (with prostate cancer as the primary cause of death).
Source of controls: Cohort participants who were cancer-free at the time of diagnosis of the matched case, also matched on age (±6 mo) and date of biospecimen donation (±6 mo).

EPIC: Case-control - Germany, Greece, Italy, Netherlands, Spain, Sweden, UK.
Source of cases: Identified through record linkage with population-based cancer registries in Italy, the Netherlands, Spain, Sweden and UK. In Germany and Greece, follow-up is active and achieved through checks of insurance records and cancer and pathology registries as well as via self-reported questionnaires; self-reported incident cancers are verified through medical records.
Source of controls: Cohort participants without a diagnosis of cancer.

EPICAP: Case-control, Population-based, ages less than 75 years at diagnosis, Hérault, France.
Source of cases: Prostate cancer cases in all public hospitals and private urology clinics of the département of Hérault in France. Case validation by the Hérault Cancer Registry.
Source of controls: Population-based controls, frequency age matched (5-year groups). Quotas by socio-economic status (SES) were used in order to obtain a distribution by SES among controls identical to the SES distribution among general population men, conditional on age.

ERSPC: Population-based randomized trial.
Source of cases: Men with PrCa from screening arm ERSPC Rotterdam.
Source of controls: Men without PrCa from screening arm ERSPC Rotterdam.

ESTHER: Case-control, Prospective, Observational, Population-based.
Source of cases: Prostate cancer cases in all hospitals in the state of Saarland, from 2001-2003.
Source of controls: Random sample of participants from routine health check-up in Saarland, in 2000-2002.

FHCRC: Population-based, case-control, ages 35-74 years at diagnosis, King County, WA, USA.
Source of cases: Identified through the Seattle-Puget Sound SEER cancer registry.
Source of controls: Randomly selected, age-frequency matched residents from the same county as cases.

Gene-PARE: Hospital-based.
Source of cases: Patients that received radiotherapy for treatment of prostate cancer.
Source of controls: N/A

Hamburg-Zagreb: Hospital-based, Prospective.
Source of cases: Prostate cancer cases seen at the Department of Oncology, University Hospital Center Zagreb, Croatia.
Source of controls: Population-based (Croatia), healthy men, older than 50, with no medical record of cancer, and no family history of cancer (1st & 2nd degree relatives).

HPFS: Nested case-control.
Source of cases: Participants of the HPFS cohort.
Source of controls: Participants of the HPFS cohort.

IMPACT: Observational.
Source of cases: Carriers and non-carriers (with a known mutation in the family) of the BRCA1 and BRCA2 genes, aged between 40 and 69, who are undergoing prostate screening with annual PSA testing. This cohort has been diagnosed with prostate cancer during the study.
Source of controls: Carriers and non-carriers (with a known mutation in the family) of the BRCA1 and BRCA2 genes, aged between 40 and 69, who are undergoing prostate screening with annual PSA testing. This cohort has not been diagnosed with prostate cancer during the study.

IPO-Porto: Hospital-based.
Source of cases: Early onset and/or familial prostate cancer.
Source of controls: Blood donors Karuprostate: Case-control, Retrospective, Population-based. Source of cases: From FWI (Guadeloupe): 237 consecutive incident patients with histologically confirmed prostate cancer attending public and private urology clinics; From Democratic Republic of Congo: 148 consecutive incident patients with histologically confirmed prostate cancer attending the University Clinic of Kinshasa. Source of controls: From FWI (Guadeloupe): 277 controls recruited from men participating in a free systematic health screening program open to the general population; From Democratic Republic of Congo: 134 controls recruited from subjects attending the University Clinic of Kinshasa KULEUVEN: Hospital-based, Prospective, Observational. Source of cases: Prostate cancer cases recruited at the University Hospital Leuven. Source of controls: Healthy males with no history of prostate cancer recruited at the University Hospitals, Leuven. LAAPC: Subjects were participants in a population-based case-control study of aggressive prostate cancer conducted in Los Angeles County. Cases were identified through the Los Angeles County Cancer Surveillance Program rapid case ascertainment system. Eligible cases included African American, Hispanic, and non-Hispanic White men diagnosed with a first primary prostate cancer between January 1, 1999 and December 31, 2003. Eligible cases also had (a) prostatectomy with documented tumor extension outside the prostate, (b) metastatic prostate cancer in sites other than prostate, (c) needle biopsy of the prostate with Gleason grade ≥8, or (d) needle biopsy with Gleason grade 7 and tumor in more than two thirds of the biopsy cores. Eligible controls were men never diagnosed with prostate cancer, living in the same neighborhood as a case, and were frequency matched to cases on age (± 5 y) and race/ethnicity. 
Controls were identified by a neighborhood walk algorithm, which proceeds through an obligatory sequence of adjacent houses or residential units beginning at a specific residence that has a specific geographic relationship to the residence where the case lived at diagnosis. Malaysia: Case-control. Source of cases: Patients attended the outpatient urology or uro-onco clinic at University Malaya Medical Center. Source of controls: Population-based, age matched (5-year groups), ascertained through electoral register, Subang Jaya, Selangor, Malaysia MCC-Spain: Case-control. Source of cases: Identified through the urology departments of the participating hospitals. Source of controls: Population-based, frequency age and region matched, ascertained through the rosters of the primary health care centers MCCS: Nested case-control, Melbourne, Victoria. Source of cases: Identified by linkage to the Victorian Cancer Registry. Source of controls: Cohort participants without a diagnosis of cancer MD Anderson: Participants in this study were identified from epidemiological prostate cancer studies conducted at the University of Texas MD Anderson Cancer Center in the Houston Metropolitan area. Cases were accrued in the Houston Medical Center and were not restricted with respect to Gleason score, stage or PSA. Controls were identified via random-digit-dialing or among hospital visitors and they were frequency matched to cases on age and race. Lifestyle, demographic, and family history data were collected using a standardized questionnaire. MDACC_AS: A prospective cohort study. Source of cases: Men with clinically organ-confined prostate cancer meeting eligibility criteria for a prospective cohort study of active surveillance at MD Anderson Cancer Center. Source of controls: N/A MEC: The Multiethnic Cohort (MEC) is comprised of over 215,000 men and women recruited from Hawaii and the Los Angeles area between 1993 and 1996. 
Between 1995 and 2006, over 65,000 blood samples were collected from participants for genetic analyses. To identify incident cancer cases, the MEC was cross-linked with the population-based Surveillance, Epidemiology and End Results (SEER) registries in California and Hawaii, and unaffected cohort participants with blood samples were selected as controls MIAMI (WFPCS): Prostate cancer cases and controls were recruited from the Departments of Urology and Internal Medicine of the Wake Forest University School of Medicine using sequential patient populations as described previously (PMID:15342424). All study subjects received a detailed description of the study protocol and signed their informed consent, as approved by the medical center's Institutional Review Board. The general eligibility criteria were (i) able to comprehend informed consent and (ii) without previously diagnosed cancer. The exclusion criteria were (i) clinical diagnosis of autoimmune diseases; (ii) chronic inflammatory conditions; and (iii) infections within the past 6 weeks. Blood samples were collected from all subjects. MOFFITT: Hospital-based. Source of cases: clinic based from Moffitt Cancer Center. Source of controls: Moffitt Cancer Center affiliated Lifetime cancer screening center NMHS: Case-control, clinic based, Nashville TN. Source of cases: All urology clinics in Nashville, TN. Source of controls: Men without prostate cancer at prostate biopsy. PCaP: The North Carolina-Louisiana Prostate Cancer Project (PCaP) is a multidisciplinary population-based case-only study designed to address racial differences in prostate cancer through a comprehensive evaluation of social, individual and tumor level influences on prostate cancer aggressiveness. PCaP enrolled approximately equal numbers of African Americans and Caucasian Americans with newly-diagnosed prostate cancer from North Carolina (42 counties) and Louisiana (30 parishes) identified through state tumor registries. 
African American PCaP subjects with DNA, who agreed to future use of specimens for research, participated in OncoArray analysis. PCMUS: Case-control - Sofia, Bulgaria. Source of cases: Patients of Clinic of Urology, Alexandrovska University Hospital, Sofia, Bulgaria, PrCa histopathologically confirmed. Source of controls: 72 patients with verified BPH and PSA<3,5; 78 healthy controls from the MMC Biobank, no history of PrCa PHS: Nested case-control. Source of cases: Participants of the PHS1 trial/cohort. Source of controls: Participants of the PHS1 trial/cohort PLCO: Nested case-control. Source of cases: Men with a confirmed diagnosis of prostate cancer from the PLCO Cancer Screening Trial. Source of controls: Controls were men enrolled in the PLCO Cancer Screening Trial without a diagnosis of cancer at the time of case ascertainment. Poland: Case-control. Source of cases: men with unselected prostate cancer, diagnosed in north-western Poland at the University Hospital in Szczecin. Source of controls: cancer-free men from the same population, taken from the healthy adult patients of family doctors in the Szczecin region PROCAP: Population-based, Retrospective, Observational. Source of cases: Cases were ascertained from the National Prostate Cancer Register of Sweden Follow-Up Study, a retrospective nationwide cohort study of patients with localized prostate cancer. Source of controls: Controls were selected among men referred for PSA testing in laboratories in Stockholm County, Sweden, between 2010 and 2012. PROGReSS: Hospital-based, Prospective, Observational. Source of cases: Prostate cancer cases from the Hospital Clínico Universitario de Santiago de Compostela, Galicia, Spain. Source of controls: Cancer-free men from the same population ProMPT: A study to collect samples and data from subjects with and without prostate cancer. Retrospective, Experimental. Source of cases: Subjects attending outpatient clinics in hospitals. 
Source of controls: Subjects attending outpatient clinics in hospitals ProtecT: Trial of treatment. Samples taken from subjects invited for PSA testing from the community at nine centers across United Kingdom. Source of cases: Subjects who have a proven diagnosis of prostate cancer following testing. Source of controls: Identified through invitation of subjects in the community. PROtEuS: Case-control, population-based. Source of cases: All new histologically-confirmed cases, aged less or equal to 75 years, diagnosed between 2005 and 2009, actively ascertained across Montreal French hospitals. Source of controls: Randomly selected from the Provincial electoral list of French-speaking men between 2005 and 2009, from the same area of residence as cases and frequency-matched on age. QLD: Case-control. Source of cases: A longitudinal cohort study (Prostate Cancer Supportive Care and Patient Outcomes Project: ProsCan) conducted in Queensland, through which men newly diagnosed with prostate cancer from 26 private practices and 10 public hospitals were directly referred to ProsCan at the time of diagnosis by their treating clinician (age range 43-88 years). All cases had histopathologically confirmed prostate cancer, following presentation with an abnormal serum PSA and/or lower urinary tract symptoms. Source of controls: Controls comprised healthy male blood donors with no personal history of prostate cancer, recruited through (i) the Australian Red Cross Blood Services in Brisbane (age range 19-76 years) and (ii) the Australian Electoral Commission (AEC) (age and post-code/ area matched to ProsCan, age range 54-90 years). RAPPER: Multi-centre, hospital based blood sample collection study in patients enrolled in clinical trials with prospective collection of radiotherapy toxicity data. Source of cases: Prostate cancer patients enrolled in radiotherapy trials: CHHiP, RT01, Dose Escalation, RADICALS, Pelvic IMRT, PIVOTAL. 
Source of controls: N/A SABOR: Prostate Cancer Screening Cohort. Source of cases: Men >45 yrs of age participating in annual PSA screening. Source of controls: Males participating in annual PSA prostate cancer risk evaluations (funded by NCI biomarkers discovery and validation grant), recruited through University of Texas Health Science Center at San Antonio and affiliated sites or through study advertisements, enrolment open to the community SCCS: Case-control in cohort, Southeastern USA. Prospective, Observational, Population-based. Source of cases: SCCS entry population. Source of controls: SCCS entry population SCPCS: Population-based, Retrospective, Observational. Source of cases: South Carolina Central Cancer Registry. Source of controls: Health Care Financing Administration beneficiary file SEARCH: Case-control - East Anglia, UK. Source of cases: Men < 70 years of age registered with prostate cancer at the population-based cancer registry, Eastern Cancer Registration and Information Centre, East Anglia, UK. Source of controls: Men attending general practice in East Anglia with no known prostate cancer diagnosis, frequency matched to cases by age and geographic region SNP_Prostate_Ghent: Hospital-based, Retrospective, Observational. Source of cases: Men treated with IMRT as primary or postoperative treatment for prostate cancer at the Ghent University Hospital between 2000 and 2010. Source of controls: Employees of the University hospital and members of social activity clubs, without a history of any cancer. SPAG: Hospital-based, Retrospective, Observational. Source of cases: Guernsey. Source of controls: Guernsey STHM2: Population-based, Retrospective, Observational. Source of cases: Cases were selected among men referred for PSA testing in laboratories in Stockholm County, Sweden, between 2010 and 2012. Source of controls: Controls were selected among men referred for PSA testing in laboratories in Stockholm County, Sweden, between 2010 and 2012. 
PCPT: Case-control from a randomized clinical trial. Source of cases: Randomized clinical trial. Source of controls: Randomized clinical trial. SELECT: Case-cohort from a randomized clinical trial. Source of cases: Randomized clinical trial. Source of controls: Randomized clinical trial. TAMPERE: Case-control - Finland, Retrospective, Observational, Population-based. Source of cases: Identified through linkage to the Finnish Cancer Registry and patient records; and the Finnish arm of the ERSPC study. Source of controls: Cohort participants without a diagnosis of cancer. UGANDA: Uganda Prostate Cancer Study: Uganda is a case-control study of prostate cancer in Kampala, Uganda, that was initiated in 2011. Men with prostate cancer were enrolled from the Urology unit at Mulago Hospital and men without prostate cancer (i.e. controls) were enrolled from other clinics (i.e. surgery) at the hospital. UKGPCS: ICR, UK. Source of cases: Cases identified through clinics at the Royal Marsden hospital and nationwide NCRN hospitals. Source of controls: Ken Muir's control- 2000. ULM: Case-control - Germany. Source of cases: familial cases (n=162): identified through questionnaires for family history by collaborating urologists all over Germany; sporadic cases (n=308): prostatectomy series performed in the Clinic of Urology Ulm between 2012 and 2014. Source of controls: age-matched controls (n=188): age-matched men without prostate cancer and negative family history collected in hospitals of Ulm. WUGS/WUPCS: Case series, USA. Source of cases: Identified through clinics at Washington University in St. Louis. Source of controls: Men diagnosed and managed with prostate cancer in University based clinic. Acknowledgement Statements: Aarhus: This study was supported by the Danish Strategic Research Council (now Innovation Fund Denmark) and the Danish Cancer Society. The Danish Cancer Biobank (DCB) is acknowledged for biological material. 
AHS: This work was supported by the Intramural Research Program of the NIH, National Cancer Institute, Division of Cancer Epidemiology and Genetics (Z01CP010119). ATBC: This research was supported in part by the Intramural Research Program of the NIH and the National Cancer Institute. Additionally, this research was supported by U.S. Public Health Service contracts N01-CN-45165, N01-RC-45035, N01-RC-37004, HHSN261201000006C, and HHSN261201500005C from the National Cancer Institute, Department of Health and Human Services. BioVu: The dataset(s) used for the analyses described were obtained from Vanderbilt University Medical Center's BioVU which is supported by institutional funding and by the National Center for Research Resources, Grant UL1 RR024975-01 (which is now at the National Center for Advancing Translational Sciences, Grant 2 UL1 TR000445-06). Canary PASS: PASS was supported by Canary Foundation and the National Cancer Institute's Early Detection Research Network (U01 CA086402). CCI: This work was awarded by Prostate Cancer Canada and is proudly funded by the Movember Foundation - Grant # D2013-36. The CCI group would like to thank David Murray, Razmik Mirzayans, and April Scott for their contribution to this work. CerePP French Prostate Cancer Case-Control Study (ProGene): None reported. COH: SLN is partially supported by the Morris and Horowitz Families Endowed Professorship. COSM: The Swedish Research Council, the Swedish Cancer Foundation. CPCS1 & CPCS2: Department of Clinical Biochemistry, Herlev and Gentofte Hospital, Copenhagen University Hospital, Herlev Ringvej 75, DK-2730 Herlev, Denmark. CPCS1 would like to thank the participants and staff of the Copenhagen General Population Study for their important contributions. CPDR: Uniformed Services University for the Health Sciences HU0001-10-2-0002 (PI: David G. McLeod, MD). CPS-II: The American Cancer Society funds the creation, maintenance, and updating of the Cancer Prevention Study II cohort. 
CPS-II thanks the participants and Study Management Group for their invaluable contributions to this research. We would also like to acknowledge the contribution to this study from central cancer registries supported through the Centers for Disease Control and Prevention National Program of Cancer Registries, and cancer registries supported by the National Cancer Institute Surveillance Epidemiology and End Results program. EPIC: The coordination of EPIC is financially supported by the European Commission (DG-SANCO) and the International Agency for Research on Cancer. The national cohorts are supported by the Danish Cancer Society (Denmark); the Deutsche Krebshilfe, Deutsches Krebsforschungszentrum and Federal Ministry of Education and Research (Germany); the Hellenic Health Foundation, Greek Ministry of Health; Greek Ministry of Education (Greece); the Italian Association for Research on Cancer (AIRC) and National Research Council (Italy); the Dutch Ministry of Public Health, Welfare and Sports (VWS), Netherlands Cancer Registry (NKR), LK Research Funds, Dutch Prevention Funds, Dutch ZON (Zorg Onderzoek Nederland), World Cancer Research Fund (WCRF); the Statistics Netherlands (The Netherlands); the Health Research Fund (FIS), Regional Governments of Andalucía, Asturias, Basque Country, Murcia and Navarra, Spanish Ministry of Health ISCIII RETIC (RD06/0020), Red de Centros RCESP, C03/09 (Spain); the Swedish Cancer Society, Swedish Scientific Council and Regional Government of Skåne and Västerbotten, Fundacion Federico SA (Sweden); the Cancer Research UK, Medical Research Council (United Kingdom). EPICAP: The EPICAP study was supported by grants from Ligue Nationale Contre le Cancer, Ligue départementale du Val de Marne; Fondation de France; Agence Nationale de sécurité sanitaire de l'alimentation, de l'environnement et du travail (ANSES). 
The EPICAP study group would like to thank all urologists, Antoinette Anger and Hasina Randrianasolo (study monitors), Anne-Laure Astolfi, Coline Bernard, Oriane Noyer, Marie-Hélène De Campo, Sandrine Margaroline, Louise N'Diaye, and Sabine Perrier-Bonnet (Clinical Research nurses). ERSPC: This study was supported by the Dutch Cancer Society (KWF94-869, 98-1657, 2002-277, 2006-3518, 2010-4800), The Netherlands Organisation for Health Research and Development (ZonMW-002822820, 22000106, 50-50110-98-311, 62300035), The Dutch Cancer Research Foundation (SWOP), and an unconditional grant from Beckman-Coulter-Hybritech Inc. ESTHER: The ESTHER study was supported by a grant from the Baden Württemberg Ministry of Science, Research and Arts. The ESTHER group would like to thank Hartwig Ziegler, Sonja Wolf, Volker Hermann, Heiko Müller, Karina Dieffenbach, Katja Butterbach for valuable contributions to the study. FHCRC: The FHCRC studies were supported by grants R01-CA056678, R01-CA082664, and R01-CA092579 from the US National Cancer Institute, National Institutes of Health, with additional support from the Fred Hutchinson Cancer Research Center. FHCRC would like to thank all the men who participated in these studies. Gene-PARE: The Gene-PARE study was supported by grants 1R01CA134444 from the U.S. National Institutes of Health, PC074201 and W81XWH-15-1-0680 from the Prostate Cancer Research Program of the Department of Defense and RSGT-05-200-01-CCE from the American Cancer Society. Hamburg-Zagreb: None reported. HPFS: The Health Professionals Follow-up Study was supported by grants UM1CA167552, CA133891, CA141298, and P01CA055075. 
HPFS are grateful to the participants and staff of the Physicians' Health Study and Health Professionals Follow-Up Study for their valuable contributions, as well as the following state cancer registries for their help: AL, AZ, AR, CA, CO, CT, DE, FL, GA, ID, IL, IN, IA, KY, LA, ME, MD, MA, MI, NE, NH, NJ, NY, NC, ND, OH, OK, OR, PA, RI, SC, TN, TX, VA, WA, and WY. IMPACT: The IMPACT study was funded by The Ronald and Rita McAulay Foundation, CR-UK Project grant (C5047/A1232), Cancer Australia, AICR Netherlands A10-0227, Cancer Australia and Cancer Council Tasmania, NIHR, EU Framework 6, Cancer Councils of Victoria and South Australia, and Philanthropic donation to Northshore University Health System. We acknowledge support from the National Institute for Health Research (NIHR) to the Biomedical Research Centre at The Institute of Cancer Research and Royal Marsden Foundation NHS Trust. IMPACT acknowledges the IMPACT study steering committee, collaborating centres, and participants. IPO-Porto: The IPO-Porto study was funded by Fundação para a Ciência e a Tecnologia (FCT; UID/DTP/00776/2013 and PTDC/DTP-PIC/1308/2014) and by IPO-Porto Research Center (CI-IPOP-16-2012 and CI-IPOP-24-2015). MC and MPS are research fellows from Liga Portuguesa Contra o Cancro, Núcleo Regional do Norte. SM is a research fellow from FCT (SFRH/BD/71397/2010). IPO-Porto would like to express our gratitude to all patients and families who have participated in this study. Karuprostate: The Karuprostate study was supported by the French National Health Directorate and by the Association pour la Recherche sur les Tumeurs de la Prostate. Karuprostate thanks Séverine Ferdinand. KULEUVEN: F.C. and S.J. are holders of grants from FWO Vlaanderen (G.0684.12N and G.0830.13N), the Belgian federal government (National Cancer Plan KPC_29_023), and a Concerted Research Action of the KU Leuven (GOA/15/017). TVDB is holder of a doctoral fellowship of the FWO. 
LAAPC: This study was funded by grant R01CA84979 (to S.A. Ingles) from the National Cancer Institute, National Institutes of Health. Malaysia: The study was funded by the University Malaya High Impact Research Grant (HIR/MOHE/MED/35). Malaysia thanks all associates in the Urology Unit, University of Malaya, Cancer Research Initiatives Foundation (CARIF) and the Malaysian Men's Health Initiative (MMHI). MCCS: MCCS cohort recruitment was funded by VicHealth and Cancer Council Victoria. The MCCS was further supported by Australian NHMRC grants 209057, 251553, and 504711, and by infrastructure provided by Cancer Council Victoria. Cases and their vital status were ascertained through the Victorian Cancer Registry (VCR) and the Australian Institute of Health and Welfare (AIHW), including the National Death Index and the Australian Cancer Database. MCC-Spain: The study was partially funded by the Accion Transversal del Cancer, approved on the Spanish Ministry Council on the 11th October 2007, by the Instituto de Salud Carlos III-FEDER (PI08/1770, PI09/00773-Cantabria, PI11/01889-FEDER, PI12/00265, PI12/01270, and PI12/00715), by the Fundación Marqués de Valdecilla (API 10/09), by the Spanish Association Against Cancer (AECC) Scientific Foundation and by the Catalan Government DURSI grant 2009SGR1489. Samples: Biological samples were stored at the Parc de Salut MAR Biobank (MARBiobanc; Barcelona) which is supported by Instituto de Salud Carlos III FEDER (RD09/0076/00036). Also sample collection was supported by the Xarxa de Bancs de Tumors de Catalunya sponsored by Pla Director d'Oncologia de Catalunya (XBTC). MCC-Spain acknowledges the contribution from Esther Gracia-Lavedan in preparing the data. We thank all the subjects who participated in the study and all MCC-Spain collaborators. MD Anderson: Prostate Cancer Case-Control Studies at MD Anderson (MDA) supported by grants CA68578, ES007784, DAMD W81XWH-07-1-0645, and CA140388. 
MDACC_AS: None reported MEC: Funding provided by NIH grant U19CA148537 and grant U01CA164973. MIAMI (WFPCS): ACS MOFFITT: The Moffitt group was supported by the US National Cancer Institute (R01CA128813, PI: J.Y. Park). NMHS: Funding for the Nashville Men's Health Study (NMHS) was provided by the National Institutes of Health Grant numbers: RO1CA121060. PCaP only data: The North Carolina - Louisiana Prostate Cancer Project (PCaP) is carried out as a collaborative study supported by the Department of Defense contract DAMD 17-03-2-0052. For HCaP-NC follow-up data: The Health Care Access and Prostate Cancer Treatment in North Carolina (HCaP-NC) study is carried out as a collaborative study supported by the American Cancer Society award RSGT-08-008-01-CPHPS. For studies using both PCaP and HCaP-NC follow-up data please use: The North Carolina - Louisiana Prostate Cancer Project (PCaP) and the Health Care Access and Prostate Cancer Treatment in North Carolina (HCaP-NC) study are carried out as collaborative studies supported by the Department of Defense contract DAMD 17-03-2-0052 and the American Cancer Society award RSGT-08-008-01-CPHPS, respectively. For any PCaP data, please include: The authors thank the staff, advisory committees and research subjects participating in the PCaP study for their important contributions. For studies using PCaP DNA/genotyping data, please include: We would like to acknowledge the UNC BioSpecimen Facility and LSUHSC Pathology Lab for our DNA extractions, blood processing, storage and sample disbursement (https://genome.unc.edu/bsp). For studies using PCaP tissue, please include: We would like to acknowledge the RPCI Department of Urology Tissue Microarray and Immunoanalysis Core for our tissue processing, storage and sample disbursement. 
For studies using HCaP-NC follow-up data, please use: The Health Care Access and Prostate Cancer Treatment in North Carolina (HCaP-NC) study is carried out as a collaborative study supported by the American Cancer Society award RSGT-08-008-01-CPHPS. The authors thank the staff, advisory committees and research subjects participating in the HCaP-NC study for their important contributions. For studies that use both PCaP and HCaP-NC, please use: The authors thank the staff, advisory committees and research subjects participating in the PCaP and HCaP-NC studies for their important contributions. PCMUS: The PCMUS study was supported by the Bulgarian National Science Fund, Ministry of Education and Science (contract DOO-119/2009; DUNK01/2-2009; DFNI-B01/28/2012) with additional support from the Science Fund of Medical University - Sofia (contract 51/2009; 8I/2009; 28/2010). PHS: The Physicians' Health Study was supported by grants CA34944, CA40360, CA097193, HL26490, and HL34595. PHS members are grateful to the participants and staff of the Physicians' Health Study and Health Professionals Follow-Up Study for their valuable contributions, as well as the following state cancer registries for their help: AL, AZ, AR, CA, CO, CT, DE, FL, GA, ID, IL, IN, IA, KY, LA, ME, MD, MA, MI, NE, NH, NJ, NY, NC, ND, OH, OK, OR, PA, RI, SC, TN, TX, VA, WA, and WY. PLCO: This PLCO study was supported by the Intramural Research Program of the Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH. PLCO thanks Drs. Christine Berg and Philip Prorok, Division of Cancer Prevention at the National Cancer Institute, the screening center investigators and staff of the PLCO Cancer Screening Trial for their contributions to the PLCO Cancer Screening Trial. We thank Mr. Thomas Riley, Mr. Craig Williams, Mr. Matthew Moore, and Ms. Shannon Merkle at Information Management Services, Inc., for their management of the data and Ms. Barbara O'Brien and staff at Westat, Inc. 
for their contributions to the PLCO Cancer Screening Trial. We also thank the PLCO study participants for their contributions to making this study possible. Poland: None reported. PROCAP: PROCAP was supported by the Swedish Cancer Foundation (08-708, 09-0677). PROCAP thanks and acknowledges all of the participants in the PROCAP study. We thank Carin Cavalli-Björkman and Ami Rönnberg Karlsson for their dedicated work in the collection of data. Michael Broms is acknowledged for his skilful work with the databases. KI Biobank is acknowledged for handling the samples and for DNA extraction. We acknowledge The NPCR steering group: Pär Stattin (chair), Anders Widmark, Stefan Karlsson, Magnus Törnblom, Jan Adolfsson, Anna Bill-Axelson, Ove Andrén, David Robinson, Bill Pettersson, Jonas Hugosson, Jan-Erik Damber, Ola Bratt, Göran Ahlgren, Lars Egevad, and Roy Ehrnström. PROGReSS: The PROGReSS study is funded by grants from the Spanish Ministry of Health (INT15/00070; INT16/00154; FIS PI10/00164, FIS PI13/02030; FIS PI16/00046); the Spanish Ministry of Economy and Competitiveness (PTA2014-10228-I), and Fondo Europeo de Desarrollo Regional (FEDER 2007-2013). ProMPT: Funded by CRUK, NIHR, MRC, Cambridge Biomedical Research Centre. ProtecT: Funded by NIHR. ProtecT and ProMPT would like to acknowledge the support of The University of Cambridge, Cancer Research UK. Cancer Research UK grants (C8197/A10123) and (C8197/A10865) supported the genotyping team. We would also like to acknowledge the support of the National Institute for Health Research which funds the Cambridge Bio-medical Research Centre, Cambridge, UK. We would also like to acknowledge the support of the National Cancer Research Prostate Cancer: Mechanisms of Progression and Treatment (PROMPT) collaborative (grant code G0500966/75466) which has funded tissue and urine collections in Cambridge. 
We are grateful to staff at the Wellcome Trust Clinical Research Facility, Addenbrooke's Clinical Research Centre, Cambridge, UK for their help in conducting the ProtecT study. We also acknowledge the support of the NIHR Cambridge Biomedical Research Centre, the DOH HTA (ProtecT grant), and the NCRI/MRC (ProMPT grant) for help with the bio-repository. The UK Department of Health funded the ProtecT study through the NIHR Health Technology Assessment Programme (projects 96/20/06, 96/20/99). The ProtecT trial and its linked ProMPT and CAP (Comparison Arm for ProtecT) studies are supported by Department of Health, England; Cancer Research UK grant number C522/A8649, Medical Research Council of England grant number G0500966, ID 75466, and The NCRI, UK. The epidemiological data for ProtecT were generated through funding from the Southwest National Health Service Research and Development. DNA extraction in ProtecT was supported by USA Dept of Defense award W81XWH-04-1-0280, Yorkshire Cancer Research and Cancer Research UK. The authors would like to acknowledge the contribution of all members of the ProtecT study research group. The views and opinions expressed therein are those of the authors and do not necessarily reflect those of the Department of Health of England. The bio-repository from ProtecT is supported by the NCRI (ProMPT) Prostate Cancer Collaborative and the Cambridge BMRC grant from NIHR. We thank the National Institute for Health Research, Hutchison Whampoa Limited, the Human Research Tissue Bank (Addenbrooke's Hospital), and Cancer Research UK. 
PROtEuS: PROtEuS was supported financially through grants from the Canadian Cancer Society (13149, 19500, 19864, 19865) and the Cancer Research Society, in partnership with the Ministère de l'enseignement supérieur, de la recherche, de la science et de la technologie du Québec, and the Fonds de la recherche du Québec - Santé. PROtEuS would like to thank its collaborators and research personnel, and the urologists involved in subject recruitment. We also wish to acknowledge the special contribution made by Ann Hsing and Anand Chokkalingam to the conception of the genetic component of PROtEuS. QLD: The QLD research is supported by The National Health and Medical Research Council (NHMRC) Australia Project Grants (390130, 1009458) and NHMRC Career Development Fellowship and Cancer Australia PdCCRS funding to J Batra. The QLD team would like to acknowledge and sincerely thank the urologists, pathologists, data managers and patient participants who have generously and altruistically supported the QLD cohort. RAPPER: RAPPER is funded by Cancer Research UK (C1094/A11728; C1094/A18504) and Experimental Cancer Medicine Centre funding (C1467/A7286). The RAPPER group thank Rebecca Elliott for project management. SABOR: The SABOR research is supported by NIH/NCI Early Detection Research Network, grant U01 CA0866402-12. Also supported by the Cancer Center Support Grant to the Cancer Therapy and Research Center from the National Cancer Institute (US) P30 CA054174. SCCS: SCCS is funded by NIH grant R01 CA092447, and SCCS sample preparation was conducted at the Epidemiology Biospecimen Core Lab that is supported in part by the Vanderbilt-Ingram Cancer Center (P30 CA68485). 
Data on SCCS cancer cases used in this publication were provided by the Alabama Statewide Cancer Registry; Kentucky Cancer Registry, Lexington, KY; Tennessee Department of Health, Office of Cancer Surveillance; Florida Cancer Data System; North Carolina Central Cancer Registry, North Carolina Division of Public Health; Georgia Comprehensive Cancer Registry; Louisiana Tumor Registry; Mississippi Cancer Registry; South Carolina Central Cancer Registry; Virginia Department of Health, Virginia Cancer Registry; Arkansas Department of Health, Cancer Registry, 4815 W. Markham, Little Rock, AR 72205. The Arkansas Central Cancer Registry is fully funded by a grant from National Program of Cancer Registries, Centers for Disease Control and Prevention (CDC). Data on SCCS cancer cases from Mississippi were collected by the Mississippi Cancer Registry which participates in the National Program of Cancer Registries (NPCR) of the Centers for Disease Control and Prevention (CDC). The contents of this publication are solely the responsibility of the authors and do not necessarily represent the official views of the CDC or the Mississippi Cancer Registry. SCPCS: SCPCS is funded by CDC grant S1135-19/19, and SCPCS sample preparation was conducted at the Epidemiology Biospecimen Core Lab that is supported in part by the Vanderbilt-Ingram Cancer Center (P30 CA68485). SEARCH: SEARCH is funded by a program grant from Cancer Research UK (C490/A10124) and supported by the UK National Institute for Health Research Biomedical Research Centre at the University of Cambridge. SNP_Prostate_Ghent: The study was supported by the National Cancer Plan, financed by the Federal Office of Health and Social Affairs, Belgium. 
SPAG: Wessex Medical Research, Hope for Guernsey, MUG, HSSD, MSG, Roger Allsopp. STHM2: STHM2 was supported by grants from The Strategic Research Programme on Cancer (StratCan), Karolinska Institutet; the Linné Centre for Breast and Prostate Cancer (CRISP, number 70867901), Karolinska Institutet; The Swedish Research Council (number K2010-70X-20430-04-3) and The Swedish Cancer Society (numbers 11-0287 and 11-0624); Stiftelsen Johanna Hagstrand och Sigfrid Linnérs minne; Swedish Council for Working Life and Social Research (FAS), number 2012-0073. STHM2 acknowledges the Karolinska University Laboratory, Aleris Medilab, Unilabs and the Regional Prostate Cancer Registry for performing analyses and helping to retrieve data; Carin Cavalli-Björkman and Britt-Marie Hune for their enthusiastic work as research nurses; and Astrid Björklund for skilful data management. We wish to thank the BBMRI.se biobank facility at Karolinska Institutet for biobank services. PCPT & SELECT are funded by Public Health Service grants U10CA37429 and 5UM1CA182883 from the National Cancer Institute. SWOG and SELECT thank the site investigators and staff and, most importantly, the participants who donated their time to this trial. TAMPERE: The Tampere (Finland) study was supported by the Academy of Finland (251074), The Finnish Cancer Organisations, Sigrid Juselius Foundation, and the Competitive Research Funding of the Tampere University Hospital (X51003). The PSA screening samples were collected by the Finnish part of ERSPC (European Study of Screening for Prostate Cancer). TAMPERE would like to thank Riina Liikanen, Liisa Maeaettaenen and Kirsi Talala for their work on samples and databases.
UGANDA: None reported. UKGPCS: UKGPCS would also like to thank the following for funding support: The Institute of Cancer Research and The Everyman Campaign, The Prostate Cancer Research Foundation, Prostate Research Campaign UK (now Prostate Action), The Orchid Cancer Appeal, The National Cancer Research Network UK, The National Cancer Research Institute (NCRI) UK. We are grateful for support of NIHR funding to the NIHR Biomedical Research Centre at The Institute of Cancer Research and The Royal Marsden NHS Foundation Trust. UKGPCS would also like to acknowledge the NCRN nurses, data managers, and consultants for their work in the UKGPCS study. UKGPCS would like to thank all urologists and other persons involved in the planning, coordination, and data collection of the study. ULM: The Ulm group received funds from the German Cancer Aid (Deutsche Krebshilfe). WUGS/WUPCS: WUGS would like to thank the following for funding support: The Anthony DeNovi Fund, the Donald C. McGraw Foundation, and the St. Louis Men's Group Against Cancer.
The Gabriella Miller Kids First Pediatric Research Program (Kids First) is a trans-NIH effort initiated in response to the 2014 Gabriella Miller Kids First Research Act and supported by the NIH Common Fund. This program focuses on gene discovery in pediatric cancers and structural birth defects and the development of the Gabriella Miller Kids First Pediatric Data Resource (Kids First Data Resource). All of the WGS and phenotypic data from this study are accessible through dbGaP and kidsfirstdrc.org, where other Kids First datasets can also be accessed. Children with disseminated neuroblastoma have a very high risk of treatment failure and death despite receiving intensified chemotherapy, radiation therapy and immunotherapy. The long-term goal of our research program is to ultimately improve neuroblastoma cure rates by first comprehensively defining the genetic basis of the disease. The central hypothesis to be tested here is that neuroblastoma arises largely due to the epistatic interaction of common and rare heritable DNA variation. Here we will perform a comprehensive whole genome sequencing of 563 quartets of neuroblastoma patient germline and diagnostic tumor DNAs and germline DNAs from both parents. The case series was recently collected through a Children's Oncology Group epidemiology clinical trial and is robustly annotated with complete demographic (age, sex, race, ethnicity), clinical (e.g. age at diagnosis, stage, risk group), epidemiologic (parental dietary and exposure questionnaire) and biological (e.g. tumor MYCN status and multiple other tumor genomic measures) co-variates. Subjects were consented for genetic research and DNA is immediately available for shipment for sequencing. 
We propose Illumina-based whole genome sequencing in the 593 "trio" germline samples (Aim 1; due to missing parent: 487 full neuroblastoma triads, 106 child-single parent dyads = 1673 whole genome sequences) and matched diagnostic tumor DNA (Aim 2; N=366) at 30x sequencing depth (N=2039 whole genome sequences). Also in Aim 2 we will perform whole exome (100x) and RNA sequencing on the 366 tumor DNA and 228 tumor RNA samples from this cohort. Finally, we propose a pilot study of structural variation using long-range sequencing in 10 non-overlapping tumor samples chosen based on potentially relevant chromosomal alterations discovered with conventional NGS. Thus, a total of 2277 individual samples and 2655 sequences will be generated. We will use our established analytic pipeline that is currently being used to study the germline genomes of all cases sequenced through the NCI supported Therapeutically Applicable Research to Generate Effective Treatments program. We plan a three-stage analytic approach, first focusing on classic de novo and inherited Mendelian damaging alterations. We will next integrate our extensive epigenomic data from human neuroblastoma cell lines and genome-wide association study data (N=5,703 neuroblastoma cases to date) to guide a comprehensive assessment of noncoding variants that influence tumor initiation with a recently established analytic pipeline. Finally, we will utilize the tumor DNA analyses to inform relevance via somatic gain or loss of function effects at the sequence and/or copy number levels. All data generated in this project will be immediately placed into the Genomic Data Commons (GDC) and we will compute within this environment by importing our analytic pipelines into the GDC. These data will be fully integrated into the Kids First Data Resource and freely shared with all academically qualified petitioners. 
This comprehensive data set derived from a large and richly phenotyped series of neuroblastoma DNA quartets will be integrated with existing germline and/or tumor genomic data from over 6,000 neuroblastoma subjects (but none with matched patient-parent germline sequencing data) to provide an unparalleled opportunity to comprehensively discover the genetic basis of neuroblastoma.
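The germline and 30x whole genome sequence counts quoted above follow from the family structure. The short check below reproduces that arithmetic only; the variable names are illustrative and are not taken from the study.

```python
# Counts stated in the study description.
full_triads = 487          # child + both parents
dyads = 106                # child + one parent (other parent missing)
tumor_genomes = 366        # matched diagnostic tumor DNA samples

# Families with germline sequencing: triads plus dyads.
germline_families = full_triads + dyads

# Each triad contributes 3 germline genomes, each dyad contributes 2.
germline_genomes = full_triads * 3 + dyads * 2

# Germline plus tumor genomes sequenced at 30x depth.
total_30x_genomes = germline_genomes + tumor_genomes

print(germline_families, germline_genomes, total_30x_genomes)
```

The three printed values correspond to the 593 families, 1673 germline whole genome sequences, and 2039 whole genome sequences at 30x given in the description.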
A genome-wide association study (GWAS) of prostate cancer (PCa) was conducted in Kaiser Permanente (KP) Northern California health plan members (7,783 cases, 38,595 controls; 80.3% non-Hispanic white, 4.9% African-American, 7.0% East Asian, and 7.8% Latino) [PMID: 26034056]. The data for these members were drawn from three KP cohort studies: Research Program in Genes, Environment and Health (RPGEH), ProHealth, and California Men's Health Study (CMHS) (described further under Study History). Four custom arrays were designed for genotyping, one for each of the four major race-ethnicity groups in the RPGEH cohort: African Americans, East Asians, Latinos, and Non-Hispanic Whites. The number of SNPs and SNP content varied by array, with SNP content designed to maximize the genome-wide coverage of low frequency and more common variants specific to the different race-ethnicity groups, including newly identified SNPs from sequencing projects, and SNPs with established associations with disease phenotypes and risk factors [PMIDs: 21565264, 21903159]. Within the total study cohort, n=34,736 completed a consent which permitted deposition of data to NIH. Genotyping followed the same general procedure described in [PMID: 26092718], plus additional quality control (QC) steps for the additional men, in order to control for potential batch and kit effects, described in [PMID: 26034056]. Briefly, we first repeated the filters described in [PMID: 26092718] for all four arrays (EUR, LAT, EAS, AFR). Then, on an array-wise basis, we removed SNPs with MAF<0.01, with a call rate<95%, or with Hardy-Weinberg Equilibrium (HWE) p-value in homogeneous groups<1x10^-5. Furthermore, on the EUR array, to adjust for potential kit effect, we conducted a GWAS of kit, and removed those kit-associated SNPs with p<1x10^-6; we also re-genotyped each of the new samples (those not genotyped with the original GERA data) with some of the original GERA data, and removed SNPs with >13/1,268 (1%) mismatches.
For the AFR array, to adjust for potential plate batch issues, we conducted a GWAS of whether an individual was in the original GERA data vs. in the newly genotyped data and removed those batch-associated SNPs with p<0.05 (we used a stronger threshold than that used for the EUR array because there were fewer individuals on the AFR array); we also re-genotyped each of the new samples with the original GERA data and removed SNPs with >2/78 (2.6%) mismatches. After the QC described above, imputation was performed as described in [PMID: 26034056]. Imputation was performed on an array-wise basis, pre-phasing with SHAPE-IT v2.5 [PMID: 22138821], and imputing from the 1000 Genomes Project October 2014 release as a cosmopolitan reference panel with IMPUTE2 [PMID: 22384356]. In addition to the GWAS described above, a nested exome-wide association study (EWAS) of PCa was also conducted (7,489 cases, 7,323 controls; 78% non-Hispanic white, 9% African-American, 3% East Asian, 6% Latino, 4% Other). A custom EWAS array primarily focused on rare variants was designed for genotyping that complemented the GWAS arrays [PMID: 26034056]. The EWAS array content included missense and loss-of-function mutations, and rare exonic mutations from The Cancer Genome Atlas (TCGA) and dbGaP prostate cancer tumor exomes [PMID: 26544944]. Much of the EWAS array design content overlapped with the probesets on the UK Biobank Affymetrix Axiom array [PMID: 30305743]. Genotyping and QC steps taken to filter out samples exhibiting low quality and variants with low call rates are described in Emami et al., 2020 [bioRxiv]. The resulting EWAS array genotypes are provided here.
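The array-wise SNP filters described above (minor allele frequency below 0.01, call rate below 95%, HWE p-value below 1x10^-5) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's pipeline: the `SnpCounts` container and all function names are hypothetical, and the HWE p-value is computed with a 1-degree-of-freedom chi-square goodness-of-fit test chosen for illustration, since the description does not say which HWE test was applied.

```python
import math
from dataclasses import dataclass


@dataclass
class SnpCounts:
    """Genotype counts for one SNP on one array (hypothetical container)."""
    n_aa: int       # homozygous for allele A
    n_ab: int       # heterozygous
    n_bb: int       # homozygous for allele B
    n_missing: int  # samples with no genotype call


def called(s: SnpCounts) -> int:
    """Number of samples with a genotype call."""
    return s.n_aa + s.n_ab + s.n_bb


def maf(s: SnpCounts) -> float:
    """Minor allele frequency among called genotypes."""
    freq_a = (2 * s.n_aa + s.n_ab) / (2 * called(s))
    return min(freq_a, 1.0 - freq_a)


def call_rate(s: SnpCounts) -> float:
    """Fraction of samples with a genotype call."""
    return called(s) / (called(s) + s.n_missing)


def hwe_p_value(s: SnpCounts) -> float:
    """Chi-square (1 df) HWE p-value; an assumption, not the study's test."""
    n = called(s)
    p = (2 * s.n_aa + s.n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (s.n_aa, s.n_ab, s.n_bb)
    if min(expected) == 0:
        return 1.0  # monomorphic SNP: no departure from HWE can be tested
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_qc(s: SnpCounts, min_maf: float = 0.01,
              min_call_rate: float = 0.95, min_hwe_p: float = 1e-5) -> bool:
    """Apply the three thresholds quoted in the text to one SNP."""
    if called(s) == 0:
        return False
    return (maf(s) >= min_maf
            and call_rate(s) >= min_call_rate
            and hwe_p_value(s) >= min_hwe_p)
```

For example, `passes_qc(SnpCounts(360, 480, 160, 0))` keeps a common SNP whose genotype counts match HWE expectations, while a SNP with counts `SnpCounts(500, 0, 500, 0)` is dropped because it has no heterozygotes at all. The text applies the HWE filter within homogeneous groups, so in practice the counts passed in would come from one such group at a time.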
GenADA is a multi-site collaborative study, involving GlaxoSmithKline Inc and nine medical centres in Canada, to develop a dataset containing 1000 Alzheimer's disease patients and 1000 ethnically-matched controls in order to associate DNA sequence (allelic) variations in candidate genes with Alzheimer's disease phenotypes. The study consists of both retrospective and prospective components, that is, patients with an existing diagnosis of Alzheimer's disease as well as newly diagnosed patients were enrolled in the study. Thus, clinical data was retrospectively or prospectively obtained on Day 1 of entry into GenADA. Where possible, biological relatives with Alzheimer's (up to third degree relationship such as cousins) and unaffected siblings of AD cases were also recruited. Note that recruitment numbers for biological relatives were lower than expected and genotypic data has not been submitted to dbGaP for these subjects. The purpose of this study is twofold: (1) to identify DNA sequence variations (genotype) in candidate genes that are associated with the clinical symptoms and behavioural features of Alzheimer's disease (phenotype), which differ between study participants with and without the disease; and (2) to identify other genotype-phenotype associations in cognitively impaired study participants such as age of onset, family history, rate of cognitive decline, patterns of behavioural/psychiatric non-cognitive symptoms, response to treatment, co-morbid conditions, and risks/exposure. The final subject recruitment for this study included 875 Alzheimer's disease patients, 850 ethnically-matched controls, and 37 family members. GenADA LONG is a longitudinal extension of the original GenADA study. Eligible subjects were recruited from five of the nine memory clinics that participated in the GenADA study.
Mild to moderate AD participants, a matched subset of controls, and biologically related siblings (both affected and unaffected) or other blood relatives affected with AD, were initially examined a minimum of 12 months from recruitment into the original GenADA study, then at two further intervals of 12 and 18 months after time of entry into GenADA LONG. This enables an evaluation of the disease progression in AD patients and a determination of whether controls show evidence of cognitive decline. The overall goal of this extension study is to identify genetic differences and environmental influences that modulate the age of onset of the disease, the course of the disease, and/or biomarkers for neurodegenerative processes. Three of the five memory clinics that participated in the GenADA LONG study recruited eligible patients into GenADA Imaging, a concurrent neuroimaging sub-study. Eligible AD cases with mild to moderate AD, who were recruited into the original GenADA study and participated in the GenADA LONG extension study, were enrolled. Additionally, controls that showed signs of cognitive decline, as part of the assessment in GenADA LONG, were imaged at baseline, 12 and 18 month scanning intervals. The objective of this study is to find genes that affect changes in AD brain volume measured by magnetic resonance, in order to investigate how well change in brain volume predicts other key clinical measures in AD, such as neurodegenerative scales; that correlate changes in brain volume with other genotype-phenotype associations in cognitively impaired study participants; and that correlate with other clinically applicable magnetic resonance measures of pathology that can be conducted at the same time as structural volume measures, and are complementary to the volume measures.
The ultimate aim of this research is to obtain a better understanding and definition of Alzheimer's Disease in order to develop new improved medicines.
Data on transgenerational effects following nuclear accidents are important for understanding fully the consequences of parental exposure to ionizing radiation. Few studies to date have had adequate statistical power to detect effects of the magnitude expected based on animal data, and most have not been of low-dose, protracted exposures associated with nuclear accidents and their aftermath. Although, to date, scant use has been made of the new genomic technologies, in Chernobyl-exposed areas of Ukraine and Belarus, excess minisatellite mutations have been seen in children born after the accident. We propose a study of parent-child trios in which at least one parent was exposed to Chernobyl radiation as a clean-up worker (mean dose>=100 mGy) and/or evacuee from a contaminated area (mean >=50 mGy). The specific aims are to investigate the transgenerational and de novo mutation rates of the spectrum of genetic variants in trios, in particular looking at effects in children and mapping them to possible parental origin of the chromosome. Together with long-term collaborators at the Research Center for Radiation Medicine (RCRM) in Kiev, epidemiologic data will be collected for up to 450 trios of parents with preconceptional doses and their unexposed offspring. We will use state-of-the-art genomic technologies to characterize the landscape of the genomes of the trios to determine whether parental radiation exposure is associated with genetic mutations transmitted to the offspring, by examining de novo mutation rates, minisatellite mutations, copy number alterations, and variations in telomere length. The analysis will be conducted in peripheral blood and/or buccal samples (when blood is not available) from complete father-mother-child trios. Doses to the gonads from the time of the accident to the time of conception will be reconstructed for all parents using existing records supplemented by interview data.
Trio subjects will be selected from representative populations exposed to radiation from Chernobyl who are under active follow-up in the Clinico-Epidemiologic Registry at RCRM. To help identify specific effects of paternal and maternal radiation exposure, we will initially select sets of trio subjects in five categories: (1) exposed father, unexposed mother; (2) unexposed father, exposed mother; (3) both parents exposed; (4) both parents unexposed; and (5) a group of high dose "emergency workers" with acute radiation syndrome. All trio members will be invited to the RCRM outpatient clinic for collection of a 20 ml blood sample (or buccal cells for those who refuse phlebotomy). Both parents will be asked to complete a general questionnaire to obtain demographic and lifestyle data. Then one or both will complete detailed dosimetry questionnaires, based on forms used in previous collaborations with RCRM and administered by specially trained interviewers. Once 50 trios have been recruited (10 from each of the 5 exposure categories), we will conduct an interim evaluation of participation rates, sample collection and quality, and dose reconstruction in order to modify the protocol as needed. The analytical approach will be to correlate the extent of genetic alterations in the offspring, especially de novo events, with parental pre-conceptional radiation dose, overall and by parental origin. The statistical power in relation to de novo mutations is very high, in excess of 90%, but somewhat lower for trends in minisatellite mutations. Study findings will contribute importantly to knowledge of the heritable effects of moderate- and low-dose radiation exposure in humans and to radiation risk projection. Eventually data from the Trio Study may be shared with the international community through dbGaP.
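The idea of screening father-mother-child trios for de novo events can be illustrated with a simple Mendelian-consistency check: a child genotype that cannot be formed by taking one allele from each parent is a candidate de novo event (or a genotyping error). The sketch below is an assumption-laden illustration, not the study's method, which the description does not specify; the function names and the input layout are hypothetical, and a real analysis would also filter on read depth and genotype quality.

```python
from typing import Dict, List, Tuple

Genotype = Tuple[str, str]  # two alleles at one autosomal site, e.g. ("A", "G")


def is_mendelian_consistent(child: Genotype, father: Genotype,
                            mother: Genotype) -> bool:
    """True if the child could inherit one allele from each parent."""
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)


def candidate_de_novo_sites(
        trio: Dict[str, Tuple[Genotype, Genotype, Genotype]]) -> List[str]:
    """Return site IDs whose (child, father, mother) genotypes are inconsistent."""
    return [site for site, (child, father, mother) in trio.items()
            if not is_mendelian_consistent(child, father, mother)]


# Hypothetical genotypes at three sites, ordered (child, father, mother).
example_trio = {
    "site1": (("A", "G"), ("A", "A"), ("G", "G")),  # inherited normally
    "site2": (("A", "G"), ("A", "A"), ("A", "A")),  # G absent in both parents
    "site3": (("C", "C"), ("C", "T"), ("C", "C")),  # inherited normally
}
print(candidate_de_novo_sites(example_trio))
```

Counting such candidate sites per child, after filtering, gives the kind of per-offspring de novo count that could then be compared against reconstructed parental dose.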
Multiple Myeloma (MM) is a plasma cell dyscrasia characterized by bone marrow (BM) infiltration and lytic bone lesions. Recent studies of massive parallel sequencing of tumor cells obtained from the BM of patients with MM have demonstrated significant clonal heterogeneity in MM. Despite this remarkable clonal heterogeneity, it could be envisioned that such clonal diversity may be even higher since single BM samples only represent a small fraction of the whole BM compartment, and the pattern of BM infiltration in MM is typically patchy. In addition, BM biopsies are painful and cannot be repeated multiple times during the course of therapy, indicating a need for less invasive methods to molecularly characterize MM patients and monitor disease progression during the therapy. Thus, optimal characterization of circulating tumor cells (CTCs) may represent a non-invasive method to capture relevant mutations present in PC clones. In addition, MM almost always progresses from precursor states of monoclonal gammopathy of undetermined significance (MGUS)/smoldering multiple myeloma (SMM) to overt MM. However, some patients rapidly progress from MGUS/SMM to overt MM (progressors) with a rate of progression of up to 70% over 5 years, while others remain indolent with minimal progression over the same time period (non-progressors). Although many patients are diagnosed with earlier phases of disease, most patients do not receive treatment until their disease progresses, at which time they have overt end-organ damage. This concept of initiating therapy at the time of symptomatic disease is analogous to initiating therapy in patients with solid tumors only after the development of measurable metastatic disease. It is therefore not surprising that cure is not achieved for most patients with MM. 
Interestingly, studies have demonstrated that MGUS/SMM clones may already harbor chromosomal alterations (Ig loci or hyperdiploidy) and that progression to MM is mainly due to expansion of clones that were already present in the early stages of MGUS/SMM. However, the biological factors that discriminate progressors from non-progressors in MGUS/SMM are not well known. Therefore, our overarching hypothesis is that an effective therapeutic intervention will result from defining genomic and transcriptomic markers that are associated with disease progression. We believe, therefore, that focused research studies that define molecular mechanisms of clonal evolution in MGUS/SMM/MM will identify novel biomarkers of disease progression and help develop therapeutic agents that prevent or delay progression from MGUS to overt MM. Indeed, by eradicating the disease at the precursor stages, MM may become a preventable disease. Recently, a new term called Clonal Hematopoiesis of Indeterminate Potential (CHIP) has been proposed to describe asymptomatic individuals with hematologic malignancy-associated somatic mutations. Those individuals do not fulfill any diagnostic criteria for any hematological malignancy, yet they have a tendency to progress to myelodysplastic syndrome (MDS) or myeloid or lymphoid neoplasia at a rate of around 0.5-1% per year, similar to MGUS. The frequency of CHIP and the role of HSC mutations in enhancing acquisition of somatic mutations in MM plasma cells, allowing progression following treatment, have not been studied. Investigating the dysregulated pathways in early progenitor cells would allow us to understand the reasons for progression and establish novel therapeutic and potentially preventive strategies. This study dissects genomic and transcriptomic characteristics of clonal evolution from MGUS/SMM to MM as well as the characteristics of the tumor microenvironment/immune cells/peripheral blood.
Our hypothesis is that molecular biomarkers will be strong predictors of progression from MGUS/SMM to MM and will allow for the development of novel therapeutic agents that prevent or delay this progression. We aim to define genomic and transcriptomic markers that lead to progression from MGUS/SMM to MM in tumor cells, blood biopsies (cell free DNA and circulating tumor cells), and the tumor microenvironment.
The Gabriella Miller Kids First Pediatric Research Program (Kids First) is a trans-NIH effort initiated in response to the 2014 Gabriella Miller Kids First Research Act and supported by the NIH Common Fund. This program focuses on gene discovery in pediatric cancers and structural birth defects and the development of the Gabriella Miller Kids First Pediatric Data Resource (Kids First Data Resource). Both childhood cancers and structural birth defects are critical and costly conditions associated with substantial morbidity and mortality. Elucidating the underlying genetic etiology of these diseases has the potential to profoundly improve preventative measures, diagnostics, and therapeutic interventions. All of the WGS and phenotypic data from this study are accessible through dbGaP and https://kidsfirstdrc.org, where other Kids First datasets can also be accessed. The Kids First study of nonsyndromic orofacial cleft (OFC) birth defects in Latin American families is a whole genome sequencing study of 283 Latin-American parent-case trios drawn from ongoing collaborations led by Dr. Mary L. Marazita of the University of Pittsburgh Center for Craniofacial and Dental Genetics, and including a collaboration with Dr. Lina Moreno Uribe and Dr. Andrew Lidral of the University of Iowa. All families were ascertained through the Clinica Noel where patients with OFCs receive care from the Antioquia University School of Dentistry in Medellin, Colombia (key on-site colleagues included Dr. Luz Consuelo Valencia-Ramirez and Dr. Mauricio Arcos-Burgos). Genetic studies have shown that this population is composed of an admixture of immigrant male Caucasians (mainly Spaniards and Basques) and native Amerindian females.
Every subject has had a genetic evaluation, including a pedigree analysis for a family history of clefting and other birth defects, a pregnancy history for environmental exposures, and a complete physical exam to rule out suspected or known syndromes or environmental phenocopies. Sequencing was done by the Broad Institute sequencing center funded by the Kids First program (grant number U24-HD090743). The case in each of the Kids First OFC trios has cleft lip (CL, Figure A), cleft palate (CP, Figure B), or both (CL+CP, Figure C). OFCs are genetically complex structural birth defects caused by genetic factors, environmental exposures, and their interactions. OFCs are the most common craniofacial anomalies in humans, affecting approximately 1 in 700 newborns, and are one of the most common structural birth defects worldwide. On average a child with an OFC initially faces feeding difficulties, undergoes 6 surgeries, spends 30 days in hospital, receives 5 years of orthodontic treatment, and participates in ongoing speech therapy, leading to an estimated total lifetime treatment cost of about $200,000. Further, individuals born with an OFC have higher infant mortality, higher mortality rates at all other stages of life, increased incidence of mental health problems, and higher risk for other disorders (notably including breast, brain, and colon cancers). Prior genome-wide linkage and association studies have now identified at least 18 genomic regions likely to contribute to the risk for nonsyndromic OFCs. Despite this substantial progress, the functional/pathogenic variants at OFC-associated regions are mostly still unknown. Because previous OFC genomic studies (genome-wide linkage, genome-wide association studies (GWAS), and targeted sequencing) are based on relatively sparse genotyping data, they cannot distinguish between causal variants and variants in linkage disequilibrium with unobserved causal variants.
Moreover, it is unknown whether the association or linkage signals are due to single common variants, haplotypes of multiple common variants, clusters of multiple rare variants, or some combination. Finally, we cannot yet attribute specific genetic risk to individual cases and case families. Therefore, the goal of the current study is to identify specific OFC risk variants in Latin American families by performing whole genome sequencing of parent-case trios.
Objectives: Use genome-wide approaches to identify genetic variants that influence common thrombosis and hemostasis factors, as well as selected common human traits. Design/Methods: The GABC study was a prospective sibling cohort design. Siblings were recruited by targeted email to the undergraduate and graduate student email lists at the University of Michigan. Healthy persons between 14 and 35 years old who had healthy siblings within the same age restriction were able to participate. Study participants agreed to an online informed consent and subsequently completed a 52-question online survey describing their specific bleeding traits as well as many common human traits. Fifty milliliters of blood was collected into a citrate-dextrose solution (ACD) from each participant. An aliquot of whole blood was used for an automated complete blood count analysis and the remainder was processed into platelet poor plasma and buffy coat portions. Plasma and buffy coat aliquots were snap frozen and stored in liquid nitrogen for future studies. 1189 individuals representing 507 sibships were collected between 06/26/2006 and 01/30/2009. Phenotyping Survey Details: To characterize individual bruising and bleeding history, the online survey recorded answers to questions based on a modified von Willebrand Disease (VWD) screening questionnaire. 
To characterize a collection of participants' common human traits, the survey recorded answers to questions about height, weight, presence of skin tags, history of acne, eye color, hair color, hair line characteristics, skin sunburn sensitivity, skin tanning ability, natural skin color, freckling, cheek dimpling, earlobe shape, shoe size, foot arch characteristics, hand fifth digit morphology, history of dyslexia, history of migraine headaches, history of seasonal allergies, history of aphthous ulcers, tendency to sneeze while walking into a bright sunny place, history of dental caries, need for corrective eye lenses, handedness and like or dislike of strongly flavored foods. Biochemical phenotyping: Assays for plasma von Willebrand Factor (VWF) antigen were performed using ELISA and "AlphaLISA" techniques. Automated complete blood count analysis (including WBC differential, RBC indices, and platelet count) was performed on a Bayer Advia 120 on all participants. For the dbGaP v2 update, new biochemical phenotypes have been submitted and include von Willebrand Factor, von Willebrand Factor propeptide, plasminogen, gamma prime fibrinogen, ADAMTS 13, antithrombin III, protein C, and protein S. All new phenotypes were obtained using "AlphaLISA" techniques. Genotyping Details: SNP genotyping was performed using genomic DNA extracted from peripheral blood at the Broad Institute (MIT/Harvard). Genotyping was performed on the Illumina Omni-1 quad chip at the Broad Institute. For the dbGaP v2 update, genotyping data from the Illumina Human Exome was deposited. This study is part of the Gene Environment Association Studies initiative (GENEVA, http://www.genevastudy.org) funded by the trans-NIH Genes, Environment, and Health Initiative (GEI). The overarching goal is to identify novel genetic factors that contribute to blood clotting through large-scale genome-wide association studies of siblings.
Genotyping was performed at the Broad Institute of MIT and Harvard, a GENEVA genotyping center. Data cleaning and harmonization were performed by the primary investigators at the University of Michigan, Ann Arbor, and at the GEI-funded GENEVA Coordinating Center at the University of Washington. This study serves as a resource for investigators who are interested in the genetic determinants of specific plasma proteins in a healthy population. The sibling cohort design allows for linkage analysis in addition to association studies. Analysis of thrombosis- and hemostasis-related traits should help elucidate specific biochemical and genetic networks that maintain hemostasis. We hope to identify specific genetic determinants of VWF levels in order to better understand the factors that influence the development of VWD.