The ETV6::RUNX1 (E::R) subgroup is the second most frequent form of B-cell acute lymphoblastic leukemia (B-ALL) in childhood. This subgroup generally presents with low-risk features, responds well to chemotherapy, and has a favorable prognosis as a result. Some patients, however, respond slowly to induction treatment and carry an increased risk of relapse.
Objectives: We are sharing a database of dynamic magnetic resonance imaging (dMRI) scans of normal children. It can serve as a reference standard to quantify regional respiratory abnormalities in young patients with various respiratory conditions and to facilitate treatment planning and response assessment. The database may also help advance AI-based research on image-based object segmentation and analysis.
Background: In pediatric patients with respiratory abnormalities, it is important to understand the alterations in the regional dynamics of the lungs and other thoracoabdominal components. This in turn requires a quantitative understanding of what is considered normal in healthy children. Currently, no such normative database of regional respiratory structure and function in healthy children exists.
Participants: 200 normal children (ages 6-18 years) participated in the research study from which this dataset derives.
Design: The shared open-source normative database comes from our ongoing virtual growing child (VGC) project. For each normal child, it includes a 4D dMRI image representing one breathing cycle, together with segmentations of 10 objects at the end-expiration (EE) and end-inspiration (EI) phases of the respiratory cycle in the 4D image. Lung volumes at EE and EI, as well as chest wall and diaphragm excursion volumes from EE to EI, are also reported, separately for the left and right sides. The database thus contains 4,000 3D segmentations from 200 normal children in total. All dMRI scans were acquired from normal children during free breathing. The dMRI acquisition protocol was as follows: 3T MRI scanner (Verio, Siemens, Erlangen, Germany), true-FISP bright-blood sequence, TR = 3.82 ms, TE = 1.91 ms, voxel size ~1×1×6 mm³, 320×320 matrix, bandwidth 258 Hz, and flip angle 76°.
For each sagittal location across the thorax and abdomen, we acquired 40 2D slices over several tidal breathing cycles at ~480 ms/slice. On average, 35 sagittal locations are imaged, yielding a total of ~1,400 2D MRI slices and a total scan time of 11-13 minutes per study participant. The collected dMRI scan data then went through 4D image construction, image processing, object segmentation, and volumetric measurement from the segmentations.
4D image construction: From each acquired dMRI scan, we used an automated 4D image construction approach to form one 4D image over one breathing cycle (typically consisting of 5-8 respiratory phases) that represents the whole dynamic thoracoabdominal body region. The algorithm selects 175-280 slices (35 sagittal locations × 5-8 respiratory phases) from the ~1,400 acquired slices in an optimal manner using an optical flux method.
Image processing: Intensity standardization is performed on every time point (3D volume) of the 4D image so that image values have the same tissue-specific meaning across all subjects.
Object segmentation: For each subject, 10 objects are segmented at both the EE and EI time points: the thoracoabdominal skin outer boundary, left and right lungs, liver, spleen, left and right kidneys, diaphragm, and left and right hemi-diaphragms. All dMRI scans use large field-of-view images that include the full thorax and abdomen down to the inferior aspect of the kidneys in the sagittal plane. A pretrained U-Net-based deep learning network was first used to segment all objects; all auto-segmentation results were then visually checked and manually refined as needed, under the supervision of a radiologist with over 25 years of experience in MRI and thoracoabdominal radiology. Manual segmentations have been performed for all objects in all datasets.
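The figures quoted for this dataset are related by simple products, and multiplying them out is a useful sanity check. The sketch below does exactly that: 35 sagittal locations × 40 slices gives the ~1,400 acquired slices; at ~480 ms per slice this is about 11.2 minutes of pure slice acquisition, consistent with the stated 11-13 minute scan time (the sketch ignores scanner overhead and per-participant variation in the number of locations, which is an assumption); 35 locations × 5-8 phases gives the 175-280 slices kept for each 4D image; and 200 children × 10 objects × 2 phases (EE, EI) gives the 4,000 segmentations. The constant and function names are illustrative only.

```python
# Sanity check of the nominal counts quoted for the dMRI dataset.
# Names are illustrative; values are the averages/nominal values from the text.

SLICES_PER_LOCATION = 40    # 2D slices acquired per sagittal location
SAGITTAL_LOCATIONS = 35     # average number of sagittal locations imaged
SECONDS_PER_SLICE = 0.480   # ~480 ms per slice
PHASES_RANGE = (5, 8)       # respiratory phases in one constructed 4D image
SUBJECTS = 200
OBJECTS = 10
TIME_POINTS = 2             # end expiration (EE) and end inspiration (EI)


def total_slices(locations: int, per_location: int) -> int:
    """Number of 2D slices acquired for one participant."""
    return locations * per_location


def acquisition_minutes(n_slices: int, seconds_per_slice: float) -> float:
    """Pure slice-acquisition time; assumption: no scanner overhead included."""
    return n_slices * seconds_per_slice / 60.0


def selected_slice_range(locations: int, phases: tuple[int, int]) -> tuple[int, int]:
    """Slices kept by 4D construction: one slice per location per phase."""
    return locations * phases[0], locations * phases[1]


if __name__ == "__main__":
    n = total_slices(SAGITTAL_LOCATIONS, SLICES_PER_LOCATION)
    print(n)                                                       # 1400
    print(round(acquisition_minutes(n, SECONDS_PER_SLICE), 1))     # 11.2
    print(selected_slice_range(SAGITTAL_LOCATIONS, PHASES_RANGE))  # (175, 280)
    print(SUBJECTS * OBJECTS * TIME_POINTS)                        # 4000
```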
Volumetric measurements: Lung volumes (left and right separately) at EE and EI, as well as chest wall and diaphragm excursion volumes (left and right separately), are computed from the object segmentations and reported.
Conclusions: The provided database is unique: it provides dMRI images, object segmentations, and quantitative regional respiratory volume measurements for normal children. With 4,000 3D segmentations from 200 normal children, it is, to our knowledge, the largest and only such dMRI dataset to date. All images and object segmentations are saved in DICOM format. All DICOM files (176,574 in total) have been anonymized, and protected health information (PHI) has been removed. The database can be used as a reference standard to quantify regional respiratory abnormalities in young patients with various respiratory conditions and to facilitate treatment planning and response assessment. The large number of object segmentations may also benefit AI-based research on image-based object segmentation and analysis.
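At their simplest, volumetric measurements from a segmentation reduce to counting the voxels inside the object and multiplying by the voxel volume. The sketch below shows this for a 3D binary mask and takes "volume at EI minus volume at EE" as a simple volume change. It is a hypothetical illustration, not the VGC project's measurement code: the function names are invented, the voxel size defaults to the nominal ~1×1×6 mm³ from the acquisition protocol (real data should use the spacing recorded in each DICOM header), and the chest wall and diaphragm excursion volumes, whose computation is not described here, are not modeled.

```python
from typing import Sequence

# A 3D binary mask indexed as [slice][row][column]; nonzero = inside the object.
Mask = Sequence[Sequence[Sequence[int]]]

# Nominal voxel size in mm (in-plane x, in-plane y, slice spacing) from the
# acquisition protocol. Assumption: real studies should read the DICOM spacing.
NOMINAL_VOXEL_MM = (1.0, 1.0, 6.0)


def voxel_count(mask: Mask) -> int:
    """Count voxels labelled as belonging to the object."""
    return sum(1 for sl in mask for row in sl for v in row if v)


def volume_ml(mask: Mask, voxel_mm: tuple[float, float, float] = NOMINAL_VOXEL_MM) -> float:
    """Object volume in millilitres (1 mL = 1000 mm^3)."""
    dx, dy, dz = voxel_mm
    return voxel_count(mask) * dx * dy * dz / 1000.0


def volume_change_ml(mask_ee: Mask, mask_ei: Mask,
                     voxel_mm: tuple[float, float, float] = NOMINAL_VOXEL_MM) -> float:
    """Hypothetical EE-to-EI change: volume at EI minus volume at EE."""
    return volume_ml(mask_ei, voxel_mm) - volume_ml(mask_ee, voxel_mm)


def filled_mask(n_slices: int, rows: int, cols: int) -> list[list[list[int]]]:
    """Toy mask with every voxel set, for demonstration only."""
    return [[[1] * cols for _ in range(rows)] for _ in range(n_slices)]


if __name__ == "__main__":
    ee = filled_mask(2, 10, 10)   # 200 voxels
    ei = filled_mask(3, 10, 10)   # 300 voxels
    print(volume_ml(ee))                          # 1.2
    print(volume_ml(ei))                          # 1.8
    print(round(volume_change_ml(ee, ei), 3))     # 0.6
```

Counting voxels in pure Python keeps the sketch dependency-free; for full-size 320×320 images an array library would be the natural choice.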
Data Access NOTE: Please refer to the “Authorized Access” section below for information about how access to the data from this accession differs from many other dbGaP accessions.
Available Data: ROC Cardiac Epistry versions 1 and 2 include all cardiac arrest cases entered into the ROC database from December 1, 2005 to April 1, 2011. ROC Cardiac Epistry 3 includes cardiac arrest cases captured between 2011 and 2015 and introduced significant changes in which data were collected and how. For ROC traumatic injury Epistry data, please see: ROC-Trauma Epistry.
Objectives: To build a prospective population-based registry of participants with out-of-hospital cardiac arrest attended by Emergency Medical Services (EMS). Specific aims: to establish whether the results of Resuscitation Outcomes Consortium (ROC) trials can be generalized to the larger population of participants who experience cardiac arrest; to more fully establish the burden of cardiac arrest; and to examine the relationships between variation in EMS structure and process, regional and periodic factors, and participant outcomes.
Background: Cardiac arrest is a common, serious, debilitating and costly public health problem. Although morbidity and mortality from most cardiovascular diseases have declined steadily, high mortality rates for out-of-hospital cardiac arrest continue to challenge healthcare providers and impose a significant public health burden. The Resuscitation Outcomes Consortium (ROC) was established in 2004 to conduct clinical research in cardiopulmonary arrest and life-threatening traumatic injury, with the overall goal of improving resuscitation outcomes. Participant and care characteristics can predict favorable outcomes in cardiac arrest, but there is still wide variation in outcomes that is not well understood.
EMS factors such as service level, number of responding providers, use of procedures or drugs in the field, training, quality assurance/feedback, and response time intervals also vary significantly by region. Variations in geographic, socioeconomic and periodic factors may also be associated with differences in outcomes. Prior to ROC Cardiac Epistry, there were no North American population-based registries of out-of-hospital cardiac arrest. There was therefore a need for standardized data collection on out-of-hospital cardiac arrests in diverse geographic locations in order to identify the independent effects of prognostic or treatment factors accounting for variations in survival.
Participants: The registry included 109,326 cardiac arrest events from 264 EMS agencies transporting to 287 acute care hospitals from the following regional centers: Birmingham, Alabama; Dallas, Texas; Iowa City, Iowa; Milwaukee, Wisconsin; Pittsburgh, Pennsylvania; Portland, Oregon; San Diego, California; Seattle/King County, Washington; Ottawa, Ontario; Toronto, Ontario; and Vancouver, British Columbia.
Design: ROC Epistry collected standardized data on episode-specific factors, participant demographics, clinical information, pre-hospital interventions and disposition, hospital information, and participant outcome for all out-of-hospital cardiac arrests in the ROC regions. Each ROC site had to ensure capture of all eligible cases within its EMS service areas. Out-of-hospital data were extracted from existing databases whenever possible and augmented with targeted review of EMS reports. Hospital data were abstracted directly from the hospital file in most cases. Alternative methods included linkage to death registries and obituaries if the death occurred within 30 days. Sites submitted data using a web-based interface or batch uploads. Participants were not contacted directly.
Inner Asia is particularly interesting for understanding human history and evolution, as two groups with contrasting cultural traits (notably their language, social organisation and matrimonial system) live side by side there. We sampled 503 individuals from these two groups, belonging to 17 populations from 11 distinct ethnic groups (AltaiKizhi, Kazakh, Khakas, Kyrgyz, Mongolian, Shore, Tajik, Telengit, Tubalar, Turkmen and Uzbek). The samples were genotyped with 5 different DNA arrays and, after quality control, the 253,532 autosomal SNPs present on all the arrays were merged into the present dataset.
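The merge described above amounts to a set intersection across arrays followed by an autosome filter. A minimal sketch, assuming each array's post-QC SNP list and a SNP-to-chromosome lookup are already available; the array names and SNP IDs are hypothetical placeholders, and this is not the authors' pipeline. Real merges of different arrays must also reconcile SNP identifiers and strand/allele coding, which this sketch does not attempt.

```python
# Hypothetical merge step: keep only autosomal SNPs present on every array.
# Per-array quality control is assumed to have been done beforehand.

AUTOSOMES = {str(c) for c in range(1, 23)}


def shared_autosomal_snps(panels: dict[str, set[str]],
                          chromosome_of: dict[str, str]) -> set[str]:
    """Intersect SNP sets across arrays, then drop non-autosomal SNPs."""
    if not panels:
        return set()
    common = set.intersection(*panels.values())
    return {snp for snp in common if chromosome_of.get(snp) in AUTOSOMES}


if __name__ == "__main__":
    panels = {
        "array_1": {"snp_a", "snp_b", "snp_c", "snp_x"},
        "array_2": {"snp_a", "snp_b", "snp_x", "snp_d"},
        "array_3": {"snp_a", "snp_b", "snp_x"},
    }
    chromosome_of = {"snp_a": "1", "snp_b": "7", "snp_c": "2",
                     "snp_d": "5", "snp_x": "X"}
    print(sorted(shared_autosomal_snps(panels, chromosome_of)))  # ['snp_a', 'snp_b']
```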
Original description of the study: From ELLIPSE (linked to the PRACTICAL consortium), we contributed ~78,000 SNPs to the OncoArray. A large fraction of the content was derived from the GWAS meta-analyses in European ancestry populations (overall and aggressive disease; ~27K SNPs). We also selected just over 10,000 SNPs from the meta-analyses in non-European populations, the majority coming from the analysis of overall prostate cancer in African ancestry populations and from the multiethnic meta-analysis. A substantial fraction of SNPs (~28,000) was also selected for fine-mapping of 53 loci not included in the common fine-mapping regions (tagging at r2>0.9 across ±500 kb regions). We also selected a few thousand SNPs related to PSA levels and/or disease survival, SNPs from candidate lists provided by study collaborators, and SNPs from meta-analyses of exome SNP chip data from the Multiethnic Cohort and UK studies.
The Contributing Studies:
Aarhus: Hospital-based, Retrospective, Observational. Source of cases: Patients treated for prostate adenocarcinoma at the Department of Urology, Aarhus University Hospital, Skejby (Aarhus, Denmark). Source of controls: Age-matched males treated for myocardial infarction or undergoing coronary angioplasty, but with no prostate cancer diagnosis based on information retrieved from the Danish Cancer Register and the Danish Cause of Death Register.
AHS: Nested case-control study within prospective cohort. Source of cases: Linkage to cancer registries in study states. Source of controls: Matched controls from cohort.
ATBC: Prospective, nested case-control. Source of cases: Finnish male smokers aged 50-69 years at baseline. Source of controls: Finnish male smokers aged 50-69 years at baseline.
BioVu: Cases identified in a biobank linked to electronic health records. Source of cases: A total of 214 cases were identified in the VUMC de-identified electronic health records database (the Synthetic Derivative) and shipped to USC for genotyping in April 2014. The following criteria were used to identify cases: age 18 or greater; male; African American (Black) only. Note that African ancestry is not self-identified; it is administratively or third-party assigned (which has been shown to be highly correlated with genetic ancestry for African Americans in BioVU; see references). Source of controls: Controls were identified in the de-identified electronic health record. Unfortunately, they were not age-matched to the cases and therefore cannot be used for this study.
Canary PASS: Prospective, Multi-site, Observational Active Surveillance Study. Source of cases: Clinic-based, from Beth Israel Deaconess Medical Center, Eastern Virginia Medical School, University of California at San Francisco, University of Texas Health Sciences Center San Antonio, University of Washington, and VA Puget Sound. Source of controls: N/A.
CCI: Case series, Hospital-based. Source of cases: Cases identified through clinics at the Cross Cancer Institute. Source of controls: N/A.
CerePP French Prostate Cancer Case-Control Study (ProGene): Case-Control, Prospective, Observational, Hospital-based. Source of cases: Patients treated in French departments of urology who had histologically confirmed prostate cancer. Source of controls: Controls were recruited from participants in a systematic health screening program and found unaffected (normal digital rectal examination and total PSA < 4 ng/ml, or negative biopsy if PSA > 4 ng/ml).
COH: Hospital-based cases and controls from outside. Source of cases: Consented prostate cancer cases at City of Hope. Source of controls: Consented unaffected males who were part of other studies in which they consented to have their DNA used for other research studies.
COSM: Population-based cohort. Source of cases: General population. Source of controls: General population.
CPCS1: Case-control - Denmark. Source of cases: Hospital referrals. Source of controls: Copenhagen General Population Study.
CPCS2: Source of cases: Hospital referrals. Source of controls: Copenhagen General Population Study.
CPDR: Retrospective cohort. Source of cases: Walter Reed National Military Medical Center. Source of controls: Walter Reed National Military Medical Center.
ACS_CPS-II: Nested case-control derived from a prospective cohort study. Source of cases: Identified through self-report on follow-up questionnaires and verified through medical records or cancer registries, or identified through cancer registries or the National Death Index (with prostate cancer as the primary cause of death). Source of controls: Cohort participants who were cancer-free at the time of diagnosis of the matched case, also matched on age (±6 mo) and date of biospecimen donation (±6 mo).
EPIC: Case-control - Germany, Greece, Italy, Netherlands, Spain, Sweden, UK. Source of cases: Identified through record linkage with population-based cancer registries in Italy, the Netherlands, Spain, Sweden and the UK. In Germany and Greece, follow-up is active and achieved through checks of insurance records and cancer and pathology registries as well as via self-reported questionnaires; self-reported incident cancers are verified through medical records. Source of controls: Cohort participants without a diagnosis of cancer.
EPICAP: Case-control, Population-based, ages less than 75 years at diagnosis, Hérault, France. Source of cases: Prostate cancer cases in all public hospitals and private urology clinics of the département of Hérault in France, validated by the Hérault Cancer Registry. Source of controls: Population-based controls, frequency age-matched (5-year groups), with quotas by socio-economic status (SES) so that the SES distribution among controls matches the SES distribution among men in the general population, conditional on age.
ERSPC: Population-based randomized trial. Source of cases: Men with PrCa from the screening arm of ERSPC Rotterdam. Source of controls: Men without PrCa from the screening arm of ERSPC Rotterdam.
ESTHER: Case-control, Prospective, Observational, Population-based. Source of cases: Prostate cancer cases in all hospitals in the state of Saarland, from 2001 to 2003. Source of controls: Random sample of participants from routine health check-ups in Saarland, from 2000 to 2002.
FHCRC: Population-based, case-control, ages 35-74 years at diagnosis, King County, WA, USA. Source of cases: Identified through the Seattle-Puget Sound SEER cancer registry. Source of controls: Randomly selected, age-frequency matched residents from the same county as cases.
Gene-PARE: Hospital-based. Source of cases: Patients who received radiotherapy for treatment of prostate cancer. Source of controls: N/A.
Hamburg-Zagreb: Hospital-based, Prospective. Source of cases: Prostate cancer cases seen at the Department of Oncology, University Hospital Center Zagreb, Croatia. Source of controls: Population-based (Croatia), healthy men older than 50, with no medical record of cancer and no family history of cancer (1st & 2nd degree relatives).
HPFS: Nested case-control. Source of cases: Participants of the HPFS cohort. Source of controls: Participants of the HPFS cohort.
IMPACT: Observational. Source of cases: Carriers and non-carriers (with a known mutation in the family) of the BRCA1 and BRCA2 genes, aged between 40 and 69, who are undergoing prostate screening with annual PSA testing and who were diagnosed with prostate cancer during the study. Source of controls: Carriers and non-carriers (with a known mutation in the family) of the BRCA1 and BRCA2 genes, aged between 40 and 69, who are undergoing prostate screening with annual PSA testing and who were not diagnosed with prostate cancer during the study.
IPO-Porto: Hospital-based. Source of cases: Early-onset and/or familial prostate cancer. Source of controls: Blood donors.
Karuprostate: Case-control, Retrospective, Population-based. Source of cases: From FWI (Guadeloupe): 237 consecutive incident patients with histologically confirmed prostate cancer attending public and private urology clinics; from the Democratic Republic of Congo: 148 consecutive incident patients with histologically confirmed prostate cancer attending the University Clinic of Kinshasa. Source of controls: From FWI (Guadeloupe): 277 controls recruited from men participating in a free systematic health screening program open to the general population; from the Democratic Republic of Congo: 134 controls recruited from subjects attending the University Clinic of Kinshasa.
KULEUVEN: Hospital-based, Prospective, Observational. Source of cases: Prostate cancer cases recruited at the University Hospital Leuven. Source of controls: Healthy males with no history of prostate cancer recruited at the University Hospitals, Leuven.
LAAPC: Subjects were participants in a population-based case-control study of aggressive prostate cancer conducted in Los Angeles County. Cases were identified through the Los Angeles County Cancer Surveillance Program rapid case ascertainment system. Eligible cases included African American, Hispanic, and non-Hispanic White men diagnosed with a first primary prostate cancer between January 1, 1999 and December 31, 2003. Eligible cases also had (a) prostatectomy with documented tumor extension outside the prostate, (b) metastatic prostate cancer in sites other than the prostate, (c) needle biopsy of the prostate with Gleason grade ≥8, or (d) needle biopsy with Gleason grade 7 and tumor in more than two-thirds of the biopsy cores. Eligible controls were men never diagnosed with prostate cancer, living in the same neighborhood as a case, and frequency-matched to cases on age (±5 y) and race/ethnicity. Controls were identified by a neighborhood walk algorithm, which proceeds through an obligatory sequence of adjacent houses or residential units beginning at a specific residence that has a specific geographic relationship to the residence where the case lived at diagnosis.
Malaysia: Case-control. Source of cases: Patients attending the outpatient urology or uro-oncology clinic at University Malaya Medical Center. Source of controls: Population-based, age-matched (5-year groups), ascertained through the electoral register, Subang Jaya, Selangor, Malaysia.
MCC-Spain: Case-control. Source of cases: Identified through the urology departments of the participating hospitals. Source of controls: Population-based, frequency-matched on age and region, ascertained through the rosters of the primary health care centers.
MCCS: Nested case-control, Melbourne, Victoria. Source of cases: Identified by linkage to the Victorian Cancer Registry. Source of controls: Cohort participants without a diagnosis of cancer.
MD Anderson: Participants in this study were identified from epidemiological prostate cancer studies conducted at the University of Texas MD Anderson Cancer Center in the Houston metropolitan area. Cases were accrued in the Houston Medical Center and were not restricted with respect to Gleason score, stage or PSA. Controls were identified via random-digit dialing or among hospital visitors and were frequency-matched to cases on age and race. Lifestyle, demographic, and family history data were collected using a standardized questionnaire.
MDACC_AS: A prospective cohort study. Source of cases: Men with clinically organ-confined prostate cancer meeting eligibility criteria for a prospective cohort study of active surveillance at MD Anderson Cancer Center. Source of controls: N/A.
MEC: The Multiethnic Cohort (MEC) comprises over 215,000 men and women recruited from Hawaii and the Los Angeles area between 1993 and 1996. Between 1995 and 2006, over 65,000 blood samples were collected from participants for genetic analyses. To identify incident cancer cases, the MEC was cross-linked with the population-based Surveillance, Epidemiology and End Results (SEER) registries in California and Hawaii, and unaffected cohort participants with blood samples were selected as controls.
MIAMI (WFPCS): Prostate cancer cases and controls were recruited from the Departments of Urology and Internal Medicine of the Wake Forest University School of Medicine using sequential patient populations as described previously (PMID:15342424). All study subjects received a detailed description of the study protocol and signed their informed consent, as approved by the medical center's Institutional Review Board. The general eligibility criteria were (i) ability to comprehend informed consent and (ii) no previously diagnosed cancer. The exclusion criteria were (i) clinical diagnosis of autoimmune diseases; (ii) chronic inflammatory conditions; and (iii) infections within the past 6 weeks. Blood samples were collected from all subjects.
MOFFITT: Hospital-based. Source of cases: Clinic-based, from Moffitt Cancer Center. Source of controls: Moffitt Cancer Center affiliated Lifetime cancer screening center.
NMHS: Case-control, clinic-based, Nashville, TN. Source of cases: All urology clinics in Nashville, TN. Source of controls: Men without prostate cancer at prostate biopsy.
PCaP: The North Carolina-Louisiana Prostate Cancer Project (PCaP) is a multidisciplinary population-based case-only study designed to address racial differences in prostate cancer through a comprehensive evaluation of social, individual and tumor-level influences on prostate cancer aggressiveness. PCaP enrolled approximately equal numbers of African Americans and Caucasian Americans with newly diagnosed prostate cancer from North Carolina (42 counties) and Louisiana (30 parishes), identified through state tumor registries. African American PCaP subjects with DNA who agreed to future use of specimens for research participated in the OncoArray analysis.
PCMUS: Case-control - Sofia, Bulgaria. Source of cases: Patients of the Clinic of Urology, Alexandrovska University Hospital, Sofia, Bulgaria, with histopathologically confirmed PrCa. Source of controls: 72 patients with verified BPH and PSA < 3.5; 78 healthy controls from the MMC Biobank with no history of PrCa.
PHS: Nested case-control. Source of cases: Participants of the PHS1 trial/cohort. Source of controls: Participants of the PHS1 trial/cohort.
PLCO: Nested case-control. Source of cases: Men with a confirmed diagnosis of prostate cancer from the PLCO Cancer Screening Trial. Source of controls: Men enrolled in the PLCO Cancer Screening Trial without a diagnosis of cancer at the time of case ascertainment.
Poland: Case-control. Source of cases: Men with unselected prostate cancer, diagnosed in north-western Poland at the University Hospital in Szczecin. Source of controls: Cancer-free men from the same population, taken from the healthy adult patients of family doctors in the Szczecin region.
PROCAP: Population-based, Retrospective, Observational. Source of cases: Cases were ascertained from the National Prostate Cancer Register of Sweden Follow-Up Study, a retrospective nationwide cohort study of patients with localized prostate cancer. Source of controls: Controls were selected among men referred for PSA testing in laboratories in Stockholm County, Sweden, between 2010 and 2012.
PROGReSS: Hospital-based, Prospective, Observational. Source of cases: Prostate cancer cases from the Hospital Clínico Universitario de Santiago de Compostela, Galicia, Spain. Source of controls: Cancer-free men from the same population.
ProMPT: A study to collect samples and data from subjects with and without prostate cancer. Retrospective, Experimental. Source of cases: Subjects attending outpatient clinics in hospitals. Source of controls: Subjects attending outpatient clinics in hospitals.
ProtecT: Trial of treatment. Samples were taken from subjects invited for PSA testing from the community at nine centers across the United Kingdom. Source of cases: Subjects with a proven diagnosis of prostate cancer following testing. Source of controls: Identified through invitation of subjects in the community.
PROtEuS: Case-control, population-based. Source of cases: All new histologically confirmed cases, aged 75 years or younger, diagnosed between 2005 and 2009, actively ascertained across Montreal French hospitals. Source of controls: Randomly selected from the provincial electoral list of French-speaking men between 2005 and 2009, from the same area of residence as cases and frequency-matched on age.
QLD: Case-control. Source of cases: A longitudinal cohort study (Prostate Cancer Supportive Care and Patient Outcomes Project: ProsCan) conducted in Queensland, through which men newly diagnosed with prostate cancer from 26 private practices and 10 public hospitals were directly referred to ProsCan at the time of diagnosis by their treating clinician (age range 43-88 years). All cases had histopathologically confirmed prostate cancer, following presentation with an abnormal serum PSA and/or lower urinary tract symptoms. Source of controls: Healthy male blood donors with no personal history of prostate cancer, recruited through (i) the Australian Red Cross Blood Services in Brisbane (age range 19-76 years) and (ii) the Australian Electoral Commission (AEC) (age and postcode/area matched to ProsCan, age range 54-90 years).
RAPPER: Multi-centre, hospital-based blood sample collection study in patients enrolled in clinical trials with prospective collection of radiotherapy toxicity data. Source of cases: Prostate cancer patients enrolled in radiotherapy trials: CHHiP, RT01, Dose Escalation, RADICALS, Pelvic IMRT, PIVOTAL. Source of controls: N/A.
SABOR: Prostate Cancer Screening Cohort. Source of cases: Men >45 years of age participating in annual PSA screening. Source of controls: Males participating in annual PSA prostate cancer risk evaluations (funded by an NCI biomarkers discovery and validation grant), recruited through the University of Texas Health Science Center at San Antonio and affiliated sites or through study advertisements; enrolment open to the community.
SCCS: Case-control in cohort, Southeastern USA. Prospective, Observational, Population-based. Source of cases: SCCS entry population. Source of controls: SCCS entry population.
SCPCS: Population-based, Retrospective, Observational. Source of cases: South Carolina Central Cancer Registry. Source of controls: Health Care Financing Administration beneficiary file.
SEARCH: Case-control - East Anglia, UK. Source of cases: Men < 70 years of age registered with prostate cancer at the population-based cancer registry, Eastern Cancer Registration and Information Centre, East Anglia, UK. Source of controls: Men attending general practice in East Anglia with no known prostate cancer diagnosis, frequency-matched to cases by age and geographic region.
SNP_Prostate_Ghent: Hospital-based, Retrospective, Observational. Source of cases: Men treated with IMRT as primary or postoperative treatment for prostate cancer at the Ghent University Hospital between 2000 and 2010. Source of controls: Employees of the University Hospital and members of social activity clubs, without a history of any cancer.
SPAG: Hospital-based, Retrospective, Observational. Source of cases: Guernsey. Source of controls: Guernsey.
STHM2: Population-based, Retrospective, Observational. Source of cases: Cases were selected among men referred for PSA testing in laboratories in Stockholm County, Sweden, between 2010 and 2012. Source of controls: Controls were selected among men referred for PSA testing in laboratories in Stockholm County, Sweden, between 2010 and 2012.
PCPT: Case-control from a randomized clinical trial. Source of cases: Randomized clinical trial. Source of controls: Randomized clinical trial.
SELECT: Case-cohort from a randomized clinical trial. Source of cases: Randomized clinical trial. Source of controls: Randomized clinical trial.
TAMPERE: Case-control - Finland, Retrospective, Observational, Population-based. Source of cases: Identified through linkage to the Finnish Cancer Registry and patient records, and through the Finnish arm of the ERSPC study. Source of controls: Cohort participants without a diagnosis of cancer.
UGANDA: The Uganda Prostate Cancer Study is a case-control study of prostate cancer in Kampala, Uganda, initiated in 2011. Men with prostate cancer were enrolled from the Urology unit at Mulago Hospital, and men without prostate cancer (i.e., controls) were enrolled from other clinics (i.e., surgery) at the hospital.
UKGPCS: ICR, UK. Source of cases: Cases identified through clinics at the Royal Marsden Hospital and nationwide NCRN hospitals. Source of controls: Ken Muir's control- 2000.
ULM: Case-control - Germany. Source of cases: Familial cases (n=162) identified through family-history questionnaires by collaborating urologists all over Germany; sporadic cases (n=308) from prostatectomy series performed in the Clinic of Urology Ulm between 2012 and 2014. Source of controls: Age-matched controls (n=188): age-matched men without prostate cancer and with a negative family history, collected in hospitals of Ulm.
WUGS/WUPCS: Case series, USA. Source of cases: Identified through clinics at Washington University in St. Louis. Source of controls: Men diagnosed and managed with prostate cancer in a university-based clinic.
Acknowledgement Statements:
Aarhus: This study was supported by the Danish Strategic Research Council (now Innovation Fund Denmark) and the Danish Cancer Society. The Danish Cancer Biobank (DCB) is acknowledged for biological material.
AHS: This work was supported by the Intramural Research Program of the NIH, National Cancer Institute, Division of Cancer Epidemiology and Genetics (Z01CP010119). ATBC: This research was supported in part by the Intramural Research Program of the NIH and the National Cancer Institute. Additionally, this research was supported by U.S. Public Health Service contracts N01-CN-45165, N01-RC-45035, N01-RC-37004, HHSN261201000006C, and HHSN261201500005C from the National Cancer Institute, Department of Health and Human Services. BioVu: The dataset(s) used for the analyses described were obtained from Vanderbilt University Medical Center's BioVU which is supported by institutional funding and by the National Center for Research Resources, Grant UL1 RR024975-01 (which is now at the National Center for Advancing Translational Sciences, Grant 2 UL1 TR000445-06). Canary PASS: PASS was supported by Canary Foundation and the National Cancer Institute's Early Detection Research Network (U01 CA086402) CCI: This work was awarded by Prostate Cancer Canada and is proudly funded by the Movember Foundation - Grant # D2013-36.The CCI group would like to thank David Murray, Razmik Mirzayans, and April Scott for their contribution to this work. CerePP French Prostate Cancer Case-Control Study (ProGene): None reported COH: SLN is partially supported by the Morris and Horowitz Families Endowed Professorship COSM: The Swedish Research Council, the Swedish Cancer Foundation CPCS1 & CPCS2: Department of Clinical Biochemistry, Herlev and Gentofte Hospital, Copenhagen University Hospital, Herlev Ringvej 75, DK-2730 Herlev, DenmarkCPCS1 would like to thank the participants and staff of the Copenhagen General Population Study for their important contributions. CPDR: Uniformed Services University for the Health Sciences HU0001-10-2-0002 (PI: David G. McLeod, MD) CPS-II: The American Cancer Society funds the creation, maintenance, and updating of the Cancer Prevention Study II cohort. 
CPS-II thanks the participants and Study Management Group for their invaluable contributions to this research. We would also like to acknowledge the contribution to this study from central cancer registries supported through the Centers for Disease Control and Prevention National Program of Cancer Registries, and cancer registries supported by the National Cancer Institute Surveillance Epidemiology and End Results program. EPIC: The coordination of EPIC is financially supported by the European Commission (DG-SANCO) and the International Agency for Research on Cancer. The national cohorts are supported by the Danish Cancer Society (Denmark); the Deutsche Krebshilfe, Deutsches Krebsforschungszentrum and Federal Ministry of Education and Research (Germany); the Hellenic Health Foundation, Greek Ministry of Health; Greek Ministry of Education (Greece); the Italian Association for Research on Cancer (AIRC) and National Research Council (Italy); the Dutch Ministry of Public Health, Welfare and Sports (VWS), Netherlands Cancer Registry (NKR), LK Research Funds, Dutch Prevention Funds, Dutch ZON (Zorg Onderzoek Nederland), World Cancer Research Fund (WCRF); the Statistics Netherlands (The Netherlands); the Health Research Fund (FIS), Regional Governments of Andalucía, Asturias, Basque Country, Murcia and Navarra, Spanish Ministry of Health ISCIII RETIC (RD06/0020), Red de Centros RCESP, C03/09 (Spain); the Swedish Cancer Society, Swedish Scientific Council and Regional Government of Skåne and Västerbotten, Fundacion Federico SA (Sweden); the Cancer Research UK, Medical Research Council (United Kingdom). EPICAP: The EPICAP study was supported by grants from Ligue Nationale Contre le Cancer, Ligue départementale du Val de Marne; Fondation de France; Agence Nationale de sécurité sanitaire de l'alimentation, de l'environnement et du travail (ANSES). 
The EPICAP study group would like to thank all urologists, Antoinette Anger and Hasina Randrianasolo (study monitors), Anne-Laure Astolfi, Coline Bernard, Oriane Noyer, Marie-Hélène De Campo, Sandrine Margaroline, Louise N'Diaye, and Sabine Perrier-Bonnet (Clinical Research nurses). ERSPC: This study was supported by the Dutch Cancer Society (KWF94-869, 98-1657, 2002-277, 2006-3518, 2010-4800), The Netherlands Organisation for Health Research and Development (ZonMW-002822820, 22000106, 50-50110-98-311, 62300035), The Dutch Cancer Research Foundation (SWOP), and an unconditional grant from Beckman-Coulter-Hybritech Inc. ESTHER: The ESTHER study was supported by a grant from the Baden-Württemberg Ministry of Science, Research and Arts. The ESTHER group would like to thank Hartwig Ziegler, Sonja Wolf, Volker Hermann, Heiko Müller, Karina Dieffenbach, and Katja Butterbach for valuable contributions to the study. FHCRC: The FHCRC studies were supported by grants R01-CA056678, R01-CA082664, and R01-CA092579 from the US National Cancer Institute, National Institutes of Health, with additional support from the Fred Hutchinson Cancer Research Center. FHCRC would like to thank all the men who participated in these studies. Gene-PARE: The Gene-PARE study was supported by grants 1R01CA134444 from the U.S. National Institutes of Health, PC074201 and W81XWH-15-1-0680 from the Prostate Cancer Research Program of the Department of Defense and RSGT-05-200-01-CCE from the American Cancer Society. Hamburg-Zagreb: None reported. HPFS: The Health Professionals Follow-up Study was supported by grants UM1CA167552, CA133891, CA141298, and P01CA055075. 
HPFS members are grateful to the participants and staff of the Physicians' Health Study and Health Professionals Follow-Up Study for their valuable contributions, as well as the following state cancer registries for their help: AL, AZ, AR, CA, CO, CT, DE, FL, GA, ID, IL, IN, IA, KY, LA, ME, MD, MA, MI, NE, NH, NJ, NY, NC, ND, OH, OK, OR, PA, RI, SC, TN, TX, VA, WA, and WY. IMPACT: The IMPACT study was funded by The Ronald and Rita McAulay Foundation, CR-UK Project grant (C5047/A1232), Cancer Australia, AICR Netherlands A10-0227, Cancer Australia and Cancer Council Tasmania, NIHR, EU Framework 6, Cancer Councils of Victoria and South Australia, and a philanthropic donation to Northshore University Health System. We acknowledge support from the National Institute for Health Research (NIHR) to the Biomedical Research Centre at The Institute of Cancer Research and Royal Marsden Foundation NHS Trust. IMPACT acknowledges the IMPACT study steering committee, collaborating centres, and participants. IPO-Porto: The IPO-Porto study was funded by Fundação para a Ciência e a Tecnologia (FCT; UID/DTP/00776/2013 and PTDC/DTP-PIC/1308/2014) and by IPO-Porto Research Center (CI-IPOP-16-2012 and CI-IPOP-24-2015). MC and MPS are research fellows from Liga Portuguesa Contra o Cancro, Núcleo Regional do Norte. SM is a research fellow from FCT (SFRH/BD/71397/2010). IPO-Porto would like to express its gratitude to all patients and families who have participated in this study. Karuprostate: The Karuprostate study was supported by the French National Health Directorate and by the Association pour la Recherche sur les Tumeurs de la Prostate. Karuprostate thanks Séverine Ferdinand. KULEUVEN: F.C. and S.J. are holders of grants from FWO Vlaanderen (G.0684.12N and G.0830.13N), the Belgian federal government (National Cancer Plan KPC_29_023), and a Concerted Research Action of the KU Leuven (GOA/15/017). TVDB is holder of a doctoral fellowship of the FWO. 
LAAPC: This study was funded by grant R01CA84979 (to S.A. Ingles) from the National Cancer Institute, National Institutes of Health. Malaysia: The study was funded by the University Malaya High Impact Research Grant (HIR/MOHE/MED/35). Malaysia thanks all associates in the Urology Unit, University of Malaya, Cancer Research Initiatives Foundation (CARIF) and the Malaysian Men's Health Initiative (MMHI). MCCS: MCCS cohort recruitment was funded by VicHealth and Cancer Council Victoria. The MCCS was further supported by Australian NHMRC grants 209057, 251553, and 504711, and by infrastructure provided by Cancer Council Victoria. Cases and their vital status were ascertained through the Victorian Cancer Registry (VCR) and the Australian Institute of Health and Welfare (AIHW), including the National Death Index and the Australian Cancer Database. MCC-Spain: The study was partially funded by the Accion Transversal del Cancer, approved by the Spanish Ministry Council on 11 October 2007, by the Instituto de Salud Carlos III-FEDER (PI08/1770, PI09/00773-Cantabria, PI11/01889-FEDER, PI12/00265, PI12/01270, and PI12/00715), by the Fundación Marqués de Valdecilla (API 10/09), by the Spanish Association Against Cancer (AECC) Scientific Foundation and by the Catalan Government DURSI grant 2009SGR1489. Samples: Biological samples were stored at the Parc de Salut MAR Biobank (MARBiobanc; Barcelona), which is supported by Instituto de Salud Carlos III FEDER (RD09/0076/00036). Sample collection was also supported by the Xarxa de Bancs de Tumors de Catalunya sponsored by Pla Director d'Oncologia de Catalunya (XBTC). MCC-Spain acknowledges the contribution from Esther Gracia-Lavedan in preparing the data. We thank all the subjects who participated in the study and all MCC-Spain collaborators. MD Anderson: Prostate Cancer Case-Control Studies at MD Anderson (MDA) were supported by grants CA68578, ES007784, DAMD W81XWH-07-1-0645, and CA140388. 
MDACC_AS: None reported. MEC: Funding provided by NIH grant U19CA148537 and grant U01CA164973. MIAMI (WFPCS): ACS. MOFFITT: The Moffitt group was supported by the US National Cancer Institute (R01CA128813, PI: J.Y. Park). NMHS: Funding for the Nashville Men's Health Study (NMHS) was provided by National Institutes of Health grant number R01CA121060. PCaP only data: The North Carolina - Louisiana Prostate Cancer Project (PCaP) is carried out as a collaborative study supported by the Department of Defense contract DAMD 17-03-2-0052. For HCaP-NC follow-up data: The Health Care Access and Prostate Cancer Treatment in North Carolina (HCaP-NC) study is carried out as a collaborative study supported by the American Cancer Society award RSGT-08-008-01-CPHPS. For studies using both PCaP and HCaP-NC follow-up data please use: The North Carolina - Louisiana Prostate Cancer Project (PCaP) and the Health Care Access and Prostate Cancer Treatment in North Carolina (HCaP-NC) study are carried out as collaborative studies supported by the Department of Defense contract DAMD 17-03-2-0052 and the American Cancer Society award RSGT-08-008-01-CPHPS, respectively. For any PCaP data, please include: The authors thank the staff, advisory committees and research subjects participating in the PCaP study for their important contributions. For studies using PCaP DNA/genotyping data, please include: We would like to acknowledge the UNC BioSpecimen Facility and LSUHSC Pathology Lab for our DNA extractions, blood processing, storage and sample disbursement (https://genome.unc.edu/bsp). For studies using PCaP tissue, please include: We would like to acknowledge the RPCI Department of Urology Tissue Microarray and Immunoanalysis Core for our tissue processing, storage and sample disbursement. 
For studies using HCaP-NC follow-up data, please use: The Health Care Access and Prostate Cancer Treatment in North Carolina (HCaP-NC) study is carried out as a collaborative study supported by the American Cancer Society award RSGT-08-008-01-CPHPS. The authors thank the staff, advisory committees and research subjects participating in the HCaP-NC study for their important contributions. For studies that use both PCaP and HCaP-NC, please use: The authors thank the staff, advisory committees and research subjects participating in the PCaP and HCaP-NC studies for their important contributions. PCMUS: The PCMUS study was supported by the Bulgarian National Science Fund, Ministry of Education and Science (contract DOO-119/2009; DUNK01/2-2009; DFNI-B01/28/2012) with additional support from the Science Fund of Medical University - Sofia (contract 51/2009; 8I/2009; 28/2010). PHS: The Physicians' Health Study was supported by grants CA34944, CA40360, CA097193, HL26490, and HL34595. PHS members are grateful to the participants and staff of the Physicians' Health Study and Health Professionals Follow-Up Study for their valuable contributions, as well as the following state cancer registries for their help: AL, AZ, AR, CA, CO, CT, DE, FL, GA, ID, IL, IN, IA, KY, LA, ME, MD, MA, MI, NE, NH, NJ, NY, NC, ND, OH, OK, OR, PA, RI, SC, TN, TX, VA, WA, and WY. PLCO: This PLCO study was supported by the Intramural Research Program of the Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH. PLCO thanks Drs. Christine Berg and Philip Prorok, Division of Cancer Prevention at the National Cancer Institute, the screening center investigators and staff of the PLCO Cancer Screening Trial for their contributions to the PLCO Cancer Screening Trial. We thank Mr. Thomas Riley, Mr. Craig Williams, Mr. Matthew Moore, and Ms. Shannon Merkle at Information Management Services, Inc., for their management of the data and Ms. Barbara O'Brien and staff at Westat, Inc. 
for their contributions to the PLCO Cancer Screening Trial. We also thank the PLCO study participants for their contributions to making this study possible. Poland: None reported. PROCAP: PROCAP was supported by the Swedish Cancer Foundation (08-708, 09-0677). PROCAP thanks and acknowledges all of the participants in the PROCAP study. We thank Carin Cavalli-Björkman and Ami Rönnberg Karlsson for their dedicated work in the collection of data. Michael Broms is acknowledged for his skilful work with the databases. KI Biobank is acknowledged for handling the samples and for DNA extraction. We acknowledge The NPCR steering group: Pär Stattin (chair), Anders Widmark, Stefan Karlsson, Magnus Törnblom, Jan Adolfsson, Anna Bill-Axelson, Ove Andrén, David Robinson, Bill Pettersson, Jonas Hugosson, Jan-Erik Damber, Ola Bratt, Göran Ahlgren, Lars Egevad, and Roy Ehrnström. PROGReSS: The PROGReSS study is funded by grants from the Spanish Ministry of Health (INT15/00070; INT16/00154; FIS PI10/00164, FIS PI13/02030; FIS PI16/00046); the Spanish Ministry of Economy and Competitiveness (PTA2014-10228-I), and Fondo Europeo de Desarrollo Regional (FEDER 2007-2013). ProMPT: Funded by CRUK, NIHR, MRC, Cambridge Biomedical Research Centre. ProtecT: Funded by NIHR. ProtecT and ProMPT would like to acknowledge the support of The University of Cambridge, Cancer Research UK. Cancer Research UK grants (C8197/A10123) and (C8197/A10865) supported the genotyping team. We would also like to acknowledge the support of the National Institute for Health Research, which funds the Cambridge Biomedical Research Centre, Cambridge, UK. We would also like to acknowledge the support of the National Cancer Research Prostate Cancer: Mechanisms of Progression and Treatment (PROMPT) collaborative (grant code G0500966/75466), which has funded tissue and urine collections in Cambridge. 
We are grateful to staff at the Wellcome Trust Clinical Research Facility, Addenbrooke's Clinical Research Centre, Cambridge, UK for their help in conducting the ProtecT study. We also acknowledge the support of the NIHR Cambridge Biomedical Research Centre, the DOH HTA (ProtecT grant), and the NCRI/MRC (ProMPT grant) for help with the bio-repository. The UK Department of Health funded the ProtecT study through the NIHR Health Technology Assessment Programme (projects 96/20/06, 96/20/99). The ProtecT trial and its linked ProMPT and CAP (Comparison Arm for ProtecT) studies are supported by Department of Health, England; Cancer Research UK grant number C522/A8649, Medical Research Council of England grant number G0500966, ID 75466, and The NCRI, UK. The epidemiological data for ProtecT were generated through funding from the Southwest National Health Service Research and Development. DNA extraction in ProtecT was supported by USA Dept of Defense award W81XWH-04-1-0280, Yorkshire Cancer Research and Cancer Research UK. The authors would like to acknowledge the contribution of all members of the ProtecT study research group. The views and opinions expressed herein are those of the authors and do not necessarily reflect those of the Department of Health of England. The bio-repository from ProtecT is supported by the NCRI (ProMPT) Prostate Cancer Collaborative and the Cambridge BMRC grant from NIHR. We thank the National Institute for Health Research, Hutchison Whampoa Limited, the Human Research Tissue Bank (Addenbrooke's Hospital), and Cancer Research UK. 
PROtEuS: PROtEuS was supported financially through grants from the Canadian Cancer Society (13149, 19500, 19864, 19865) and the Cancer Research Society, in partnership with the Ministère de l'enseignement supérieur, de la recherche, de la science et de la technologie du Québec, and the Fonds de la recherche du Québec - Santé. PROtEuS would like to thank its collaborators and research personnel, and the urologists involved in subject recruitment. We also wish to acknowledge the special contribution made by Ann Hsing and Anand Chokkalingam to the conception of the genetic component of PROtEuS. QLD: The QLD research is supported by The National Health and Medical Research Council (NHMRC) Australia Project Grants (390130, 1009458) and NHMRC Career Development Fellowship and Cancer Australia PdCCRS funding to J Batra. The QLD team would like to acknowledge and sincerely thank the urologists, pathologists, data managers and patient participants who have generously and altruistically supported the QLD cohort. RAPPER: RAPPER is funded by Cancer Research UK (C1094/A11728; C1094/A18504) and Experimental Cancer Medicine Centre funding (C1467/A7286). The RAPPER group thank Rebecca Elliott for project management. SABOR: The SABOR research is supported by NIH/NCI Early Detection Research Network, grant U01 CA0866402-12. Also supported by the Cancer Center Support Grant to the Cancer Therapy and Research Center from the National Cancer Institute (US) P30 CA054174. SCCS: SCCS is funded by NIH grant R01 CA092447, and SCCS sample preparation was conducted at the Epidemiology Biospecimen Core Lab that is supported in part by the Vanderbilt-Ingram Cancer Center (P30 CA68485). 
Data on SCCS cancer cases used in this publication were provided by the Alabama Statewide Cancer Registry; Kentucky Cancer Registry, Lexington, KY; Tennessee Department of Health, Office of Cancer Surveillance; Florida Cancer Data System; North Carolina Central Cancer Registry, North Carolina Division of Public Health; Georgia Comprehensive Cancer Registry; Louisiana Tumor Registry; Mississippi Cancer Registry; South Carolina Central Cancer Registry; Virginia Department of Health, Virginia Cancer Registry; Arkansas Department of Health, Cancer Registry, 4815 W. Markham, Little Rock, AR 72205. The Arkansas Central Cancer Registry is fully funded by a grant from National Program of Cancer Registries, Centers for Disease Control and Prevention (CDC). Data on SCCS cancer cases from Mississippi were collected by the Mississippi Cancer Registry which participates in the National Program of Cancer Registries (NPCR) of the Centers for Disease Control and Prevention (CDC). The contents of this publication are solely the responsibility of the authors and do not necessarily represent the official views of the CDC or the Mississippi Cancer Registry. SCPCS: SCPCS is funded by CDC grant S1135-19/19, and SCPCS sample preparation was conducted at the Epidemiology Biospecimen Core Lab that is supported in part by the Vanderbilt-Ingram Cancer Center (P30 CA68485). SEARCH: SEARCH is funded by a program grant from Cancer Research UK (C490/A10124) and supported by the UK National Institute for Health Research Biomedical Research Centre at the University of Cambridge. SNP_Prostate_Ghent: The study was supported by the National Cancer Plan, financed by the Federal Office of Health and Social Affairs, Belgium. 
SPAG: Wessex Medical Research, Hope for Guernsey, MUG, HSSD, MSG, Roger Allsopp. STHM2: STHM2 was supported by grants from The Strategic Research Programme on Cancer (StratCan), Karolinska Institutet; the Linné Centre for Breast and Prostate Cancer (CRISP, number 70867901), Karolinska Institutet; The Swedish Research Council (number K2010-70X-20430-04-3) and The Swedish Cancer Society (numbers 11-0287 and 11-0624); Stiftelsen Johanna Hagstrand och Sigfrid Linnérs minne; Swedish Council for Working Life and Social Research (FAS), number 2012-0073. STHM2 acknowledges the Karolinska University Laboratory, Aleris Medilab, Unilabs and the Regional Prostate Cancer Registry for performing analyses and helping to retrieve data, Carin Cavalli-Björkman and Britt-Marie Hune for their enthusiastic work as research nurses, and Astrid Björklund for skilful data management. We wish to thank the BBMRI.se biobank facility at Karolinska Institutet for biobank services. PCPT & SELECT: PCPT and SELECT are funded by Public Health Service grants U10CA37429 and 5UM1CA182883 from the National Cancer Institute. SWOG and SELECT thank the site investigators and staff and, most importantly, the participants who donated their time to this trial. TAMPERE: The Tampere (Finland) study was supported by the Academy of Finland (251074), The Finnish Cancer Organisations, Sigrid Juselius Foundation, and the Competitive Research Funding of the Tampere University Hospital (X51003). The PSA screening samples were collected by the Finnish part of ERSPC (European Randomized Study of Screening for Prostate Cancer). TAMPERE would like to thank Riina Liikanen, Liisa Maeaettaenen and Kirsi Talala for their work on samples and databases. 
UGANDA: None reported. UKGPCS: UKGPCS would also like to thank the following for funding support: The Institute of Cancer Research and The Everyman Campaign, The Prostate Cancer Research Foundation, Prostate Research Campaign UK (now Prostate Action), The Orchid Cancer Appeal, The National Cancer Research Network UK, The National Cancer Research Institute (NCRI) UK. We are grateful for support of NIHR funding to the NIHR Biomedical Research Centre at The Institute of Cancer Research and The Royal Marsden NHS Foundation Trust. UKGPCS would also like to acknowledge the NCRN nurses, data managers, and consultants for their work in the UKGPCS study. UKGPCS would like to thank all urologists and other persons involved in the planning, coordination, and data collection of the study. ULM: The Ulm group received funds from the German Cancer Aid (Deutsche Krebshilfe). WUGS/WUPCS: WUGS would like to thank the following for funding support: The Anthony DeNovi Fund, the Donald C. McGraw Foundation, and the St. Louis Men's Group Against Cancer.
Please note: This synthetic data set (with cohort “participants”/“subjects” marked with FAKE) has no identifiable data and cannot be used to make any inference about cohort data or results. The purpose of this dataset is to aid development of technical implementations for cohort data discovery, harmonization, access, and federated analysis. In support of FAIRness in data sharing, this dataset is made freely available under the Creative Commons Licence (CC-BY). Please ensure this preamble is included with this dataset and that the CINECA project (funding: EC H2020 grant 825775) is acknowledged. For any questions please contact isuru@ebi.ac.uk or cthomas@ebi.ac.uk. This dataset (CINECA_synthetic_cohort_EUROPE_UK1) consists of 2521 samples which have genetic data based on 1000 Genomes data (https://www.nature.com/articles/nature15393), and synthetic subject attributes and phenotypic data derived from UKBiobank (https://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.1001779). These data were initially derived using the TOFU tool (https://github.com/spiros/tofu), which generates random values based on the UKBiobank data dictionary. Categorical values were randomly generated based on the data dictionary, continuous variables were generated based on the distribution of values reported by the UK Biobank showcase, and date/time values were random. Additionally, we split the phenotypes and attributes into 4 main classes: general, cancer, diabetes mellitus, and cardiac. We assigned the general attributes to all the samples, and the cardiac/diabetes mellitus/cancer attributes to a proportion of the total samples. Once the initial set of phenotypes and attributes was generated, the data were checked for consistency and, where possible, dependent attributes were calculated from the independent variables generated by TOFU. For example, BMI was calculated from height and weight data, and age at death was derived from date of death and date of birth. 
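The dependent-attribute step can be illustrated with a minimal Python sketch. It assumes height in centimetres, weight in kilograms, and Python `date` objects; the field names (`height_cm`, `weight_kg`, `date_of_birth`, `date_of_death`) and helper functions are hypothetical, are not UK Biobank field codes, and this is not the code actually used to build the dataset. Treating a death date earlier than the birth date as an inconsistency is likewise an assumption about what the consistency check covers.

```python
from datetime import date
from typing import Any, Dict


def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2


def age_at_death(date_of_birth: date, date_of_death: date) -> int:
    """Completed years of life; raises if the two dates are inconsistent."""
    if date_of_death < date_of_birth:
        raise ValueError("date of death precedes date of birth")
    years = date_of_death.year - date_of_birth.year
    # Subtract one year if the birthday had not yet been reached in the year of death.
    if (date_of_death.month, date_of_death.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


def derive_dependent_attributes(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a synthetic record with dependent fields recalculated."""
    out = dict(record)
    if record.get("height_cm") and record.get("weight_kg"):
        out["bmi"] = round(bmi(record["weight_kg"], record["height_cm"]), 1)
    if record.get("date_of_birth") and record.get("date_of_death"):
        out["age_at_death"] = age_at_death(record["date_of_birth"], record["date_of_death"])
    return out


example = {
    "height_cm": 180,
    "weight_kg": 81,
    "date_of_birth": date(1950, 6, 15),
    "date_of_death": date(2020, 6, 14),
}
print(derive_dependent_attributes(example))
```

For the example record, the sketch yields a BMI of 25.0 and an age at death of 69, since the death date falls one day before the 70th birthday. Recomputing dependent fields from the independent ones, rather than generating them independently, is what keeps them mutually consistent.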
These data were then loaded into the development instance of BioSamples (https://www.ebi.ac.uk/biosamples/), which accessioned each of the samples. The genetic data are derived from the 1000 Genomes Phase 3 release (https://www.internationalgenome.org/category/phase-3/). The genotype data consist of a single joint-called VCF file with genotype calls for all 2504 samples, plus bed, bim, fam, and nosex files generated with PLINK for these samples and genotypes. A variety of errors have been introduced into the genotype data to mimic real data and to test quality control pipelines. These include gender mismatches, ethnic background mislabelling and low call rates for a randomly chosen subset of samples, as well as deviations from Hardy-Weinberg equilibrium and low call rates for a random selection of variants. Additionally, 40 samples have raw genetic data available as both BAM and CRAM files, including unmapped data. The gender of the samples in the 1000 Genomes data has been matched to the synthetic phenotypic data generated for these samples. The genetic data were then linked to the synthetic data in BioSamples and submitted to EGA.
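Two of the quality-control checks that these injected errors are meant to exercise, call rate and a per-variant Hardy-Weinberg equilibrium test, can be illustrated with a minimal Python sketch. This is an illustration under assumptions, not part of the dataset: it uses a 1-df chi-square HWE test (PLINK's own `--hwe` filter uses an exact test instead), genotypes are assumed to be coded as 0/1/2 alternate-allele counts with `None` for a missing call, and the function names are hypothetical. The text above gives no thresholds, so none are fixed here.

```python
import math
from typing import Optional, Sequence

# Hypothetical coding: 0/1/2 = number of alternate alleles, None = missing call.
Genotype = Optional[int]


def call_rate(genotypes: Sequence[Genotype]) -> float:
    """Fraction of non-missing calls (per sample or per variant)."""
    if not genotypes:
        return 0.0
    return sum(g is not None for g in genotypes) / len(genotypes)


def hwe_chi_square_p(genotypes: Sequence[Genotype]) -> float:
    """P-value of a 1-df chi-square test of Hardy-Weinberg equilibrium for one variant."""
    called = [g for g in genotypes if g is not None]
    n = len(called)
    if n == 0:
        raise ValueError("no called genotypes")
    observed = (called.count(0), called.count(1), called.count(2))
    q = (observed[1] + 2 * observed[2]) / (2 * n)  # alternate-allele frequency
    p = 1.0 - q
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Skip empty expected cells (a monomorphic variant gives chi2 = 0, i.e. p = 1).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Chi-square survival function with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))
```

A variant called as 25/50/25 homozygous-reference/heterozygous/homozygous-alternate matches HWE exactly and returns p = 1.0, whereas one called as 50/0/50 (no heterozygotes) returns a p-value far below any usual QC threshold.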
The Demographically Diverse Substance Use Disorder Cohorts of Dr. Stanley H. Weiss, which constitute the Epidemiology of the Weiss Cohort Projects, consist of a series of interconnected projects building on a set of cohort projects of various groups, mainly drug users from medication-assisted treatment programs, that Dr. Stanley H. Weiss first developed in the 1980s, plus several newer initiatives, each with an array of collaborators. Beginning in the 1980s, Dr. Weiss started several long-term studies of persons who inject drugs (PWID) across the United States, ultimately enrolling over 10,000 participants through the early 1990s, with an average age at that time in the 30s. About a quarter were enrolled from sites in New Jersey (NJ). These studies included the first testing of PWID for the human immunodeficiency virus (HIV) and the human T-cell lymphotropic viruses (HTLV-I and HTLV-II). Cumulative past support (from initiation through ~1999) for these cohort studies included ~$20 million from intramural resources of the National Cancer Institute (NCI) and the National Institute on Drug Abuse (NIDA), plus multiple grants and in-kind support from the New Jersey Department of Health (NJDOH) totaling ~$1 million. The Weiss Cohort Projects include the first large AIDS-era cohorts to include women at high risk for HIV. A high percentage of subjects in these studies are black or Latino; this is thus an ethnically diverse US cohort with a high proportion of women. These subjects are at high risk of parenteral and sexual infection from both drug use and sexual practices. Samples from other studies conducted by Dr. Weiss, in which detailed interviews were conducted, are included as controls (persons documented by us not to have a history of opioid drug use). Because one of our subject groups includes many persons of Haitian ancestry, we specifically included some Haitians who had never used opioids as controls. Our documentation records such ancestry. 
These cohorts demonstrated high rates of HIV and HTLV-II infection in PWID, including in one study initiated in 1981, with confirmation in the later cohorts. Among numerous publications in the first two decades of these studies was the first study showing a very high rate of hepatitis C infection among PWID. The studies' long time horizon proved essential: only many years after the specimens were collected did it become possible to test whether a person had ever been infected with hepatitis C virus (HCV) and to measure how much HCV was in each person's blood. This allowed HCV levels in blood to be compared between subjects who had died of liver disease early in the study and those who survived. A sequence of published papers then culminated in demonstrating, using a nested case-control design, that a high baseline HCV titer was predictive of early progression to death from end-stage liver failure. Outcomes related to HCV (end-stage liver disease and hepatocellular carcinoma) remain under study. In the original cohort studies, the mean age at enrollment was ~33 years, so those still alive in 2022 are now mainly ~60-75 years old. Many participants have already died. The tincture of time has brought subjects to ages at which many more are dying from a wide array of causes, including many chronic diseases (including cancer), infectious agents (especially HIV and HCV), and drug overdose. Renewed collaboration with local drug treatment programs has led to new field-based studies, including examination of some currently evolving problems among drug users. Dr. Weiss joined the National Institute on Drug Abuse (NIDA) Genetics Consortium (NGC) in 2017 and, through the NIDA project officer, has had access to NGC contract resources (see below). NIH Certificate of Confidentiality CC-DA-16-214 (attached) protects these studies. 
Past arrangements concerning data on our subjects impose restrictions on the use of data from our study, such as on potential commercialization and on who may access and use these data. NIDA Genetics Consortium (NGC) resources further support these endeavors and will be used as part of the NGC analyses studying the genetics of substance use. Study participants signed informed consent for the information collected from them to be used with no time limit and for biologic specimens collected from them to be used without restriction in future research. Serum samples were collected from participants, and from many also plasma, white blood cells and/or urine samples. About 100,000 vials were stored. All specimens have been continuously preserved at temperatures cold enough to prevent deterioration, and for many subjects, separated white blood cells were processed and frozen in a way that maintains viability. Detailed data from the participants have been accumulated over time, and in general, linkage has been retained in each sub-study in accordance with the consent forms and protocols. For some participants, specimens were collected at multiple times (that is, sequential specimens). Multiple specimens from a single person exist in this database, and de-duplication efforts are ongoing. Dr. Weiss should be contacted if an investigator requires unique individuals, since:
• Multiple phases of enrollment occurred, and as our prospective follow-up continues, Dr. Weiss may identify new instances of multiple enrollment.
• Some persons are related to each other.
• In general, in this dataset for dbGaP, only a single specimen/record from a given person is included.
Advances in laboratory testing techniques now permit innovative new uses for our linked research biospecimen repository. 
The ongoing focus of an interdisciplinary research program based on these cohorts is to relate subjects' diseases, behaviors, medical history, and outcomes to biological and exposure markers. Participants' use of various substances was ascertained at study enrollment, for many serially over time. Quantitative frequency-of-use data, also sometimes sequential over time, were ascertained. Active ascertainment of outcomes is being conducted, including matching to mortality and cancer databases. Investigators interested in collaborations on specific outcomes (which are not part of this dbGaP dataset) or in the use of our stored specimens are encouraged to contact the principal investigator, Dr. Weiss. The genomic data were processed in conjunction with NIDA, following some longstanding data cleaning steps used by NIDA in the NIDA Genetics Consortium (NGC), a group to which we shall be contributing these data for collaborative analyses. Because these steps could introduce certain types of bias, we summarize them here. Under contract from NIDA, cryopreserved sera or plasma (-80 °C) or cells (in liquid nitrogen) were used, most having been stored for 30 to 40 years in our biorepository. For serum or plasma, in which (largely) only cell-free DNA fragments were available, DNA was extracted and restored prior to amplification. Industry-standard DNA amplification techniques were applied to all samples prior to genotyping, in accord with established protocols of the NIDA Genetics Consortium. Our genotype data were run and processed on the Illumina Infinium OmniExpress_v_1.3 array. This array has 714,238 SNPs and was designed many years ago. There were 628 SNPs on the array that do not correspond to any chromosome position; these were removed. 
Genotype data were submitted by NIDA's contracted genotyping laboratory in six batches over time to NIDA's contracted dbGaP data management group, which conducted quality control (QC) analyses. QC analysis included an assessment of batch effects for five of the six batches. (One of the batches, with only 12 samples, was too small for QC analysis of batch effects.) Standard NIDA Genetics Consortium cleaning was performed. Samples with a call rate < 0.85 were removed. Only one sample per person was retained: when more than one specimen was genotyped from one subject, only the sample with the higher call rate was retained (provided, of course, that its call rate was ≥ 0.85). We have retained some people we know are related, including some found through genotyping to be related; the pedigree file describes those relationships. In summary, key cleaning steps include:
1. Using PLINK to check gender discrepancy.
2. Using PREST-PLUS and KING (Kinship-based Inference for GWAS) to check relatedness.
3. Using PEDCHECK and PLINK to check/zero-out Mendelian errors.
4. Using PLINK to perform sample QC and SNP QC, along with KING to perform chromosome X and chromosome Y QC.
5. SNP QC, batch effect: five batches were compared (one batch, with few samples, was not). These five batches were compared with each other in all ten possible pairs, one batch vs. another, examining SNP allele frequency discrepancies by population (from GRAF) with a Fisher exact allelic test, using p < 5e-8 as the removal criterion.
6. SNP QC, discordant SNPs in QC duplicates: 25 QC duplicate samples with call rate > 0.95 were compared, and SNPs with 3 or more discordances were removed.
7. There were 1,056 monomorphic SNPs; these have been retained so they can be included in analyses in which our dbGaP data are combined with those from other cohorts (in which those SNPs may not be monomorphic).
The final cleaned dataset submitted has 8,898 samples and 606,793 SNPs.
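Two of the numeric rules above, the sample call-rate filter with one sample kept per person and the pairwise batch comparison, can be sketched in Python as below. This is a hypothetical illustration, not the NIDA/dbGaP pipeline (which used PLINK, KING and the other tools listed); the input layouts and function names are assumptions, the Fisher test is a plain-Python two-sided implementation, and the stratification by population (GRAF) is not modeled, so in practice the comparison would be repeated within each population.

```python
import math
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

CALL_RATE_MIN = 0.85      # sample call-rate threshold stated in the text
BATCH_P_THRESHOLD = 5e-8  # removal criterion for the batch-effect test


def one_sample_per_person(samples: Iterable[Tuple[str, str, float]]) -> Dict[str, str]:
    """Map person_id -> sample_id of that person's passing sample with the highest call rate.

    `samples` holds (sample_id, person_id, call_rate) tuples (hypothetical layout).
    """
    best: Dict[str, Tuple[str, float]] = {}
    for sample_id, person_id, rate in samples:
        if rate < CALL_RATE_MIN:
            continue  # fails sample QC regardless of duplicates
        if person_id not in best or rate > best[person_id][1]:
            best[person_id] = (sample_id, rate)
    return {person: sid for person, (sid, _) in best.items()}


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = math.comb(n, row1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell with margins fixed.
        return math.comb(col1, x) * math.comb(n - col1, row1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


AlleleCounts = Dict[str, Dict[str, Tuple[int, int]]]


def flag_batch_effect_snps(allele_counts: AlleleCounts) -> Set[str]:
    """SNPs whose allele counts differ significantly in at least one pair of batches.

    `allele_counts[batch][snp]` is a (ref_count, alt_count) tuple (hypothetical layout).
    """
    flagged: Set[str] = set()
    for b1, b2 in combinations(sorted(allele_counts), 2):
        shared = allele_counts[b1].keys() & allele_counts[b2].keys()
        for snp in shared:
            r1, a1 = allele_counts[b1][snp]
            r2, a2 = allele_counts[b2][snp]
            if fisher_exact_two_sided(r1, a1, r2, a2) < BATCH_P_THRESHOLD:
                flagged.add(snp)
    return flagged
```

With five batches, `combinations` yields exactly the ten pairs described in step 5. The hand-written Fisher test keeps the sketch dependency-free; for array-scale data an optimized routine such as `scipy.stats.fisher_exact` would be the more practical choice.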
The dataset contains mRNA sequencing data (FASTQ files) from two patients with NUP214::ABL1 disease (one T-ALL and one B-ALL patient). Patient samples (peripheral blood or bone marrow) were taken at diagnosis (B-ALL) or relapse (T-ALL). mRNA was extracted from lymphoblasts and underwent paired-end sequencing (75 bp) on the Illumina NextSeq platform. Transcriptomic data were used to identify gene fusions, SNVs and InDels. mRNA-seq data for an additional B-ALL patient with NUP214::ABL1 disease utilised in this study have already been published (EGAS00001006460).
Background and Rationale for the Childhood Cancer Survivor Study (CCSS)
Over the last several decades, advances in treatments for childhood and adolescent cancer have substantially improved survival following diagnosis. These improvements created a responsibility to investigate long-term treatment-associated morbidity and mortality. Early efforts to describe late effects were largely conducted through single-institution and limited consortium studies. However, by the mid-1980s, it had become increasingly clear that these approaches had inherent limitations, including small sample sizes, convenience sampling, incompletely characterized populations, and limited length of follow-up. To overcome these limitations, the CCSS was proposed and funded by the National Cancer Institute (NCI) as a U01 grant in 1994. Subsequently, the strengths of the CCSS, including an efficient and extensive infrastructure and an expanding database and biorepository, were recognized. Thus, in 2000, in consultation with the NCI, the CCSS was converted to a U24 (resource grant) funding mechanism to serve the scientific community. The overarching goal of the CCSS resource is to increase the conduct of innovative and high-impact research related to pediatric cancer survivorship. CCSS has been used extensively by researchers from a wide range of disciplines to address a broad spectrum of topics. Strengths of the resource include its large size, comprehensive annotation of treatment exposures, ongoing longitudinal follow-up with characterization of a wide array of participant characteristics and outcomes, and an established biorepository.
Design of the Childhood Cancer Survivor Study
The Childhood Cancer Survivor Study (CCSS) is a multi-institutional, multidisciplinary collaborative research resource composed of a retrospective hospital-based cohort of survivors of childhood cancer and a comparison sibling cohort.
Eligible survivors from 31 participating institutions were diagnosed between 1970 and 1999, before age 21 years, with selected common pediatric cancers (leukemia, central nervous system tumors, Hodgkin lymphoma, non-Hodgkin lymphoma, kidney tumors, neuroblastoma, soft tissue sarcoma, or bone tumors). All patients who survived five years from the date of diagnosis were eligible, regardless of disease or treatment status. The baseline questionnaire was completed by 24,368 survivors and by 5,039 siblings recruited to serve as a comparison group. To date, participants have completed three general follow-up surveys, as well as a number of specialized surveys on specific topics (e.g., health care, insurance, screening practices, men's and women's health issues, adolescent health, sleep and fatigue). In addition, biological samples (buccal cells, saliva and/or blood) have been collected from over 11,000 participants. Full descriptions of the design and characteristics of the CCSS have been published previously (Robison et al.; Leisenring et al.), and available data and samples are described at https://ccss.stjude.org/develop-a-study/gwas-data-resource.html.
Treatment Data in the Childhood Cancer Survivor Study
A key feature of CCSS is the availability of detailed treatment data, which were collected by abstraction of medical records for each individual member of the cohort. Detailed abstraction included dates of therapy, protocol information, and specific details regarding surgery, chemotherapy and radiation. Quantitative dose details were collected for 22 specific chemotherapeutic agents, including alkylating agents, anthracyclines, platinum compounds and epipodophyllotoxins. In addition to individual agent doses, algorithms have been created to calculate cumulative doses of all drugs in a specific class, such as anthracyclines (doxorubicin, daunomycin and idarubicin) or platinum agents (cisplatinum and carboplatinum).
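The eligibility rules and the class-level dose summation described above can be sketched as follows. This is a minimal illustration, not the CCSS algorithms: the drug-to-class mapping lists only the agents named in the text, the optional `weights` argument stands in for hypothetical per-agent conversion factors (a plain sum is used when none are given), and the eligibility check omits the cancer-type and participating-institution criteria. All function names are hypothetical.

```python
from datetime import date

# Agents named in the text, mapped to their drug class. A full mapping would
# cover all 22 abstracted agents; it is not reproduced here.
DRUG_CLASS = {
    "doxorubicin": "anthracycline",
    "daunomycin": "anthracycline",
    "idarubicin": "anthracycline",
    "cisplatinum": "platinum",
    "carboplatinum": "platinum",
}


def cumulative_class_doses(doses, weights=None):
    """doses: (agent, dose) records for one participant.
    weights: optional hypothetical per-agent conversion factors (default 1.0).
    Returns the summed dose per drug class; unmapped agents are skipped."""
    weights = weights or {}
    totals = {}
    for agent, dose in doses:
        drug_class = DRUG_CLASS.get(agent)
        if drug_class is None:
            continue
        totals[drug_class] = totals.get(drug_class, 0.0) + dose * weights.get(agent, 1.0)
    return totals


def anniversary(d, years):
    """Same calendar date `years` later; Feb 29 falls back to Feb 28."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def is_eligible(birth, diagnosis, alive_through):
    """Diagnosed 1970-1999, before age 21, and alive five years after diagnosis.
    Cancer-type and institution criteria are not modeled."""
    diagnosed_in_window = 1970 <= diagnosis.year <= 1999
    under_21 = diagnosis < anniversary(birth, 21)
    five_year_survivor = alive_through >= anniversary(diagnosis, 5)
    return diagnosed_in_window and under_21 and five_year_survivor


print(cumulative_class_doses([("doxorubicin", 100.0), ("idarubicin", 10.0)]))
# {'anthracycline': 110.0}
```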
Data abstracted for surgeries included dates, procedure names, and the corresponding International Classification of Diseases, 9th Revision (ICD-9) codes. For radiation treatment data, all relevant records were sent to the Radiation Physics Center at M.D. Anderson Cancer Center for detailed abstraction and dosimetry. Initial body-region dosimetry was performed for all participants, followed by more detailed dosimetry as needed for specific studies.
Genomics Data in the Childhood Cancer Survivor Study
The NCI's Division of Cancer Epidemiology and Genetics and CCSS investigators collaborated to conduct genomics studies (SNP array genotyping and whole exome sequencing) using samples from the CCSS Biorepository. These studies included all cohort participants with DNA available at the time the genomics studies were initiated, regardless of sex or ancestry.
Phenotype Data in the Childhood Cancer Survivor Study
Vital status and cause of death for both participants and non-participants are determined via linkage with the National Death Index (NDI). Subsequent neoplasms are identified by self-report followed by validation using medical records, or via the NDI. A wide array of additional health outcomes has been ascertained via a comprehensive set of questions on the CCSS questionnaires, covering potential adverse events across a range of organ systems (hearing/vision/speech, urinary, hormonal, heart and circulatory, respiratory, digestive, brain and nervous system). In addition to health outcomes, longitudinal data have been collected on demographics, health behaviors, family history, screening practices, insurance status, and a range of psychosocial and neurocognitive factors. A full listing of available variables and copies of the CCSS questionnaires are available at http://ccss.stjude.org.
Research Areas in the Childhood Cancer Survivor Study
Extensive use by the research community has resulted in over 265 published manuscripts on a wide range of topics, including associations between treatment factors and mortality, subsequent neoplasms, chronic health conditions, cardiac events, neurocognitive sequelae, psychosocial factors, fertility, and health status. Additional topics have included health behaviors, screening practices, health care access and utilization, statistical and exposure assessment methodology, and development of risk prediction models. A full listing of published manuscripts using CCSS data is available on the CCSS website at https://ccss.stjude.org/published-research/publications.html.
The Childhood Cancer Survivor Study as a Resource for Investigators
The CCSS is an NCI-funded resource (U24 CA55727) to promote and facilitate research among long-term survivors of cancer diagnosed during childhood and adolescence. Interested investigators are encouraged to develop research ideas and propose projects within CCSS, whether or not they are from a participating CCSS institution. The CCSS is now accepting proposals to collaborate with CCSS and NCI investigators in using genomics data and corresponding outcomes data to address innovative research questions about potential genetic contributions to the risk of treatment-related outcomes. Any researcher or group of researchers qualified to conduct genetic research can submit a proposal. There are no restrictions relating to country, institution, or prior involvement in CCSS. A full description of the process for developing a proposal for genetic research in CCSS, along with listings of approved proposals, can be found at https://ccss.stjude.org/develop-a-study/gwas-data-resource.html.
Available Data
The available data include all elements of the previously released SPRINT Primary Outcome Paper (SPRINT-POP) data, the full SPRINT clinical data including the MRI and MIND data, and select ancillary study data (Ambulatory Blood Pressure Monitoring, APOL1, Acute Kidney Injury, ASK, Heart, FAST, PWV, Renal Resistance, Biomarkers, Plasma AD).
Objective
The Systolic Blood Pressure Intervention Trial (SPRINT) was conducted to test the hypothesis that treating systolic blood pressure to a target of less than 120 mm Hg, as compared with a target of less than 140 mm Hg, would reduce the incidence of cardiovascular disease.
Background
Hypertension is a highly prevalent condition among adults and is a leading risk factor for myocardial infarction and stroke. Further, isolated systolic hypertension is the most common form of hypertension in adults over 50 years of age. Observational studies have shown a monotonic increase in cardiovascular risk with systolic blood pressures above 115 mm Hg; however, clinical trials in the general population have documented the benefits of lowering systolic blood pressure only to a target of 150 mm Hg. A 2007 expert panel sponsored by the National Heart, Lung, and Blood Institute designated the hypothesis that lowering the systolic blood pressure goal to a level
Subjects
A total of 9,361 participants were enrolled, with 4,678 randomized to the intensive-treatment group and 4,683 randomized to the standard-treatment group.
Design
SPRINT was a randomized, single-blinded (outcome adjudicators were blinded to treatment assignment) treatment trial in which participants were randomized to a systolic blood pressure target of either less than 140 mm Hg (the standard-treatment group) or less than 120 mm Hg (the intensive-treatment group). Following randomization, baseline antihypertensive regimens were adjusted in accordance with study treatment algorithms established for each group. The study formulary included all major classes of antihypertensive agents.
Investigators could prescribe other antihypertensive medications, but the use of drug classes with the strongest evidence for reduction in cardiovascular outcomes was encouraged. This included thiazide-type diuretics as the first-line agent, loop diuretics for participants with advanced chronic kidney disease, and beta-adrenergic blockers for participants with coronary artery disease. Medications for participants in the intensive-treatment group were adjusted on a monthly basis to target a systolic blood pressure of less than 120 mm Hg. Medications for participants in the standard-treatment group were adjusted to target a systolic blood pressure of 135 to 139 mm Hg, and the dose was reduced if systolic blood pressure was less than 130 mm Hg on a single visit or less than 135 mm Hg on two consecutive visits. Lifestyle modification was encouraged as part of the management strategy. Participants were seen monthly for the first 3 months and every 3 months thereafter. Demographic data were collected at baseline. Clinical and laboratory data were obtained at baseline and every 3 months thereafter. A structured interview was used in both groups every 3 months to obtain self-reported cardiovascular disease outcomes. Medical records and electrocardiograms were obtained to document events. Episodes of hypotension, syncope, injurious falls, electrolyte abnormalities, and bradycardia that were evaluated in an emergency department were included in adverse event reporting. Occurrences of acute kidney injury or acute renal failure requiring hospitalization were also monitored. The primary outcome was a composite of myocardial infarction, other acute coronary syndromes, stroke, heart failure, or death from cardiovascular causes.
Conclusions
The blood pressure intervention was stopped in August 2015 (median follow-up of 3.26 years) after the cardiovascular outcome results exceeded the boundary for efficacy at two consecutive time points.
Compared with a systolic blood pressure target of less than 140 mm Hg, an intensive systolic blood pressure target of less than 120 mm Hg resulted in lower rates of fatal and nonfatal major cardiovascular events and death from any cause. Significantly higher rates of some adverse events were observed in the intensive-treatment group.
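The SPRINT treatment-adjustment rules and visit cadence can be expressed as simple decision functions. This is a simplified sketch, not the SPRINT treatment algorithm: it looks only at systolic readings, returns a hypothetical action label rather than a medication change, and assumes that a standard-arm reading of 140 mm Hg or higher triggers intensification because it lies above the 135 to 139 mm Hg target band. The function names are hypothetical.

```python
def standard_arm_action(sbp_history):
    """sbp_history: systolic readings (mm Hg) at consecutive visits, oldest first.
    Standard arm: target 135-139 mm Hg."""
    latest = sbp_history[-1]
    # Dose reduction: < 130 at a single visit, or < 135 at two consecutive visits.
    if latest < 130:
        return "reduce"
    if len(sbp_history) >= 2 and all(s < 135 for s in sbp_history[-2:]):
        return "reduce"
    # Assumption: readings above the target band call for intensification.
    if latest >= 140:
        return "intensify"
    return "maintain"


def intensive_arm_action(sbp_history):
    """Intensive arm: target < 120 mm Hg."""
    return "intensify" if sbp_history[-1] >= 120 else "maintain"


def visit_months(total_months):
    """Visit months: monthly for the first 3 months, every 3 months thereafter."""
    return [m for m in range(1, total_months + 1) if m <= 3 or m % 3 == 0]


print(standard_arm_action([138, 133]))  # maintain
print(visit_months(12))                 # [1, 2, 3, 6, 9, 12]
```

Checking the single-visit threshold before the two-visit rule mirrors the order in which the rules are stated; either order yields the same decision because both lead to a dose reduction.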