We included three BAM files of genome sequencing data. Two are from tumour samples (one repaired FFPE and one unrepaired FFPE), and the third is from normal tissue from an FFPE block. A VCF file containing all somatic mutations in the dataset is also included.
Noncoding RNAs (ncRNAs) are emerging as key molecules in human cancer, with the potential to serve as novel markers of disease and to reveal uncharacterized aspects of tumor biology. Here we discover 121 unannotated prostate cancer-associated ncRNA transcripts (PCATs) by ab initio assembly of high-throughput sequencing of polyA+ RNA (RNA-Seq) from a cohort of 102 prostate tissues and cell lines. We characterize one ncRNA, PCAT-1, as a prostate-specific regulator of cell proliferation and show that it is a target of the polycomb repressive complex 2 (PRC2). We further find that patterns of PCAT-1 and PRC2 expression stratify patient tissues into molecular subtypes, which are distinguished by expression signatures of PCAT-1-repressed target genes. Taken together, our findings suggest that PCAT-1 is a transcriptional repressor implicated in a subset of prostate cancer patients. These findings establish the utility of RNA-Seq for identifying disease-associated ncRNAs that may improve the stratification of cancer subtypes.
Data Access NOTE: Please refer to the “Authorized Access” section below for information about how access to the data from this accession differs from many other dbGaP accessions.
Objectives: To characterize the clinical and laboratory course of patients with severe alpha-1-antitrypsin deficiency, whether or not they are receiving long-term augmentation therapy.
Background: Alpha-1-antitrypsin deficiency is a hereditary disorder. Patients with low serum levels of alpha-1-antitrypsin are at increased risk of early-onset emphysema. The only approved treatment for alpha-1-antitrypsin deficiency is augmentation therapy with a purified preparation of human alpha-1-antitrypsin. The sample size required for a randomized controlled clinical trial of augmentation therapy was judged infeasible. For this reason, a multi-center registry was initiated in 1988 to explore the natural history of the disease and the relative efficacy of augmentation therapy in patients with severe alpha-1-antitrypsin deficiency.
Participants: Eligible participants included individuals 18 years of age or older for whom the Central Laboratory confirmed that the serum alpha-1-antitrypsin level is
Conclusions: Participants receiving augmentation therapy had a lower mortality risk during follow-up. Across all participants, the rate of FEV1 decline did not differ by augmentation therapy. However, among participants with FEV1 35-49% of predicted, FEV1 declined significantly more slowly in those on augmentation therapy than in those not receiving therapy. (Am J Respir Crit Care Med, 1998; 158:49-59)
Celiac disease (gluten-sensitive enteropathy, celiac sprue) is a common disease with significant morbidity and mortality. It is caused by sensitivity to the dietary protein gluten, which results in a chronic enteropathy of the small intestine. Its reported frequency in the United States is 1:133, similar to European estimates, and recent evidence suggests that its incidence is rising. Occult disease with minimal classic symptoms or signs is frequent. The ratio of symptomatic to asymptomatic celiac disease is estimated to be 1:7. Complications of celiac disease include lymphoma, osteoporosis, anemia, miscarriage, seizures, vitamin deficiencies, and co-occurring autoimmune diseases. The only treatment is a gluten-free diet, and symptoms and complications may recur after even minor dietary indiscretions. Identifying the underlying genetic causes of celiac disease may allow us to identify susceptible individuals and to develop new ways of preventing the disease and treating it once it occurs. Several genome-wide linkage studies and one genome-wide association study of celiac disease have been conducted. Apart from the common locus at HLA, few putative celiac disease loci have been replicated between studies, which suggests that the disease is heterogeneous and complex. HLA accounts for a portion of the genetic predisposition. The remaining etiology is likely due to a number of rare, highly penetrant genes (as would be identified in linkage analysis) and more common, low-penetrance genes (as would be identified in association studies). The objective of this study is to identify new loci that increase the risk of developing celiac disease. We will capitalize on existing North American resources, including three large collections of celiac cases and controls (Aim 1) and an independent set of celiac cases and their families (Aim 3).
We propose a comprehensive multi-stage approach with the following specific aims. In Aim 1, we will conduct a genome-wide association (GWA) study of single nucleotide polymorphisms (SNPs) and copy number variants (CNVs). The GWA study will include 1,900 disease cases and 3,400 matched controls genotyped on the Illumina Human610-Quad chip, and statistical analyses will test for association with celiac disease. In Aim 2, we will conduct a combined analysis of our GWA dataset from Aim 1 and a previous GWA dataset to identify additional low-penetrance loci. In Aim 3, we will attempt to replicate significant findings from the GWA study in two independent sample sets. By the end of this study, we expect to have validated a number of SNPs and CNVs in genomic regions that alter the risk of developing celiac disease. These SNPs may prove useful at both the clinical and research levels. The data from the study will be shared with the scientific community.
Study of the well-established DLD-1 colorectal cancer cell line, engineered with an inactivatable Y chromosome centromere whose inactivation leads to mis-segregation of the Y chromosome. The goal is to gain insight into the mechanism of chromothripsis resulting from chromosome mis-segregation.
Efficacy, safety and biomarker analysis of ICARUS-BREAST01, a phase 2 study of patritumab deruxtecan in patients with HR+/HER2- advanced breast cancer. RNA sequencing data are available for 44 samples from 22 patients. For each patient, one sample was collected at entry into the trial and one was collected during treatment (cycle 1 day 3, cycle 1 day 19, or cycle 2 day 3).
The study protocol was approved by the Committee for the Protection of Human Subjects at the University of Texas Health Science Center at Houston (HSC-MS-11-0185). Recruitment began on July 1, 2017, and ended on March 30, 2022. After obtaining written informed consent, we enrolled probands with early-onset BAV disease (EBAV). We defined EBAV as BAV in an individual who was under 30 years of age at the time of their first clinical event. Clinical events were defined as aortic replacement, aortic valve surgery, aortic dissection, moderate or severe aortic stenosis or aortic regurgitation, large aneurysm (Z > 4.5), or intervention for BAV-related conditions. Individuals with hypoplastic left heart, known genetic mutations, genetic syndromes, or complex congenital heart disease were excluded. Samples were collected and genotyped as previously reported. For comparison, we analyzed a cohort of older individuals of European ancestry with sporadic BAV disease, selected from the International BAV Consortium (BAVGWAS). Phenotypes were derived from record review, with confirmation from image data whenever possible [25-26].
The computational pipeline for CNV analysis of Illumina single nucleotide polymorphism (SNP) array data included three independent CNV detection algorithms. GenomeStudio was used to exclude samples with indeterminate sex or more than 5% missing genotypes, and to exclude SNPs with GenTrain = 0. Principal component analysis was used to remove outliers that did not cluster with European ancestry. Before CNV analysis, each dataset was trimmed to a common set of 650,000 SNPs that were genotyped on every microarray used in this study. Three independent algorithms (PennCNV, cnvPartition, and QuantiSNP) generated CNV calls and sample-level quality statistics from SNP intensity data. PennCNV and QuantiSNP were run on Unix clusters, and cnvPartition data were exported from GenomeStudio.
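The sample-level and SNP-level QC steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions and is not the study's pipeline, which used GenomeStudio. The data layout, the "M"/"F" sex codes and every function name are hypothetical choices made for the example.

```python
# Hypothetical sketch of the pre-CNV QC described above. Assumed data layout:
# genotypes[sample][snp] holds a call string, or None if the call is missing;
# sex_calls[sample] is "M", "F" or None; gentrain[snp] is a float score.
# This is not GenomeStudio. It only mirrors the stated thresholds.
from typing import Dict, Iterable, List, Optional, Set

MAX_MISSING_RATE = 0.05  # samples with > 5% missing genotypes are excluded


def missing_rate(calls: Dict[str, Optional[str]]) -> float:
    """Fraction of SNPs with no genotype call for one sample."""
    if not calls:
        return 1.0
    return sum(call is None for call in calls.values()) / len(calls)


def passing_samples(genotypes: Dict[str, Dict[str, Optional[str]]],
                    sex_calls: Dict[str, Optional[str]]) -> List[str]:
    """Keep samples with a determinate sex call and <= 5% missing genotypes."""
    keep = []
    for sample, calls in genotypes.items():
        if sex_calls.get(sample) not in ("M", "F"):
            continue  # indeterminate sex
        if missing_rate(calls) > MAX_MISSING_RATE:
            continue  # too many missing genotypes
        keep.append(sample)
    return keep


def passing_snps(gentrain: Dict[str, float]) -> Set[str]:
    """Drop SNPs whose GenTrain score is 0."""
    return {snp for snp, score in gentrain.items() if score != 0}


def common_snp_set(arrays: Iterable[Set[str]]) -> Set[str]:
    """SNPs present on every array. Each dataset is then trimmed to this set."""
    arrays = list(arrays)
    return set.intersection(*arrays) if arrays else set()


# Usage with toy data (sample and SNP IDs are placeholders).
genotypes = {
    "s1": {"rs1": "AA", "rs2": "AB"},
    "s2": {"rs1": None, "rs2": "BB"},  # 50% missing, so excluded
    "s3": {"rs1": "AB", "rs2": "AA"},  # indeterminate sex, so excluded
}
sex_calls = {"s1": "M", "s2": "F", "s3": None}
print(passing_samples(genotypes, sex_calls))  # ['s1']
print(sorted(common_snp_set([{"rs1", "rs2", "rs3"}, {"rs2", "rs3", "rs4"}])))  # ['rs2', 'rs3']
```

Intersecting SNP sets before calling ensures that all datasets are scored on the same probes. This matters when cases and controls come from different array versions.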
The analysis was run using default configurations. PennCNV was used to generate QC data and to remove CNV calls that intersect polymorphic genomic regions. Samples meeting either of the following criteria were excluded: a standard deviation of the LogR ratio (obtained from PennCNV) > 0.35, or a number of CNVs more than 2 standard deviations above the mean for the dataset. CNV calls shorter than 20 kilobases (kb) and/or spanned by fewer than 6 probes were excluded. Adjacent regions were first merged using PennCNV. The rare-CNV overlap function in PLINK was then used to construct CNV regions (CNVRs). LogR ratio (LRR) and B allele frequency (BAF) data at CNVRs and calls of interest were visualized in GenomeStudio for validation. For segregation analysis, GenomeStudio was used to determine the presence of CNVs in relatives.
A total of 22,014 unselected control Illumina genotypes obtained from the Database of Genotypes and Phenotypes (dbGaP) were analyzed using identical methods (S1 Table). The Wisconsin Longitudinal Study (WLS) includes data on a cohort of 10,300 individuals who graduated from Wisconsin high schools in 1957. The Health and Retirement Study (HRS) includes data on 37,000 individuals aged 50 and above from 23,000 households across the United States. Principal component analysis was used to select genotypes of European ancestry from these datasets for analysis. Datasets were paired for case-control analysis based on the concordance of log-transformed sample-level quality control statistics (number of CNV calls and standard deviation of LRR). Chi-squared or Fisher's exact tests were used to compare CNV frequencies in cases and controls. Rare CNV functions in PLINK (v1.7) were used to perform permutation-based burden tests and gene set-based enrichment tests. Case-control burden tests were restricted to CNVs longer than 110 kb with a frequency below 0.1%.
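The following is a minimal Python sketch of the exclusion thresholds, the burden-test filter and the case-control comparison stated in this paragraph. It assumes a simple hypothetical `CnvCall` record and per-sample dictionaries. It is not PennCNV or PLINK. The Fisher's exact test is written from scratch so the example is self-contained. In practice a library routine such as `scipy.stats.fisher_exact` would typically be used.

```python
# Hypothetical sketch of the CNV QC thresholds, burden-test filter and
# case-control comparison described above. The CnvCall record and the input
# dictionaries are assumptions for illustration, not PennCNV or PLINK.
import math
import statistics
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class CnvCall:
    sample: str
    start: int      # 1-based, inclusive
    end: int        # 1-based, inclusive
    n_probes: int

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def failing_samples(lrr_sd: Dict[str, float], cnv_counts: Dict[str, int]) -> Set[str]:
    """Samples with LRR SD > 0.35 or a CNV count > 2 SD above the dataset mean.

    statistics.stdev (sample SD) needs at least two samples.
    """
    counts = list(cnv_counts.values())
    cutoff = statistics.mean(counts) + 2 * statistics.stdev(counts)
    bad = {s for s, sd in lrr_sd.items() if sd > 0.35}
    bad |= {s for s, n in cnv_counts.items() if n > cutoff}
    return bad


def keep_call(call: CnvCall) -> bool:
    """Exclude calls shorter than 20 kb and/or spanned by fewer than 6 probes."""
    return call.length >= 20_000 and call.n_probes >= 6


def burden_eligible(call: CnvCall, frequency: float) -> bool:
    """Burden tests use CNVs longer than 110 kb with frequency below 0.1%."""
    return call.length > 110_000 and frequency < 0.001


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test on the 2x2 table [[a, b], [c, d]].

    Rows: cases, controls. Columns: CNV carriers, non-carriers. Sums the
    hypergeometric probabilities of all tables with the observed margins
    that are no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        return math.comb(col1, x) * math.comb(n - col1, row1 - x) / math.comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Usage: a classic 2x2 example whose two-sided p-value is 34/70.
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # 0.4857
```

The text does not say whether the sample or the population standard deviation was used for the CNV-count cutoff. Using `statistics.stdev` (sample SD) here is an assumption.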
CNV overlap functions in PLINK were used to identify rare CNVs that intersect between datasets or involve specific BAV or CHD genes. The candidate gene list included 190 CHD genes with strong cumulative evidence, from human or animal model data, of causing BAV or related congenital malformations. Genome Reference Consortium Human Build 37 was used for CNV annotation [34].
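The interval intersection that underlies this overlap step can be illustrated with a short Python sketch. The sketch assumes closed, 1-based coordinates and uses placeholder gene names and positions. It does not reproduce PLINK's overlap options, which may apply additional criteria.

```python
# Hypothetical sketch of intersecting rare CNVs with candidate genes and with
# CNVs from another dataset. Intervals are assumed to be closed, 1-based
# (chrom, start, end) coordinates on one build (GRCh37 in the study). The
# gene names and positions below are placeholders, not real annotations.
from typing import Dict, List, NamedTuple, Tuple


class Interval(NamedTuple):
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two closed intervals on the same chromosome share >= 1 base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def genes_hit(cnv: Interval, genes: Dict[str, Interval]) -> List[str]:
    """Sorted names of candidate genes that a CNV intersects."""
    return sorted(name for name, gene in genes.items() if overlaps(cnv, gene))


def shared_cnvs(xs: List[Interval], ys: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """All pairs of CNVs from two datasets that intersect (naive O(n*m) scan)."""
    return [(x, y) for x in xs for y in ys if overlaps(x, y)]


# Usage with placeholder genes.
genes = {
    "GENE_A": Interval("7", 1_000, 5_000),
    "GENE_B": Interval("7", 10_000, 12_000),
    "GENE_C": Interval("X", 1_000, 5_000),
}
print(genes_hit(Interval("7", 4_000, 11_000), genes))  # ['GENE_A', 'GENE_B']
```

A pairwise scan is enough for a candidate list of a few hundred genes. Genome-wide intersections would usually use a sorted sweep or an interval tree instead.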
The ELLIPSE Consortium is an international effort to discover risk loci for prostate cancer. It includes a meta-analysis of existing GWAS data as well as novel GWAS, exome, and iCOGS genotyping. The GWAS meta-analysis includes the following cases and controls from studies of European ancestry:
- UK GWAS stage 1 (Illumina Infinium HumanHap 550 Array: 1854 cases and 1894 controls)
- UK GWAS stage 2 (Illumina iSELECT: 3706 cases and 3884 controls)
- CAPS1 (Affymetrix GeneChip 500K: 474 cases and 482 controls)
- CAPS2 (Affymetrix GeneChip 5.0K: 1458 cases and 512 controls)
- BPC3 (Illumina Human610: 2068 cases and 3011 controls)
- PEGASUS (HumanOmni2.5: 4600 cases and 2941 controls)
OMNI 2.5M genotyping was conducted for 977 prostate cancer cases from UKGPCS. Exome SNP array genotyping was conducted for 4741 subjects from UKGPCS. iCOGS genotyping was conducted for 10366 subjects, including the Multiethnic Cohort (n=1648) and UKGPCS (n=8718). Below is a description of each study that contributed to the meta-analysis of men of European ancestry. Information about the studies that contributed to the multiethnic meta-analysis can be found on the associated study page and in Conti et al. (Nature Genetics, PMID:33398198).
UK GWAS Stage 1 (UK1) and Stage 2 (UK2): The UK Genetic Prostate Cancer Study (UKGPCS) was established in 1993 and is the largest prostate cancer study of its kind in the UK, involving nearly 189 hospitals. We are based at The Institute of Cancer Research in Sutton, Surrey, and collaborate with the Royal Marsden NHS Foundation Trust. Our aim is to find genetic changes associated with prostate cancer risk. Our target is to recruit 26,000 men into the UKGPCS by 2017. Men are eligible to take part if they fit into at least one of the following groups: They have been diagnosed with prostate cancer at 60 years of age or under (up to their 61st birthday).
They have been diagnosed with prostate cancer and have a first-, second- or third-degree relative with prostate cancer, and at least one of these men was diagnosed with prostate cancer at 65 years of age or under. They are affected and have 3 or more cases of prostate cancer on one side of their family. They are a prostate cancer patient at the Royal Marsden NHS Foundation Trust. To date we have recruited around 16,000 men for whom we have germline DNA and clinical data at diagnosis. The UK GWAS is based on genotyping of 541,129 SNPs in 1,854 individuals with clinically detected (non-PSA-screened) prostate cancer (cases) and 1,894 controls. In stage 2, the 43,671 SNPs showing strong evidence of association in stage 1 were followed up by genotyping a further 3,268 cases and 3,366 controls from the UK and Melbourne.
CAPS1 and CAPS2: The CAPS (Cancer of the Prostate in Sweden) study is a large Swedish population-based cancer study. It comprises 3,161 cases and 2,149 controls recruited between 2001 and 2003. Biopsy-confirmed prostate cancer cases diagnosed between July 2001 and October 2003 were identified and recruited from four of the six regional cancer registries in Sweden. Clinical data, including TNM stage, Gleason grade and PSA level at the time of diagnosis, were retrieved through record linkage to the National Prostate Cancer Registry. Control subjects were recruited concurrently with case subjects and randomly selected from the Swedish Population Registry. They were matched to the expected age distribution of cases (5-year age groups) and to geographic region. Whole blood was collected from all individuals for extraction of genomic DNA. The GWAS was conducted in two phases. In the first phase (CAPS1), 498 cases and 502 controls were genotyped, and in the second phase (CAPS2), 1,483 cases and 519 controls were genotyped. Genotyping was performed using the GeneChip Human Mapping 500K (CAPS1) and 5.0K (CAPS2) Array Sets from Affymetrix (Santa Clara, CA).
The National Cancer Institute Breast and Prostate Cancer Cohort Consortium, BPC3: BPC3 was a consortium of prospective cohort studies investigating genetic and gene-environment risk factors for breast and prostate cancer. Each study selected cases and controls as described below. The clinical criteria defining advanced prostate cancer (Gleason ≥ 8 or stage C/D) were obtained from either medical records or cancer registries. The Gleason score came from either surgical specimens (radical prostatectomy or autopsy) or the diagnostic biopsy (needle biopsy or TURP). When multiple Gleason scores were available, the surgical value was used. PLCO was removed from this analysis because its samples were included in the PEGASUS GWAS described below. In total, 2,473 advanced prostate cancer cases and 3,534 controls were included in the analysis after QC.
ATBC, Alpha-Tocopherol, Beta-Carotene Cancer Prevention Study: ATBC was a randomized, placebo-controlled primary prevention trial. It investigated whether α-tocopherol or β-carotene supplementation reduced the incidence of lung or other cancers in male smokers. Between 1985 and 1988, 29,133 men aged 50 to 69 years from Finland were enrolled in the trial. They were randomized to supplementation (50 mg α-tocopherol, 20 mg β-carotene, or both) or to placebo. Men with a prior history of cancer, other than non-melanoma skin cancer or carcinoma in situ, were excluded from participating. Incident cancer cases are identified through linkage with the Finnish Cancer Registry, which has ~100% ascertainment of cancer cases nationwide. Cases included 249 men with DNA available who were diagnosed with advanced prostate cancer (Gleason ≥ 8 or stage C/D) between 1985 and 2003. Controls were 1,271 men without a diagnosis of prostate cancer who had previously been selected for a GWAS of lung cancer in ATBC.
CPSII, Cancer Prevention Study II: CPSII is a cohort study started in 1982 to investigate the relationship of dietary, lifestyle and other etiologic factors with cancer mortality. Approximately 1.2 million men and women from 50 U.S. states enrolled in the study. In 1992, a subset of these participants (n ≈ 184,000) was enrolled in the CPSII Nutrition Cohort to examine the relationship between dietary and other exposures and cancer incidence. Blood samples were drawn from approximately 39,376 members of the Nutrition Cohort from 1998 to 2001, and buccal cells were collected from 69,467 members from 2001 to 2002. Cancer cases are identified by self-report on follow-up questionnaires and then verified through medical records and/or linkage to state cancer registries. Cases are also identified through death certificates. A total of 660 advanced prostate cancer cases (Gleason ≥ 8 or stage III/IV) with a source of DNA were identified for this study. Controls were 660 men matched on ethnicity, date of birth, sample collection date and DNA type.
EPIC, European Prospective Investigation into Cancer and Nutrition: EPIC is a prospective study designed to investigate both genetic and non-genetic risk factors for different forms of cancer. Study participants were almost all white Europeans. Approximately 500,000 individuals (150,000 men) were recruited into EPIC between 1992 and 2000 from 23 centers in 10 European countries. Approximately 400,000 of these subjects also provided a blood sample at recruitment. The methods of recruitment and details of the study design are described in detail elsewhere. In brief, study participants completed an extensive questionnaire on both dietary and nondietary data at recruitment. The present study includes advanced prostate cancer cases (Gleason ≥ 8 or stage III/IV). Controls were matched to cases on study center, length of follow-up, age at enrollment (± 6 months), fasting status and time of day of blood collection (± 1 hour).
The advanced prostate cancer subjects came from 8 of the 10 participating countries: Denmark, Germany, Greece, Italy, the Netherlands, Spain, Sweden and the United Kingdom (UK). France and Norway were not included in the current study because their cohorts included only female subjects. All participants gave written consent for the research. Approval for the study was obtained from the ethical review boards of all local institutions in the regions where participants had been recruited for the EPIC study.
HPFS, Health Professionals Follow-up Study: HPFS began in 1986 and is an ongoing prospective cohort study of 51,529 United States male dentists, optometrists, osteopaths, podiatrists, pharmacists, and veterinarians aged 40 to 75 years. The baseline questionnaire provided information on age, marital status, height and weight, ancestry, medications, smoking history, disease history, physical activity, and diet. At baseline the cohort was 97% white, 2% Asian American, and 1% African American. The median follow-up through 2005 was 10.5 years (range 2-19 years). Self-reported prostate cancer diagnoses were confirmed by obtaining medical and/or pathology records. Prostate cancer deaths are either reported by family members in response to follow-up questionnaires or identified through the postal system or the National Death Index. Questionnaires are sent every two years to surviving men to update exposure and medical history. In 1993 and 1994, blood specimens were collected from 18,018 men without a prior diagnosis of cancer. Prostate cancer cases are matched to controls on birth year (±1) and ethnicity. Controls are selected from men who were cancer-free at the time of the case’s diagnosis and who had a prostate-specific antigen test after the date of blood draw.
MEC, Multiethnic Cohort: The Multiethnic Cohort Study is a population-based prospective cohort study initiated between 1993 and 1996. It includes subjects from several ethnic groups: African Americans and Latinos primarily from California (greater Los Angeles area), and Native Hawaiians, Japanese Americans, and European Americans primarily from Hawaii. State driver’s license files were the primary source used to identify study subjects in Hawaii and California. In addition, state voter registration files were used in Hawaii. In California, Health Care Financing Administration (HCFA) files were used to identify additional African American men. All participants (n=215,251) returned a 26-page self-administered baseline questionnaire that collected general demographic, medical and risk factor information. Incident cancer cases in the cohort are identified annually through linkage to the population-based Surveillance, Epidemiology, and End Results (SEER) cancer registries in Hawaii and Los Angeles County, as well as to the California State cancer registry. Information on stage and grade of disease is also obtained through the SEER registries. Blood sample collection in the MEC began in 1994. It targeted incident prostate cancer cases and a random sample of study participants who serve as controls for genetic analyses.
PHS, Physicians’ Health Study: PHS was a randomized trial of aspirin and β-carotene for cardiovascular disease and cancer among 22,071 U.S. male physicians aged 40-84 years at randomization. None had a cancer diagnosis at baseline. The original trial has ended, but the men continue to be followed. From 1982 to 1984, blood samples were collected from 14,916 physicians before randomization. Participants are sent yearly questionnaires to ascertain endpoints. Whenever a physician reports cancer, we request permission to obtain the medical records, and cancers are confirmed by pathology report.
We obtain death certificates and pertinent medical records for all deaths. In PHS, follow-up is over 97% complete for nonfatal outcomes and over 99% complete for mortality.
PLCO, Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial: PLCO is a multicenter, randomized trial to evaluate screening methods for the early detection of prostate, lung, colorectal and ovarian cancer. Between 1993 and 2001, over 150,000 men and women aged 55-74 years were recruited from ten centers in the United States (Birmingham, AL; Denver, CO; Detroit, MI; Honolulu, HI; Marshfield, WI; Minneapolis, MN; Pittsburgh, PA; Salt Lake City, UT; St. Louis, MO; and Washington, D.C.). Men randomized to the screening arm underwent prostate cancer screening with prostate-specific antigen (PSA) annually for six years and digital rectal exam annually for four years. Blood specimens were collected from participants randomized to the screening arm of the trial, and buccal cell specimens were obtained from participants randomized to the control arm. Cases included 754 men diagnosed with advanced prostate cancer (Gleason ≥ 8 or stage III/IV) from either arm of the trial. Of these cases, 317 had previously been genotyped as part of Cancer Genetic Markers of Susceptibility (CGEMS), a GWAS of prostate cancer. Controls included 1,491 men without a diagnosis of prostate cancer from the screening arm of the PLCO trial. All subjects provided informed consent to participate in genetic etiology studies of cancer and other traits. This study was approved by the institutional review boards at the ten centers and by the National Cancer Institute. PLCO was removed from the meta-analysis of the BPC3 studies because its samples were included in PEGASUS, described below.
PEGASUS, Prostate cancer Genome-wide Association Study of Uncommon Susceptibility loci: PEGASUS is a genome-wide association study nested within the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial.
Cases included 4,598 men of European ancestry diagnosed with prostate cancer from either arm of the trial. Controls included 2,941 men of European ancestry without a diagnosis of cancer from the screening arm, matched on age and year of randomization. All subjects provided informed consent, and the study was approved by the institutional review board at the National Cancer Institute.
Funding: This work was supported by the GAME-ON U19 initiative for prostate cancer (ELLIPSE): U19 CA148537. The BPC3 was supported by the U.S. National Institutes of Health, National Cancer Institute (cooperative agreements U01-CA98233, U01-CA98710, U01-CA98216, and U01-CA98758, and the Intramural Research Program of NIH/National Cancer Institute, Division of Cancer Epidemiology and Genetics). The ATBC study and PEGASUS were supported in part by the Intramural Research Program of the NIH and the National Cancer Institute. This research was also supported by U.S. Public Health Service contracts N01-CN-45165, N01-RC-45035, N01-RC-37004 and HHSN261201000006C from the National Cancer Institute, Department of Health and Human Services.
CAPS: The Department of Medical Epidemiology and Biostatistics, Karolinska Institute, Stockholm, Sweden was supported by the following:
- the Cancer Risk Prediction Center (CRisP; www.crispcenter.org), a Linnaeus Centre (Contract ID 70867902) financed by the Swedish Research Council
- the Swedish Research Council (grant: K2010-70X-20430-04-3)
- the Swedish Cancer Foundation (grant: 09-0677)
- the Hedlund Foundation
- the Söderberg Foundation
- the Enqvist Foundation
- ALF funds from the Stockholm County Council
- Stiftelsen Johanna Hagstrand och Sigfrid Linnér’s Minne
- Karlsson’s Fund for urological and surgical research
We thank and acknowledge all of the participants in the Stockholm-1 study. We thank Carin Cavalli-Björkman and Ami Rönnberg Karlsson for their dedicated work in the collection of data. Michael Broms is acknowledged for his skillful work with the databases. KI Biobank is acknowledged for handling the samples and for DNA extraction. Hans Wallinder at Aleris Medilab and Sven Gustafsson at Karolinska University Laboratory are thanked for their good cooperation in providing historical laboratory results.
UKGPCS would like to acknowledge the NCRN nurses and consultants for their work in the UKGPCS study. We thank all the patients who took part in this study. This work was supported by Cancer Research UK (grants: C5047/A7357, C1287/A10118, C1287/A5260, C5047/A3354, C5047/A10692, C16913/A6135 and C16913/A6835). We would also like to thank the following for funding support: Prostate Research Campaign UK (now Prostate Cancer UK), The Institute of Cancer Research and The Everyman Campaign, The National Cancer Research Network UK, and The National Cancer Research Institute (NCRI) UK. We are grateful for NIHR funding to the NIHR Biomedical Research Centre at The Institute of Cancer Research and The Royal Marsden NHS Foundation Trust. The MEC was supported by NIH grants CA63464, CA54281 and CA098758.
The genome-wide association study (GWAS) includes participants enrolled in two different studies. The first, the San Francisco Bay Area Cancer Study (SFBCS), is a population-based case-control study of breast cancer conducted in the San Francisco Bay Area. It included women aged 35-79 years from three racial/ethnic groups: non-Hispanic whites, African Americans, and Hispanics/Latinas. Only Hispanic/Latina women were included in the GWAS. Women diagnosed with invasive breast cancer between 1995 and 2002 were identified through the Greater Bay Area Cancer Registry. Controls were identified by random digit dialing and were frequency-matched to cases by age (5-year groups) and race/ethnicity. Hispanic/Latina ethnicity was assessed by self-report. From the SFBCS, 175 Hispanic/Latina cases and 307 Hispanic/Latina controls gave adequate consent and provided biospecimens. These biospecimens were used in the GWAS, and the resulting data are included in this data submission.
The second study is the Northern California site of the Breast Cancer Family Registry (NC-BCFR). This population-based family study recruited breast cancer cases aged 18-64 years, diagnosed from 1995 to 2009, who were identified through the Greater Bay Area Cancer Registry. Cases included all women at increased genetic susceptibility for breast cancer who met one or more of the following criteria:
(a) diagnosis of breast cancer at age <35 years;
(b) a personal history of ovarian cancer or childhood cancer;
(c) diagnosis of two different breast cancers (bilateral breast cancer), the first at age <50 years; or
(d) one or more first-degree relatives with breast cancer, ovarian cancer or childhood cancer.
Cases not meeting these criteria were randomly sampled, with oversampling of racial/ethnic minorities. Controls were recruited by random digit dialing and were matched by 5-year age group and race/ethnicity.
Only Hispanic/Latina cases and controls were included in the current GWAS. Hispanic/Latina ethnicity was assessed by self-report. From the NC-BCFR, 631 Hispanic/Latina cases and 61 Hispanic/Latina controls gave adequate consent and provided biospecimens. These biospecimens were used in the GWAS, and the resulting data are included in this data submission.
Congenital diaphragmatic hernia (CDH) is a common and severe birth defect characterized by structural defects of the diaphragm and by pulmonary hypoplasia. Patients with CDH may present with an isolated phenotype or, as part of a complex phenotype, together with other congenital anomalies. Despite the clinical significance of CDH, the underlying genetic and developmental pathways are incompletely understood. To establish a catalog of human genetic variation for this condition, we performed whole exome sequencing (WES) on 275 carefully phenotyped individuals with CDH in the Pediatric Surgical Research Laboratories at the Massachusetts General Hospital (Boston, MA, USA) and Boston Children's Hospital (Boston, MA, USA). The exome data can be compared with candidate genes derived from WES of other CDH cohorts or affected kindreds. They can also help identify candidates for further functional studies, with the ultimate goal of enhancing our understanding of the heterogeneous and possibly oligogenic molecular etiology of CDH. Familial clustering has been reported in rare kindreds, but the majority of probands with CDH have no family history of CDH. This has led to the hypothesis that de novo variants are an important and relatively frequent etiological mechanism. In the second version of the study, we performed WES analysis on 87 trios. The aims were to assess the contribution of de novo mutations to the etiology of diaphragmatic and pulmonary defects and to identify previously unknown candidate genes. This dbGaP submission includes WES data on:
(a) 20 new probands;
(b) 9 probands also reported in the previous version (dbGaP accession no. phs000783.v1.p1); and
(c) 174 unaffected parents, including the parents of 67 previously reported probands.
Combined analysis with other available cohorts of congenital diaphragmatic hernia revealed a genome-wide enrichment of likely gene-disrupting de novo variants (i.e., nonsense, frameshift or splice site), and missense de novo variants predicted in silico to be damaging.