The study protocol was approved by the Committee for the Protection of Human Subjects at the University of Texas Health Science Center at Houston (HSC-MS-11-0185). Study recruitment began on July 1, 2017, and concluded on March 30, 2022. After obtaining written informed consent, we enrolled probands with early onset BAV disease (EBAV), which we defined as BAV in individuals who were under the age of 30 at the time of their first clinical event. Clinical events were defined as aortic replacement, aortic valve surgery, aortic dissection, moderate or severe aortic stenosis or aortic regurgitation, large aneurysm (Z > 4.5), or intervention for BAV-related conditions. Individuals with hypoplastic left heart, known genetic mutations, genetic syndromes, or complex congenital heart disease (CHD) were excluded. Samples were collected and genotyped as previously reported. For comparison, we analyzed a cohort of older individuals of European ancestry with sporadic BAV disease selected from the International BAV Consortium (BAVGWAS). Phenotypes were derived from record review with confirmation of image data whenever possible [25-26]. The computational pipeline for copy number variant (CNV) analysis of Illumina single nucleotide polymorphism (SNP) array data included three independent CNV detection algorithms.

GenomeStudio was used to exclude samples with indeterminate sex or more than 5% missing genotypes, and SNPs with GenTrain = 0. Principal component analysis was used to remove outliers that did not cluster with European ancestry. Prior to CNV analysis, each dataset was trimmed by selecting a common set of 650,000 SNPs that were genotyped on each of the microarrays used in this study.

Three independent algorithms (PennCNV, cnvPartition, and QuantiSNP) were used to generate CNV calls and sample-level quality statistics from SNP intensity data. PennCNV and QuantiSNP were run on Unix clusters, and cnvPartition data were exported from GenomeStudio.
The analysis was run using default configurations. PennCNV was used to generate QC data and to remove CNV calls that intersect with polymorphic genomic regions. Samples that met either of the following criteria were excluded: standard deviation of the LogR ratio (obtained from PennCNV) > 0.35, or number of CNVs > 2 standard deviations above the mean for each dataset. CNV calls less than 20 kilobases (Kb) in length and/or spanned by fewer than 6 probes were excluded. The overlap function for rare CNVs in PLINK was used to construct CNV regions (CNVRs) after adjacent regions were merged using PennCNV. LogR ratio (LRR) and B allele frequency (BAF) data at CNVRs and calls of interest were visualized in GenomeStudio for validation. For segregation analysis, GenomeStudio was used to determine the presence of CNVs in relatives.

A total of 22,014 unselected control Illumina genotypes obtained from the Database of Genotypes and Phenotypes were analyzed using identical methods (S1 Table). The Wisconsin Longitudinal Study (WLS) includes data on a cohort of 10,300 individuals who graduated from Wisconsin high schools in 1957. The Health and Retirement Study (HRS) includes data on 37,000 individuals aged 50 and above from 23,000 households across the United States. Principal component analysis was used to select European ancestry genotypes from these datasets for analysis. Datasets were paired for case-control analysis based on the concordance of log-transformed sample-level quality control statistics (number of CNV calls and standard deviation of LogR ratios). Chi-squared or Fisher exact tests were used to compare CNV frequencies in cases and controls.

Rare CNV functions in PLINK (v1.7) were used to perform permutation-based burden tests or gene set-based enrichment tests. Case-control burden tests were restricted to CNVs that were longer than 110 Kb and less than 0.1% in frequency.
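The sample-level and call-level quality filters described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline the study ran: the `CnvCall` record, the function names, and the choice to compute the mean + 2 SD threshold from raw per-sample call counts are all hypothetical; only the numeric thresholds (0.35, 2 SD, 20 Kb, 6 probes) come from the text.

```python
from dataclasses import dataclass
from statistics import mean, stdev

# Thresholds quoted in the text; every name in this sketch is hypothetical.
MAX_LRR_SD = 0.35       # sample-level: SD of the LogR ratio
MIN_LENGTH_BP = 20_000  # call-level: minimum CNV length (20 Kb)
MIN_PROBES = 6          # call-level: minimum number of probes per call


@dataclass(frozen=True)
class CnvCall:
    sample_id: str
    chrom: str
    start: int  # inclusive base-pair coordinates (assumed)
    end: int
    n_probes: int

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def passing_samples(lrr_sd: dict[str, float], calls: list[CnvCall]) -> set[str]:
    """Return the samples in one dataset that pass both sample-level filters."""
    counts = {sample: 0 for sample in lrr_sd}
    for call in calls:
        if call.sample_id in counts:
            counts[call.sample_id] += 1
    values = list(counts.values())
    # Assumption: the count threshold is mean + 2 SD of raw per-sample counts.
    max_calls = mean(values) + 2 * stdev(values)  # stdev needs >= 2 samples
    return {
        sample
        for sample, sd in lrr_sd.items()
        if sd <= MAX_LRR_SD and counts[sample] <= max_calls
    }


def filter_calls(calls: list[CnvCall], keep: set[str]) -> list[CnvCall]:
    """Drop calls from excluded samples, short calls, and low-probe calls."""
    return [
        call
        for call in calls
        if call.sample_id in keep
        and call.length >= MIN_LENGTH_BP
        and call.n_probes >= MIN_PROBES
    ]
```

Because the count threshold is defined per dataset, `passing_samples` would be called once for each dataset rather than on the pooled samples.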
CNV overlap functions in PLINK were used to identify rare CNVs that intersect between datasets or involve specific BAV or CHD genes. The list of candidate genes included 190 CHD genes with strong cumulative evidence, from human or animal model data, of causing BAV or related congenital malformations. Genome Reference Consortium Human Build 37 was used for CNV annotation [34].
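The idea of flagging CNVs that involve candidate genes reduces to an interval-intersection check. The sketch below illustrates that check in plain Python; it is not PLINK's implementation, and the `Interval` type, the function names, the gene labels, and the coordinates are hypothetical placeholders rather than study data.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 1-based inclusive coordinates on one reference build (assumed)
    end: int
    name: str


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def genes_hit(cnvs: list[Interval], genes: list[Interval]) -> dict[str, list[str]]:
    """Map each CNV name to the sorted candidate genes it intersects."""
    hits = {}
    for cnv in cnvs:
        matched = sorted(g.name for g in genes if overlaps(cnv, g))
        if matched:
            hits[cnv.name] = matched
    return hits


# Placeholder gene and CNV coordinates, for illustration only.
genes = [Interval("1", 1000, 2000, "GENE_A"), Interval("1", 5000, 6000, "GENE_B")]
cnvs = [Interval("1", 1500, 5500, "cnv1"), Interval("1", 2001, 4999, "cnv2")]
print(genes_hit(cnvs, genes))
```

The nested loop is quadratic in the number of intervals, which is adequate for a list of 190 genes; a sorted or tree-based index would be the usual choice for genome-wide annotation.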
The IPM BioMe Biobank, founded in September 2007, is an ongoing, broadly consented, electronic health record (EHR)-linked clinical care biobank that enrolls participants non-selectively from the Mount Sinai Medical Center patient population. BioMe currently comprises >42,000 participants from diverse ancestries, characterized by a broad spectrum of longitudinal biomedical traits. Participants are enrolled through an opt-in process and consent to be followed throughout their clinical care (past, present, and future) in real time, allowing us to integrate their genomic information with their EHRs for discovery research and clinical care implementation. BioMe participants consent to recall, based on their genotype and/or phenotype, permitting in-depth follow-up and functional studies for selected participants at any time. Phenotypic and genomic data are stored in a secure database and made available to investigators, contingent on approval by the BioMe Governing Board. BioMe uses a "data-broker" system to protect confidentiality.

Ancestral diversity - BioMe participants represent broad racial, ethnic and socioeconomic diversity with a distinct and population-specific disease burden. Specifically, BioMe participants are of African (AA, 24%), Hispanic/Latino (HL, 35%), European (EA, 32%), and other/mixed (OA, 10%) ancestry. Participants who self-identify as Hispanic/Latino further report being of Puerto Rican (39%), Dominican (23%), Central/South American (17%), Mexican (5%) or other Hispanic (16%) ancestry. More than 40% of European ancestry participants are genetically determined to be of Ashkenazi Jewish ancestry. With this broad ancestral diversity, BioMe is uniquely positioned to examine the impact of demographic and evolutionary forces that have shaped common disease risk.
Phenotypes available in BioMe - BioMe has a high-quality and validated set of fully implemented clinical phenotype data that has been culled by a multi-disciplinary team of experienced investigators, clinicians, information technologists, data-managers, and programmers who apply advanced medical informatics and data mining tools to extract and harmonize EHRs. BioMe, as a cohort, offers great versatility for designing nested case-control sample-sets, particularly for studying longitudinal traits and co-morbidity in disease burden.

Biomedical and clinical outcomes: The BioMe Biobank is linked to Mount Sinai's system-wide Epic EHR, which captures a full spectrum of biomedical phenotypes, including clinical outcomes, covariate and exposure data from past, present and future health care encounters. As such, the BioMe Biobank has a longitudinal design, as participants consent to make all of their EHR data from past (dating back as far as 2003), present and future inpatient or outpatient encounters available for research, without restriction. The median number of outpatient encounters is 21 per participant, reflecting predominant enrollment of participants with common chronic conditions from primary care facilities.

Environmental data: The clinical and EHR information is complemented by detailed demographic and lifestyle information, including ancestry, residence history, country of origin, personal and familial medical history, education, socio-economic status, physical activity, smoking, dietary habits, alcohol intake, and body weight history, which is collected in a systematic manner by interview-based questionnaire at the time of enrollment.

The IPM BioMe Biobank contributed ~10,600 DNA samples for whole genome sequencing to the TOPMed program. Samples were selected for the Coronary Artery Disease (CAD) and the Chronic Obstructive Pulmonary Disease (COPD) working groups.
Using a Case-Definition-Algorithm (CDA), we identified ~4,100 individuals with CAD (~50% women) and ~3,000 individuals as controls (65% women). In addition, we identified ~800 individuals with COPD (62% women) and 1,800 individuals as controls (72% women). Another 600 BioMe participants with atrial fibrillation, all of African ancestry, were included.
In 2016 we established the Sporadic ALS Australia Systems Genomics Consortium (SALSA-SGC), funded by the Ice Bucket Challenge Grant administered by the Motor Neurone Disease Research Institute of Australia. The goals of the SALSA-SGC are to collect biological samples from clinics across Australia with matched in-depth clinical and self-report phenotypes and to generate multiple levels of genetic and genomic data. In this first data generation exercise of the SALSA-SGC, the majority of the samples were collected from clinics across Australia prior to the formal establishment of the consortium.

Briefly, the cohort includes the University of Sydney’s Australian Motor Neuron Disease DNA Bank (MND Bank) cohort (recruited April 2000 to June 2011), with study protocol approved by the Sydney South West Area Health Service Human Research Ethics Committee (HREC). Cases were recruited from around Australia via state-based MND associations, with diagnosis verified by a neurologist. The remainder of the cases were recruited from clinics across Australia between 2015 and 2017 under HREC approvals from Royal Brisbane and Women’s Hospital, Macquarie University Multidisciplinary Motor Neurone Disease Clinic, Calvary Health Care Bethlehem in Melbourne, and Fiona Stanley Hospital in Perth, and from 2016 under HREC approvals at each site for the SALSA-SGC. The ALS cases were diagnosed with definite or probable ALS according to the revised El Escorial criteria. Some controls were recruited as partners or friends of patients and were healthy individuals free of neuromuscular diseases.

We are providing GWAS and MWAS data in this dataset. Individual-level GWAS data were generated using Illumina Infinium CoreExome-24 version 1.1 chips for N=846 cases and N=665 controls. Individual-level MWAS data were generated using the Illumina HumanMethylation450 (450K) array for N=782 cases and N=613 controls.
There are 1,315 individuals for whom both GWAS and MWAS data have been generated and are available. Further information on these data sets can be found in:

Paper 1: Restuadi, R, Garton, FC, Benyamin, B, Lin, T, et al. Amyotrophic Lateral Sclerosis Genetic Correlation with Cognitive Performance, educational attainment and schizophrenia: evidence from polygenic risk score analysis. (submitted)

Paper 2: Nabais, MF, Lin, T, Benyamin, B et al. Significant out-of-sample classification from methylation profile scoring for amyotrophic lateral sclerosis. 2020. NPJ genomic medicine. 5(10).

Files provided in this submission include:

GWAS: This folder contains QCed genotypes for the Australian ALS case-control cohort, as PLINK files of genotyping data (not imputed yet). The individuals selected here have good consistency on phenotype data, have ethics approval registered as part of sporadic ALS studies, and are unrelated by GRM cut-off 0.05. No ancestry QC has been performed yet.

MWAS: This folder contains the IDAT files and post-QC normalized DNAm (beta) values for the Australian ALS case-control cohort.

2019_AUS_ALS_PCTG_DNAm.tar.gz - IDATs for the 1,315 individuals analyzed in the MWAS study

normalized_beta_values - Binary files (created with the OSCA software) containing information on the individuals, probes and the DNAm (beta) values obtained after QC

phenotype_file - contains all the covariates analysed in the MWAS, including case-control status (coded 0 = Control and 1 = ALS), predicted age, predicted cell-type proportions, predicted smoking scores, slide and chip position, and sex

Important Notes: The DNAm data were normalized together with samples that were not part of this ALS case/control study; thus, the normalization procedure may not be 100% reproducible using only the IDAT files uploaded here. Summary data have been made publicly available and can be accessed directly.

Data collection and sample processing were performed at several clinics across Australia.
Genotyping and DNA methylation arrays were performed by the Human Studies Unit at the Institute for Molecular Bioscience (University of Queensland). Quality control of the genotypic, phenotypic and DNA methylation data was performed by the Program of Complex Traits Genomics at the Institute for Molecular Bioscience (University of Queensland).
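The "unrelated by GRM cut-off 0.05" criterion listed for the GWAS files can be illustrated with a simple greedy pruning sketch. This is an assumption-laden example, not the procedure actually applied to the cohort: the function name, the input layout (a dictionary of pairwise genetic relationship estimates), and the greedy removal strategy are all hypothetical.

```python
def prune_related(ids, grm, cutoff=0.05):
    """Greedily drop individuals until no retained pair exceeds the cutoff.

    ids:  iterable of individual identifiers
    grm:  dict mapping (id_a, id_b) tuples to a pairwise relatedness estimate
          (hypothetical layout; off-diagonal GRM entries only)
    """
    related = {i: set() for i in ids}
    for (a, b), value in grm.items():
        if value > cutoff:
            related[a].add(b)
            related[b].add(a)

    kept = set(related)
    while kept:
        # Pick the individual with the most related partners still retained;
        # ties are broken by identifier so the result is deterministic.
        worst = max(kept, key=lambda i: (len(related[i] & kept), i))
        if not related[worst] & kept:
            break  # no retained pair exceeds the cutoff
        kept.remove(worst)
    return kept


# Toy example with made-up relatedness values.
pairs = {("A", "B"): 0.5, ("B", "C"): 0.25, ("C", "D"): 0.01}
print(sorted(prune_related(["A", "B", "C", "D"], pairs)))
```

Removing the most connected individual first tends to retain more samples than dropping one member of each related pair in arbitrary order, although it is not guaranteed to be optimal.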