Genetic Basis of Early Onset Bicuspid Aortic Valve Disease

The study protocol was approved by the Committee for the Protection of Human Subjects at the University of Texas Health Science Center at Houston (HSC-MS-11-0185). Study recruitment began on July 1, 2017, and concluded on March 30, 2022. After obtaining written informed consent, we enrolled probands with early onset BAV disease (EBAV), defined as BAV in individuals who were under the age of 30 at the time of their first clinical event. Clinical events were defined as aortic replacement, aortic valve surgery, aortic dissection, moderate or severe aortic stenosis or aortic regurgitation, large aneurysm (Z > 4.5), or intervention for BAV-related conditions. Individuals with hypoplastic left heart, known genetic mutations, genetic syndromes, or complex congenital heart disease were excluded. Samples were collected and genotyped as previously reported. For comparison, we analyzed a cohort of older individuals of European ancestry with sporadic BAV disease selected from the International BAV Consortium (BAVGWAS). Phenotypes were derived from record review with confirmation of image data whenever possible [25-26]. The computational pipeline for CNV analysis of Illumina single nucleotide polymorphism (SNP) array data included three independent CNV detection algorithms.


GenomeStudio was used to exclude samples with indeterminate sex or more than 5% missing genotypes, and to exclude single nucleotide polymorphisms (SNPs) with a GenTrain score of 0. Principal component analysis was used to remove outlier samples that did not cluster with the European ancestry group. Prior to CNV analysis, each dataset was trimmed to a common set of 650,000 SNPs that were genotyped on every microarray used in this study.
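
The sample-level exclusions and the trimming to a common SNP set can be sketched in Python as follows. This is only an illustration of the logic, not the GenomeStudio workflow used in the study; the function names, the "M"/"F" sex coding, and the input shapes are assumptions made for the example.

```python
def passes_sample_qc(missing_rate, sex, max_missing=0.05):
    """Keep a sample only if sex is determinate and missingness is <= 5%.

    `sex` is assumed to be "M" or "F"; any other value (for example None)
    is treated as indeterminate. Both conventions are hypothetical.
    """
    return sex in ("M", "F") and missing_rate <= max_missing


def common_snps(manifests):
    """Return the SNP IDs that appear on every array manifest.

    `manifests` is an iterable of SNP ID collections, one per microarray.
    """
    sets = [set(m) for m in manifests]
    if not sets:
        return set()
    return set.intersection(*sets)
```

Restricting every dataset to the shared SNP set keeps probe density comparable across array types, so that differences in CNV calls are less likely to reflect differences in array content.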


Three independent algorithms (PennCNV, cnvPartition, and QuantiSNP) were used to generate CNV calls and sample-level quality statistics from SNP intensity data. PennCNV and QuantiSNP were run on Unix clusters, and cnvPartition data were exported from GenomeStudio. The analysis was run using default configurations. PennCNV was used to generate QC data and to remove CNV calls that intersect with polymorphic genomic regions. Samples that met either of the following criteria were excluded: a standard deviation of the LogR ratio (obtained from PennCNV) > 0.35, or a number of CNVs more than 2 standard deviations above the mean for that dataset. CNV calls that were less than 20 kilobases (kb) in length or spanned by fewer than 6 probes were excluded. The overlap function for rare CNVs in PLINK was used to construct CNV regions (CNVRs) after adjacent regions were merged using PennCNV. LogR ratio (LRR) and B allele frequency (BAF) data at CNVRs and calls of interest were visualized in GenomeStudio for validation. For segregation analysis, GenomeStudio was used to determine the presence of CNVs in relatives.
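
The sample-level and call-level thresholds described above can be sketched as a pair of Python filters. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the `CnvCall` record, the function names, and the use of the sample standard deviation for the CNV count cutoff are all hypothetical choices.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class CnvCall:
    """Hypothetical record for one CNV call."""
    sample: str
    chrom: str
    start: int      # first base of the call (bp)
    end: int        # last base of the call (bp), inclusive
    n_probes: int   # number of array probes spanning the call

    @property
    def length(self):
        return self.end - self.start + 1


def keep_call(call, min_length=20_000, min_probes=6):
    """Drop calls shorter than 20 kb or spanned by fewer than 6 probes."""
    return call.length >= min_length and call.n_probes >= min_probes


def excluded_samples(lrr_sd, n_cnvs, max_lrr_sd=0.35, n_sd=2.0):
    """Return samples failing either sample-level criterion.

    lrr_sd: sample -> standard deviation of the LogR ratio
    n_cnvs: sample -> number of CNV calls in the same dataset
    The count cutoff is mean + 2 SD, computed within one dataset
    (sample SD is an assumption; at least two samples are required).
    """
    counts = list(n_cnvs.values())
    cutoff = mean(counts) + n_sd * stdev(counts)
    return {
        s for s in lrr_sd
        if lrr_sd[s] > max_lrr_sd or n_cnvs.get(s, 0) > cutoff
    }
```

Because the CNV count cutoff depends on the mean and standard deviation of each dataset, `excluded_samples` would be applied separately to each dataset rather than to the pooled samples.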


A total of 22,014 unselected control Illumina genotypes obtained from the Database of Genotypes and Phenotypes (dbGaP) were analyzed using identical methods (S1 Table). The Wisconsin Longitudinal Study (WLS) includes data on a cohort of 10,300 individuals who graduated from Wisconsin high schools in 1957. The Health and Retirement Study (HRS) includes data on 37,000 individuals aged 50 and above from 23,000 households across the United States. Principal component analysis was used to select European ancestry genotypes from these datasets for analysis. Datasets were paired for case-control analysis based on the concordance of log-transformed sample-level quality control statistics (number of CNV calls and standard deviation of LogR ratios). Chi-squared or Fisher exact tests were used to compare CNV frequencies in cases and controls.
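
A comparison of CNV carrier frequencies between cases and controls can be illustrated with a two-sided Fisher exact test on a 2x2 table. The sketch below uses only the Python standard library. The function name and table layout are assumptions, and the two-sided definition used here (summing the probabilities of all tables no more likely than the observed one) is one common convention, not necessarily the one applied in the study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]].

    Assumed layout: a = case carriers, b = case non-carriers,
    c = control carriers, d = control non-carriers.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers among the cases.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1)
            if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)
```

The exact test is generally preferred over the chi-squared approximation when expected cell counts are small, as is typical for rare CNVs.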

Rare CNV functions in PLINK (v1.7) were used to perform permutation-based burden tests or gene set-based enrichment tests. Case-control burden tests were restricted to CNVs that were longer than 110 kb and had a frequency below 0.1%. CNV overlap functions in PLINK were used to identify rare CNVs that intersect between datasets or involve specific BAV or CHD genes. The list of candidate genes included 190 CHD genes that have strong cumulative evidence to cause BAV or related congenital malformations from human or animal model data. Genome Reference Consortium Human Build 37 was used for CNV annotation [34].
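
The idea behind a permutation-based burden test can be sketched in Python as follows. This is not PLINK's implementation: it is a simplified label-permutation test on the mean number of qualifying CNVs per sample, and the function names, the one-sided test direction, and the (hits + 1) / (permutations + 1) p-value estimate are assumptions made for illustration.

```python
import random


def is_rare_large(length_bp, carriers, n_samples,
                  min_length=110_000, max_freq=0.001):
    """Keep CNVs longer than 110 kb with a frequency below 0.1%."""
    return length_bp > min_length and carriers / n_samples < max_freq


def burden_permutation_test(case_counts, control_counts,
                            n_perm=10_000, seed=0):
    """One-sided permutation test for excess CNV burden in cases.

    case_counts / control_counts: per-sample counts of qualifying CNVs.
    Returns (observed difference in means, empirical p-value).
    """
    rng = random.Random(seed)
    pooled = list(case_counts) + list(control_counts)
    n_cases = len(case_counts)

    def mean_diff(values):
        cases, controls = values[:n_cases], values[n_cases:]
        return sum(cases) / len(cases) - sum(controls) / len(controls)

    observed = mean_diff(pooled)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign case/control labels at random
        if mean_diff(pooled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Shuffling the case and control labels preserves the overall distribution of CNV counts, so the empirical p-value does not rely on a parametric model of CNV burden.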