Allele Balance Bias Identifies Systematic Genotyping Errors and False Disease Associations

Study ID: EGAS00001003027
Alternative Stable ID:
Type: Other

Study Description

In recent years, Next Generation Sequencing (NGS) has become a cornerstone of clinical genetics and diagnostics. Many clinical applications require high precision, especially when rare events such as somatic mutations in cancer or genetic variants causing rare diseases need to be identified. Although random sequencing errors can be modeled statistically and deep sequencing minimizes their impact, systematic errors remain a problem even at high depth of coverage. Understanding their source is crucial to increasing the precision of clinical NGS applications. In this work, we studied the relation between recurrent biases in allele balance (AB), systematic errors and false positive variant calls across a large cohort of human samples analyzed by whole exome sequencing (WES). We modeled the allele balance distribution for biallelic genotypes in 987 WES samples to identify positions that recurrently deviate significantly from the expectation, a phenomenon we termed allele balance bias (ABB). Furthermore, we have developed a genotype callability score based on ABB for all positions of ...
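
As a rough illustration of the allele balance idea described above, the following Python sketch computes AB for a single genotype call and checks whether it deviates from the 0.5 expected for a heterozygous biallelic site. This is not the study's ABB model, which looks at recurrent deviation across many samples. It is a simple per-genotype exact binomial test, and the function names, the 0.5 expectation and the 0.01 threshold are all assumptions made for illustration.

```python
from math import comb


def allele_balance(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads supporting the alternate allele (hypothetical helper)."""
    depth = ref_reads + alt_reads
    if depth == 0:
        raise ValueError("no reads cover this position")
    return alt_reads / depth


def binomial_two_sided_p(alt_reads: int, depth: int, expected_ab: float = 0.5) -> float:
    """Exact two-sided binomial p-value for the observed alternate read count.

    Assumes each read independently supports the alternate allele with
    probability expected_ab (0.5 for a heterozygous call).
    """
    probs = [
        comb(depth, k) * expected_ab**k * (1 - expected_ab) ** (depth - k)
        for k in range(depth + 1)
    ]
    observed = probs[alt_reads]
    # Sum every outcome that is at most as likely as the observed one;
    # the small tolerance guards against floating-point ties.
    total = sum(p for p in probs if p <= observed * (1 + 1e-9))
    return min(1.0, total)


def deviates_from_expectation(ref_reads: int, alt_reads: int, alpha: float = 0.01) -> bool:
    """True if the allele balance is unlikely under the assumed expectation."""
    depth = ref_reads + alt_reads
    return binomial_two_sided_p(alt_reads, depth) < alpha


if __name__ == "__main__":
    for ref, alt in [(50, 50), (90, 10)]:
        ab = allele_balance(ref, alt)
        flagged = deviates_from_expectation(ref, alt)
        print(f"ref={ref} alt={alt} AB={ab:.2f} deviates={flagged}")
```

A position would only be a candidate for systematic bias in the sense of the abstract if such deviations recurred across many samples. A single flagged genotype on its own may reflect sampling noise or a genuine biological signal.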

Study Datasets 1 dataset.


Dataset ID Description Technology Samples
This dataset has two variant files in VCF format used in the ABB project. One contains the variants found in a Rare Variant Association Study performed in CLL patients, with 1217 samples represented. The other contains 209 SNPs predicted in 10 samples by GATK HaplotypeCaller and selected for Sanger sequencing validation. Raw reads were aligned against the human reference genome (hg19) with BWA-MEM and variants were obtained using GATK ...

Who archives the data?