Comparison of structural variations from 10X Genomics linked-reads and conventional Illumina short-reads sequencing

Structural variations (SVs) are large genomic rearrangements that can drive many diseases. Conventional short-read whole genome sequencing (cWGS) allows their identification at base-pair resolution but suffers from a high false discovery rate. cWGS taps into short-range information from short reads, whereas linked-read sequencing (10XWGS) utilizes long-range information. 10XWGS links short reads originating from the same large DNA molecule through a unique barcode captured in a gel bead in emulsion. This mitigates the alignment-based artefacts of cWGS, especially in repetitive regions. However, the false discovery rate of this technology is unclear. In this study, we performed a comprehensive analysis of the different types and sizes of SVs predicted from these two technologies. The SVs common to both technologies were found to be highly specific by PCR and Sanger sequencing, while the validation rate dropped for events called by only one technology. Further, we propose a novel enrichment approach for filtering out false positive calls from each technology independently. To this end, we trained a ...
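
The split between SVs common to both technologies and SVs called by only one of them can be illustrated with a minimal sketch. It is not the study's method: the matching rule (same chromosome, same SV type, both breakpoints within a tolerance), the 500 bp default, the names `SV`, `same_event` and `split_calls`, and the example coordinates are all assumptions made for illustration.

```python
from typing import List, NamedTuple, Tuple


class SV(NamedTuple):
    """One structural variant call (illustrative fields only)."""
    chrom: str
    start: int
    end: int
    svtype: str  # e.g. "DEL", "DUP", "INV"


def same_event(a: SV, b: SV, tol: int = 500) -> bool:
    """Assumed rule: same chromosome and type, both breakpoints within tol bp."""
    return (
        a.chrom == b.chrom
        and a.svtype == b.svtype
        and abs(a.start - b.start) <= tol
        and abs(a.end - b.end) <= tol
    )


def split_calls(
    calls_a: List[SV], calls_b: List[SV], tol: int = 500
) -> Tuple[List[SV], List[SV], List[SV]]:
    """Return (calls in A matched in B, calls only in A, calls only in B)."""
    shared = [a for a in calls_a if any(same_event(a, b, tol) for b in calls_b)]
    only_a = [a for a in calls_a if a not in shared]
    only_b = [b for b in calls_b if not any(same_event(a, b, tol) for a in calls_a)]
    return shared, only_a, only_b


# Made-up coordinates, not taken from the dataset.
cwgs_calls = [SV("chr1", 10_000, 15_000, "DEL"), SV("chr2", 50_000, 80_000, "DUP")]
linked_calls = [SV("chr1", 10_120, 14_950, "DEL"), SV("chr3", 1_000, 9_000, "INV")]

shared, cwgs_only, linked_only = split_calls(cwgs_calls, linked_calls)
print(len(shared), len(cwgs_only), len(linked_only))
```

A breakpoint tolerance is only one possible matching rule; reciprocal overlap of the two intervals is a common alternative, and either choice changes which calls count as shared.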

This dataset contains whole genome sequencing data from Illumina short-read sequencing (2×150 bp) and 10X Genomics linked-read sequencing. Both technologies were used to sequence the MCF7 cell line and a primary triple-negative breast cancer sample. Paired-end reads in FASTQ format are available for both samples from both technologies.
