Resolving the Full Spectrum of Human Genome Variation using Linked-Reads

Study ID: EGAS00001003121
Alternative Stable ID: not listed
Type: Other

Study Description

Large-scale population-based analyses, coupled with advances in technology, have demonstrated that the human genome is more diverse than originally thought. Standard short-read approaches, used primarily because of their accuracy, throughput and cost, fail to give a complete picture of a genome. They struggle to identify large, balanced structural events, cannot access repetitive regions of the genome, and fail to resolve the human genome into its two haplotypes. Here we describe an approach that retains long-range information while harnessing the power of short reads. Starting from only 1 ng of DNA, we produce barcoded short-read libraries. Novel informatic approaches allow the barcoded short reads to be associated with their long molecules of origin, producing a novel datatype known as 'Linked-Reads'. This approach allows simultaneous detection of small and large variants from a single Linked-Read library. We have previously demonstrated the utility of whole genome Linked-Reads (lrWGS) for performing diploid, de novo assembly of individual genomes. In this manuscript, we show ...

Study Datasets 1 dataset.


Description: This dataset contains Linked-Read Whole Exome Sequencing (lrWES) data from individuals with known disease-causing variants. The dataset comprises 30 samples from 10 donors; multiple samples from the same donor reflect experimental differences used to assay the effect of input DNA length on coverage and phasing. Raw data (BAM files) and variant analysis (VCF files) are included for each sample.
Technology: Illumina HiSeq 4000
Samples: 30

Who archives the data?