Study

Measuring the level of relatedness between NGS datasets

Study ID Alternative Stable ID Type
EGAS00001000600 Other

Study Description

Sequencing technologies that provide increasingly detailed insight into genetic makeup are paving their way into molecular diagnostics. The field will benefit from rigorous and bias-free measures for the quality of sequence data and for the proper representation of the complexity of the original samples. While current methodologies rely on the availability of a well-characterized reference genome, we propose kMer profiling for alignment-free assessment of the quality, comparability, and complexity of sequencing datasets. We show that kMer detects technical artefacts such as high duplication rates, library chimaeras, and differences in library preparation protocols in whole-genome, whole-exome, and RNA sequencing data. Additionally, it successfully captures the complexity and diversity of microbiomes. Thus, kMer allows for a robust evaluation of the quality and complexity of sequencing data without relying on any prior information and opens the way to more reliable biological reasoning.
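As a rough illustration of the alignment-free idea described above, the sketch below counts every k-length substring in a set of reads and compares two resulting profiles without any reference genome. It is not the kMer software's interface or its actual distance measure: the function names `kmer_profile` and `profile_distance`, the choice of Euclidean distance on relative frequencies, and the toy reads are all assumptions made for this example.

```python
from collections import Counter
from math import sqrt


def kmer_profile(reads, k):
    """Count every length-k substring across all reads.

    k-mers containing characters other than A, C, G, T (e.g. N) are skipped.
    Hypothetical helper for illustration only.
    """
    valid = set("ACGT")
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= valid:
                counts[kmer] += 1
    return counts


def profile_distance(a, b):
    """Euclidean distance between two profiles scaled to relative frequencies.

    Scaling makes datasets of different sequencing depth comparable.
    The distance measure is an assumption, not taken from the study.
    """
    total_a = sum(a.values()) or 1
    total_b = sum(b.values()) or 1
    keys = set(a) | set(b)
    return sqrt(sum((a[key] / total_a - b[key] / total_b) ** 2 for key in keys))


# Toy reads standing in for two sequencing datasets (made-up data).
dataset_1 = ["ACGTACGTAC", "CGTACGTACG"]
dataset_2 = ["ACGTACGTAC", "TTTTTTTTTT"]

profile_1 = kmer_profile(dataset_1, k=3)
profile_2 = kmer_profile(dataset_2, k=3)
print(profile_distance(profile_1, profile_2))
```

A distance of zero means the two datasets have identical relative k-mer frequencies; larger values indicate more divergent composition, which is the kind of signal that could flag a differing library preparation or an over-duplicated library.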

Study Datasets

1 dataset.

Click on a Dataset ID in the table below to learn more, and to find out who to contact about access to these data

Dataset ID       Description  Technology           Samples
EGAD00001000759  NA           Illumina HiSeq 2000  86

There are no publications available