
Measuring the level of relatedness between NGS datasets

Sequencing technologies are providing increasingly detailed insight into genetic makeup and are paving their way into molecular diagnostics. The field will benefit from rigorous, bias-free measures of sequence data quality and of how faithfully the data represent the complexity of the original samples. While current methodologies rely on the availability of a well-characterized reference genome, we propose kMer profiling for alignment-free assessment of the quality, comparability, and complexity of sequencing datasets. We show that kMer detects technical artefacts such as high duplication rates, library chimaeras, and differences in library preparation protocols in whole-genome, whole-exome, and RNA sequencing data. Additionally, it captures the complexity and diversity of microbiomes. Thus, kMer allows for a robust evaluation of the quality and complexity of sequencing data without relying on any prior information, and opens the way to more reliable biological reasoning.
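To illustrate what alignment-free k-mer profiling involves, here is a minimal sketch, assuming reads are available as plain nucleotide strings. It counts k-mers per dataset, compares two datasets by the Euclidean distance between their normalized k-mer frequency vectors, and summarizes complexity as the Shannon entropy of a profile. The function names are hypothetical, and both the distance and the entropy are illustrative stand-ins chosen for simplicity, not necessarily the measures implemented in kMer.

```python
from collections import Counter
from itertools import product
import math


def kmer_profile(reads, k):
    """Count all k-mers over A/C/G/T in a collection of reads.

    Hypothetical helper: k-mers containing N or other symbols are skipped,
    and no reverse-complement merging is done (a simplifying assumption).
    """
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def to_frequencies(counts, k):
    """Turn counts into a fixed-order vector of relative frequencies (4**k entries)."""
    total = sum(counts.values())
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return [counts[km] / total if total else 0.0 for km in all_kmers]


def profile_distance(p, q):
    """Euclidean distance between two frequency vectors (illustrative choice)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def profile_entropy(counts):
    """Shannon entropy (bits) of a k-mer profile as a rough complexity score."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)


if __name__ == "__main__":
    k = 3
    reads_a = ["ACGTACGT", "TTGACCAT"]
    reads_b = ["ACGTACGA", "GGGCCCAA"]
    prof_a, prof_b = kmer_profile(reads_a, k), kmer_profile(reads_b, k)
    dist = profile_distance(to_frequencies(prof_a, k), to_frequencies(prof_b, k))
    print(f"distance={dist:.4f}")
    print(f"entropy A={profile_entropy(prof_a):.3f}, entropy B={profile_entropy(prof_b):.3f}")
```

Normalizing counts to frequencies before comparing keeps the distance from being dominated by sequencing depth; a dataset with many exact duplicate reads would also tend to show a skewed, lower-entropy profile.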

Click on a Dataset ID in the table below to learn more and to find out whom to contact about access to these data.

Dataset ID         Description    Technology            Samples
EGAD00001000759    -              Illumina HiSeq 2000   86