Pipeline Olympics: Continuous benchmarking of computational workflows for DNA methylation sequencing data

DNA methylation is a widely studied epigenetic mark and a powerful biomarker of cell type, age, environmental exposure, and disease. Genome-wide DNA methylation profiling is usually performed using whole-genome sequencing following selective conversion of unmethylated cytosines into thymines via bisulfite treatment or enzymatic methods. Numerous software tools facilitate processing of raw sequencing data for DNA methylation profiling, but comprehensive benchmarking has been lacking. In this study, we systematically compared methods and tools for processing DNA methylation sequencing data. We established a dedicated benchmarking dataset comprising genome-wide DNA methylation profiling data for four reference samples using five experimental protocols each. The results of data processing were compared to highly quantitative locus-specific DNA methylation measurements available for 46 genomic loci. Based on this gold-standard reference dataset, we identified workflows that consistently demonstrated high performance. We present our results in an interactive application that provides a “living benchmark” ready for continuous updating as new workflows are proposed and require benchmarking against existing methods. In summary, our study provides guidance for any laboratory studying DNA methylation on a genome-wide scale.

Click on a Dataset ID in the table below to learn more and to find out who to contact about access to these data.

Dataset ID       | Description | Technology                            | Samples
EGAD50000000772  | -           | Illumina HiSeq 2000, Illumina HiSeq X | 20
Publications

Pipeline Olympics: continuable benchmarking of computational workflows for DNA methylation sequencing data against an experimental gold standard. Nucleic Acids Res 53: gkaf970 (2025).
Citations: 0