Pipeline Olympics: Continuous benchmarking of computational workflows for DNA methylation sequencing data
DNA methylation is a widely studied epigenetic mark and a powerful biomarker of cell type, age, environmental exposure, and disease. Genome-wide DNA methylation profiling is usually performed using whole-genome sequencing following selective conversion of unmethylated cytosines into thymines via bisulfite treatment or enzymatic methods. Numerous software tools facilitate processing of raw sequencing data for DNA methylation profiling, but a comprehensive benchmarking has been lacking. In this study, we systematically compared methods and tools for processing DNA methylation sequencing data. We established a dedicated benchmarking dataset comprising genome-wide DNA methylation profiling data for four reference samples using five experimental protocols each. The results of data processing were compared to highly quantitative locus-specific DNA methylation measurements available for 46 genomic loci. Based on this gold-standard reference dataset, we identified workflows that consistently demonstrated high performance. We present our results in an interactive application that provides a “living benchmark” ready for continuous updating as new workflows are proposed and require benchmarking against existing methods. In summary, our study provides guidance for any laboratory studying DNA methylation on a genome-wide scale.
- Type: Cancer Genomics
- Archiver: European Genome-Phenome Archive (EGA)
Click on a Dataset ID in the table below to learn more, and to find out who to contact about access to these data
| Dataset ID | Description | Technology | Samples |
|---|---|---|---|
| EGAD50000000772 | | Illumina HiSeq 2000, Illumina HiSeq X | 20 |
| Publications | Citations |
|---|---|
| Pipeline Olympics: continuable benchmarking of computational workflows for DNA methylation sequencing data against an experimental gold standard. Nucleic Acids Res 53: 2025 gkaf970 | 0 |
