MethylBERT enables read-level DNA methylation pattern identification and tumour deconvolution using a Transformer-based model

DNA methylation (DNAm) is a key epigenetic mark that shows profound alterations in cancer. Compared to array-based data such as those from the 450K/EPIC arrays, read-level methylomes enable more in-depth DNAm analysis owing to their broad coverage and preservation of rare cell-type signals. Here, we propose MethylBERT, a novel Transformer-based model for read-level methylation pattern classification. MethylBERT identifies tumour-derived sequence reads based on their methylation patterns and genomic sequence, and then estimates tumour cell fractions within bulk samples. In our evaluation, MethylBERT outperforms existing deconvolution methods and demonstrates high accuracy regardless of methylation pattern complexity, read length and read coverage. Moreover, we show its applicability to cell-type deconvolution as well as to non-invasive early cancer diagnostics using liquid biopsy samples. MethylBERT represents a significant advancement in read-level methylome analysis and enables accurate tumour purity estimation. The broad applicability of MethylBERT will enhance studies on both tumour and non-cancerous bulk methylomes.
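To illustrate the general idea of going from read-level classification to a bulk tumour fraction, the following is a minimal sketch and not MethylBERT's actual estimation procedure. It assumes a hypothetical classifier has already assigned each read a probability of being tumour-derived, and it simply averages those probabilities. The function name `estimate_tumour_fraction` and the example values are illustrative assumptions.

```python
from statistics import fmean


def estimate_tumour_fraction(read_probs):
    """Naive bulk estimate: the mean of per-read tumour probabilities.

    This is an illustrative simplification, not the estimator used by MethylBERT.
    """
    if not read_probs:
        raise ValueError("need at least one read")
    if any(p < 0.0 or p > 1.0 for p in read_probs):
        raise ValueError("probabilities must lie in [0, 1]")
    return fmean(read_probs)


# Hypothetical per-read tumour probabilities from some read-level classifier
probs = [0.9, 0.8, 0.1, 0.2, 0.95, 0.05]

# Bulk-level estimate of the tumour-derived fraction
fraction = estimate_tumour_fraction(probs)

# Indices of reads called tumour-derived at a 0.5 threshold (assumed cut-off)
tumour_reads = [i for i, p in enumerate(probs) if p >= 0.5]
```

Averaging probabilities rather than counting thresholded calls keeps uncertain reads from being forced into one class, although a real estimator would also need to account for classifier calibration and region-specific coverage.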

Click on a Dataset ID in the table below to learn more, and to find out who to contact about access to these data

Dataset ID | Description | Technology | Samples
EGAD50000001183 | | HiSeq X Ten | 15
Publications | Citations
MethylBERT enables read-level DNA methylation pattern identification and tumour deconvolution using a Transformer-based model. Nat Commun 16: 788 (2025) | 0