Sequencing data from MethylScan assay on plasma
Machine learning models in biomedical research are often hindered by demographic imbalances in clinical datasets, leading to biased predictions that disadvantage minority populations. Existing bias correction methods are limited in their ability to handle the heterogeneity of biomedical data and the complexity of demographic influences. Here, we present DeBias, a computational framework for mitigating demographic biases in high-dimensional biomedical datasets. DeBias uses control samples to identify and eliminate bias-associated subspaces within the entire feature space, enabling global bias correction across all features while preserving disease-specific signals and minimizing demographic distortions. To demonstrate its effectiveness, we applied DeBias to cell-free DNA (cfDNA) methylation data for cancer detection. It achieved a >6-fold reduction in the number of features containing biases and outperformed existing methods in improving cancer detection performance for minority populations. Our method represents a promising step toward developing machine learning models that are more accurate, reliable, and equitable for minority populations.
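The abstract describes removing bias-associated subspaces estimated from control samples and then correcting all features globally. The exact DeBias algorithm is not given here, so the following is only a minimal, hypothetical sketch of the general subspace-projection idea. It assumes each bias direction is the mean difference between two demographic groups among control samples, orthonormalizes those directions with Gram-Schmidt, and projects every sample onto their orthogonal complement. The function names are illustrative and are not the DeBias API.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def mean_vector(rows: Sequence[Sequence[float]]) -> Vector:
    """Column-wise mean of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def bias_direction(controls: Sequence[Vector], labels: Sequence[str]) -> Vector:
    """Hypothetical bias direction: mean difference between two demographic
    groups, computed on control samples only so that disease signal does not
    enter the estimate. Assumes exactly two distinct labels."""
    groups: Dict[str, List[Vector]] = {}
    for row, lab in zip(controls, labels):
        groups.setdefault(lab, []).append(list(row))
    if len(groups) != 2:
        raise ValueError("expected exactly two demographic groups")
    g1, g2 = (groups[k] for k in sorted(groups))
    m1, m2 = mean_vector(g1), mean_vector(g2)
    return [a - b for a, b in zip(m1, m2)]


def orthonormal_basis(directions: Sequence[Vector], tol: float = 1e-12) -> List[Vector]:
    """Gram-Schmidt: orthonormal basis spanning the given bias directions."""
    basis: List[Vector] = []
    for d in directions:
        v = list(d)
        for u in basis:
            c = dot(v, u)
            v = [vi - c * ui for vi, ui in zip(v, u)]
        norm = dot(v, v) ** 0.5
        if norm > tol:  # skip directions already spanned by earlier ones
            basis.append([vi / norm for vi in v])
    return basis


def remove_subspace(x: Vector, basis: Sequence[Vector]) -> Vector:
    """Project x onto the orthogonal complement of span(basis)."""
    out = list(x)
    for u in basis:
        c = dot(out, u)
        out = [oi - c * ui for oi, ui in zip(out, u)]
    return out


if __name__ == "__main__":
    # Toy controls: feature 0 differs by group, features 1-2 do not.
    controls = [[1.0, 0.0, 5.0], [1.2, 0.0, 5.0], [3.0, 0.0, 5.0], [3.2, 0.0, 5.0]]
    labels = ["A", "A", "B", "B"]
    basis = orthonormal_basis([bias_direction(controls, labels)])
    print(remove_subspace([2.0, 7.0, 5.0], basis))
```

Because the directions are estimated from controls only, the same projection can then be applied to cases and controls alike. A realistic pipeline would also need to handle continuous covariates such as age and attributes with more than two groups, which this sketch does not attempt.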
- Type: Other
- Archiver: European Genome-Phenome Archive (EGA)
Click on a Dataset ID in the table below to learn more and to find out whom to contact about access to these data.
| Dataset ID | Description | Technology | Samples |
|---|---|---|---|
| EGAD00001015815 | | Illumina NovaSeq X | 1 |
