
Bulk-RNA Sequencing of high-grade pancreatic and non-pancreatic Neuroendocrine Neoplasms

Therapeutic decisions in oncology depend on a precise pathological classification of individual neoplasms. In recent years, research has intensified on extracting clinically relevant information from patient-derived 'omics' data with Machine-Learning models. However, training such models comprehensively requires large numbers of training samples, which are usually not available for rare cancer types. The problem worsens when neoplasms of an individual tissue segregate into different cancer subtypes, because discriminating between them requires even more training samples.
Methods: Here, we report a new data-augmentation technique to support the training of Machine-Learning models on 'omics' data from pancreatic neuroendocrine neoplasms (panNENs). PanNENs display all of the properties described above: only about 2-3% of all pancreatic neoplasms are neuroendocrine, and they fall into subtypes with distinctly different prognoses, which makes the precise classification of such samples both difficult and important for therapy decisions. The approach reconstructs a given transcriptome from healthy pancreatic cell-type signatures and trains a Machine-Learning model that uses the observed reconstruction error and the predicted cell-type proportions as features.
Results: Benchmarking the deconvolution model's predictions against the ground truth showed that the model could efficiently predict sample grading and disease-related patient survival time, and could differentiate between subtypes, in four panNEN datasets and one mixed dataset of panNENs and non-pancreatic NENs. We compared the predictive performance of the deconvolution-trained model with that of a model trained directly on the transcriptomic data, with the gold-standard classification biomarker Ki-67 included, and found the performances to be comparable.
Conclusion: Our approach serves as a data-augmentation technique that facilitates the training of Machine-Learning models for rare cancer types by using data of healthy origin. Additionally, the deconvolution results were clinically interpretable, and further research could make the deconvolution approach an effective complementary asset for the clinical classification of neoplasms.
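
The feature construction described in the abstract can be illustrated with a small sketch. This is not the study's implementation: the abstract does not say which deconvolution algorithm was used, so the sketch substitutes a simple non-negative least-squares fit by multiplicative updates. The signature matrix, the sample values, and the function names `deconvolve` and `deconvolution_features` are all hypothetical and chosen only for illustration.

```python
import math

def deconvolve(signatures, bulk, iterations=500, eps=1e-12):
    """Fit non-negative cell-type weights w so that signatures * w approximates bulk.

    signatures: one row per gene, one column per (healthy) cell type, values >= 0
    bulk:       one expression value per gene, values >= 0
    Uses multiplicative updates, an assumed stand-in for the study's method.
    """
    n_types = len(signatures[0])
    # Precompute S^T b and S^T S, which stay fixed across iterations.
    st_b = [sum(row[j] * b for row, b in zip(signatures, bulk))
            for j in range(n_types)]
    st_s = [[sum(row[j] * row[k] for row in signatures) for k in range(n_types)]
            for j in range(n_types)]
    w = [1.0] * n_types
    for _ in range(iterations):
        # Each weight is rescaled by (S^T b)_j / (S^T S w)_j, which keeps it >= 0.
        w = [w[j] * st_b[j] / (sum(st_s[j][k] * w[k] for k in range(n_types)) + eps)
             for j in range(n_types)]
    return w

def deconvolution_features(signatures, bulk):
    """Return [proportion per cell type..., reconstruction RMSE] for one sample."""
    w = deconvolve(signatures, bulk)
    reconstructed = [sum(s * wk for s, wk in zip(row, w)) for row in signatures]
    rmse = math.sqrt(sum((r - b) ** 2 for r, b in zip(reconstructed, bulk)) / len(bulk))
    total = sum(w)
    proportions = [wk / total for wk in w] if total > 0 else [0.0] * len(w)
    return proportions + [rmse]

# Hypothetical signature matrix: 4 genes x 2 healthy cell types.
SIGNATURES = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0], [2.0, 8.0]]

# A sample that is an exact 30/70 mixture of the two signatures.
well_explained = [3.0, 7.0, 5.0, 6.2]
# A sample whose last gene deviates strongly from any healthy mixture.
poorly_explained = [3.0, 7.0, 5.0, 20.0]

features_a = deconvolution_features(SIGNATURES, well_explained)
features_b = deconvolution_features(SIGNATURES, poorly_explained)
print(features_a)
print(features_b)
```

Each feature vector holds the estimated cell-type proportions followed by the reconstruction error. Vectors of this kind, rather than the full transcriptome, would then be passed to a classifier; a sample that healthy signatures explain poorly shows up as a larger error value.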

Click on a Dataset ID in the table below to learn more, and to find out who to contact about access to these data

Dataset ID       Description  Technology           Samples
EGAD00001006657  -            Illumina HiSeq 4000  40
Publications
Elevated Flt3L Predicts Long-Term Survival in Patients with High-Grade Gastroenteropancreatic Neuroendocrine Neoplasms. Cancers (Basel) 2021; 13: 4463. Citations: 2
Transcriptomic Deconvolution of Neuroendocrine Neoplasms Predicts Clinically Relevant Characteristics. Cancers (Basel) 2023; 15: 936. Citations: 0