This study generated Oxford Nanopore long-read sequencing data from cancer cell line mixtures for validating long-read variant calls in cancer genomics.

Long-read sequencing (LRS) improves genome alignment and helps resolve variants in complex genomic regions, making it a promising approach for cancer variant detection and biomarker discovery. Here, we evaluate the performance of LRS in detecting somatic variants across different tumour purities and sequencing depths by comparing its calls with somatic variants from short-read sequencing. We generated experimental mixtures of cancer cell lines and matched normal cell lines to simulate 10 tumour purities (ranging from 0% to 100% tumour content). This resulted in 22 samples, which were sequenced using whole-genome LRS to a targeted depth of 60x, and somatic variants were identified. We downsampled the sequencing data to explore the optimal read depth for somatic variant detection. Our results show that long-read variant calling tools achieve recall rates comparable to short-read gold standards. While a tumour sequencing depth of 30x to 60x is generally sufficient for detecting common variants, particularly structural variants (SVs), sequencing the matched normal sample at adequate depth is crucial for accuracy. Notably, we found variants unique to LRS that may represent real and previously undetected events. This study highlights key factors for optimising cancer genome sequencing with LRS and provides a comprehensive dataset of cell line mixtures for future research and tool development.
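
The depth and recall comparisons described above can be illustrated with a minimal Python sketch. It is not the study's pipeline: the function names, the variant keys, and the example variants are all hypothetical, and it assumes that a call matches the short-read truth set only when chromosome, position, reference allele, and alternate allele are identical.

```python
def downsample_fraction(current_depth: float, target_depth: float) -> float:
    """Fraction of reads to keep so mean depth drops from current to target."""
    if current_depth <= 0 or target_depth <= 0:
        raise ValueError("depths must be positive")
    # Never keep more than all reads if the target exceeds the current depth.
    return min(1.0, target_depth / current_depth)


def recall(called: set, truth: set) -> float:
    """Share of truth-set variants that also appear in the call set."""
    if not truth:
        raise ValueError("truth set must not be empty")
    return len(called & truth) / len(truth)


# Hypothetical variants keyed by (chromosome, position, ref, alt).
truth = {
    ("chr1", 100, "A", "T"),
    ("chr1", 250, "G", "C"),
    ("chr2", 75, "C", "A"),
    ("chr3", 10, "T", "G"),
}
called = {
    ("chr1", 100, "A", "T"),
    ("chr2", 75, "C", "A"),
    ("chr3", 10, "T", "G"),
    ("chr5", 999, "G", "A"),
}

fraction_30x = downsample_fraction(60, 30)  # keep half the reads of a 60x run
lrs_recall = recall(called, truth)          # 3 of 4 truth variants recovered
lrs_only = called - truth                   # calls absent from the truth set

print(fraction_30x, lrs_recall, sorted(lrs_only))
```

Exact key matching is a simplification. Structural variants in particular are usually compared with a tolerance on breakpoint position and size, so a real comparison would need a dedicated benchmarking tool rather than set intersection.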

Click on a Dataset ID in the table below to learn more and to find out who to contact about access to these data.

Dataset ID        Description   Technology   Samples
EGAD00001015628                 PromethION   21