
Profiling Genome-Wide Circulating ncRNAs for the Early Detection of Lung Cancer

Lung cancer remains the leading cause of cancer-related mortality worldwide, with non-small cell lung cancer (NSCLC) accounting for approximately 85% of all cases. Imaging techniques such as low-dose computed tomography (LDCT), combined with tissue biopsies, have improved early detection rates. However, current diagnostic approaches are limited by high false-positive rates, which can lead to misdiagnosis, and by invasiveness and radiation exposure, which increase patient burden. There is therefore an urgent need for minimally invasive, accurate, and safe diagnostic biomarkers, particularly those derived from liquid biopsies such as blood-based exosomal RNA.

This dataset focuses on plasma-derived exosomal small non-coding RNAs (sncRNAs). A total of 233 de-identified plasma samples were obtained from Rush University Medical Center, including 116 NSCLC patients and 117 non-cancer controls. Exosomes were isolated from plasma, RNA was extracted, and small RNA sequencing was performed on the Illumina NextSeq 500 platform. Data available through dbGaP include individual-level raw sequencing files (FASTQ) and basic phenotype annotations for the plasma cohort.

A diagnostic model was initially developed using small RNA-seq data from 1,446 tissue samples obtained from TCGA and GEO, identifying a robust sncRNA signature that distinguished NSCLC from non-cancer samples with an AUC of 0.90 in hold-out tissue validation. This signature was subsequently validated in the plasma exosome cohort, achieving an AUC of 0.84 in independent validation. The model demonstrated consistent diagnostic performance across subgroups stratified by age, sex, smoking history, cancer stage, and NSCLC subtype. Survival and functional analyses further supported the clinical and biological relevance of the identified sncRNAs.
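The AUC values above summarize how well the signature's scores separate NSCLC samples from non-cancer samples: an AUC of 0.84 means that a randomly chosen NSCLC sample receives a higher score than a randomly chosen control about 84% of the time. The sketch below is an illustration only, not the authors' pipeline. It assumes each sample already has a numeric risk score from some classifier and computes AUC with the rank-based (Mann-Whitney) formulation, both overall and within subgroups such as sex or smoking history. The function names `roc_auc` and `auc_by_subgroup` and the toy inputs are hypothetical and are not study data.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based (Mann-Whitney) AUC.

    Returns the fraction of (positive, negative) pairs in which the positive
    sample (label 1, e.g. NSCLC) scores higher than the negative sample
    (label 0, e.g. non-cancer control). Ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_by_subgroup(
    scores: Sequence[float],
    labels: Sequence[int],
    groups: Sequence[Hashable],
) -> Dict[Hashable, float]:
    """AUC computed separately within each subgroup value (e.g. sex).

    Subgroups that lack either class are skipped, because AUC is undefined there.
    """
    buckets: Dict[Hashable, Tuple[List[float], List[int]]] = defaultdict(lambda: ([], []))
    for s, y, g in zip(scores, labels, groups):
        buckets[g][0].append(s)
        buckets[g][1].append(y)
    return {
        g: roc_auc(s, y)
        for g, (s, y) in buckets.items()
        if 0 in y and 1 in y
    }
```

For example, with hypothetical scores `[0.9, 0.8, 0.4, 0.3]` and labels `[1, 0, 1, 0]`, three of the four case-control pairs are ranked correctly, so `roc_auc` returns 0.75. The pairwise loop is quadratic in cohort size, which is negligible for a few hundred samples; a sort-based rank computation would scale better for much larger cohorts.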

It is hoped that this resource will contribute to a better understanding of the sncRNA expression landscape in NSCLC and promote the development of non-invasive biomarkers for early cancer detection. These data may be used to investigate transcriptomic and other risk factors, to provide insight into cancer screening, tumor progression mechanisms, and prognosis prediction, and ultimately to support precision medicine for NSCLC.