Study

Inferring expressed genes by whole-genome sequencing of plasma DNA

Study ID: EGAS00001001754
Type: Other

Study Description

The analysis of cell-free DNA (cfDNA) in plasma is a rapidly advancing field in medicine. cfDNA consists predominantly of nucleosome-protected DNA shed into the bloodstream by cells undergoing apoptosis. We performed whole-genome sequencing (WGS) of plasma DNA and identified two discrete regions at transcription start sites (TSSs) where nucleosome occupancy produces different read-depth coverage patterns in expressed and silent genes. Using machine learning for gene classification, we found that the plasma DNA read-depth patterns from healthy donors reflected the expression signature of hematopoietic cells. In cancer patients with metastatic disease, we classified expressed cancer driver genes located in regions with somatic copy-number gains with high accuracy. We could even determine which isoform was expressed for genes with several TSSs, as confirmed by RNA-Seq of the matching primary tumor. Our analyses provide functional information about the cells releasing their DNA into the circulation.
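The description does not define the two TSS regions or the classifier. As a minimal sketch only, the idea can be illustrated under these assumptions: a per-base read-depth array for one chromosome, two hypothetical windows around each TSS (a narrow one from -150 to +50 bp and a wide one of ±1000 bp; these bounds are illustrative, not taken from this page), and a simple nearest-centroid classifier standing in for whatever machine-learning method the study actually used.

```python
from statistics import mean

# Hypothetical window bounds relative to the TSS (bp); NOT specified on this page.
NDR_WINDOW = (-150, 50)     # narrow window around the TSS (assumption)
WIDE_WINDOW = (-1000, 1000)  # wide window spanning the TSS (assumption)


def window_depth(depth, tss, window, strand="+"):
    """Mean read depth in a window around a TSS.

    depth: per-base read depths along a chromosome (0-based positions).
    For minus-strand genes the window is mirrored so 'upstream' stays upstream.
    """
    lo, hi = window
    if strand == "-":
        lo, hi = -hi, -lo
    start = max(0, tss + lo)
    end = min(len(depth), tss + hi)
    return mean(depth[start:end]) if end > start else 0.0


def tss_features(depth, tss, strand="+", background=1.0):
    """Two features per gene: narrow- and wide-window depth, scaled by a background depth."""
    return (window_depth(depth, tss, NDR_WINDOW, strand) / background,
            window_depth(depth, tss, WIDE_WINDOW, strand) / background)


def train_centroids(features, labels):
    """Nearest-centroid training: the mean feature vector of each class label."""
    groups = {}
    for f, y in zip(features, labels):
        groups.setdefault(y, []).append(f)
    return {y: tuple(mean(col) for col in zip(*fs)) for y, fs in groups.items()}


def classify(f, centroids):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(f, centroids[y])))


# Toy usage: flat depth of 10 with a coverage dip at a mock active TSS (position 5000).
toy = [10] * 10000
for i in range(4850, 5050):
    toy[i] = 4
cents = train_centroids([tss_features(toy, 5000), tss_features(toy, 8000)],
                        ["expressed", "silent"])
print(classify((5.0, 9.5), cents))
```

The toy data assume that expressed genes show reduced depth at the TSS, which is what a nucleosome-depleted active promoter would suggest; with real data the features would be computed from aligned reads and the classifier trained on genes of known expression status.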

Study Datasets (4 datasets)


EGAD00001002215
Description: Low-coverage whole-genome sequencing of plasma DNA from 50 male and 54 female non-cancer donors. For the analysis of nucleosome positioning, all data from the non-cancer controls were merged. In addition, two patients with metastasized breast cancer were sequenced at higher depth on a NextSeq.
Technology: Illumina MiSeq, NextSeq 550
Samples: 108

EGAD00001002216
Description: RNA-Seq on an Ion Torrent Proton of the corresponding tumor material from two patients with metastasized breast cancer (Breast7, Breast13).
Technology: Ion Torrent Proton
Samples: 2

EGAD00001002217
Description: Merged file of low-coverage WGS from 179 plasma DNA samples from non-cancer controls and cancer patients, used to assess the size distribution of plasma nuclear DNA fragments.
Technology: Illumina MiSeq
Samples: 1

EGAD00001002254
Description: Single-end sequencing data (trimmed to 60 bp) from 104 plasma samples of donors without tumors (50 male, 54 female) were merged and used to establish coverage profiles around the TSS and to develop a gene expression prediction algorithm. The dataset includes merged alignments of low-coverage whole-genome sequencing of plasma DNA from 50 male and 54 female non-cancer donors. In addition, two patients with metastasized breast cancer were sequenced at higher depth on a NextSeq.
Technology: Illumina MiSeq
Samples: 3
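Dataset EGAD00001002254 describes merging the 104 control samples to establish coverage profiles around the TSS. The page does not say how such a profile is computed. A minimal sketch, assuming per-base depth arrays over identical coordinates for every sample, merging by summing depths, a hypothetical flank of 1000 bp, and normalization to the mean depth of the merged array, could look like this:

```python
from statistics import mean


def merge_depths(sample_depths):
    """Merge samples by summing per-base depth (all arrays cover the same positions)."""
    return [sum(col) for col in zip(*sample_depths)]


def tss_profile(depth, tss_list, flank=1000):
    """Average normalized depth at each offset -flank..+flank around a set of TSSs.

    tss_list: (position, strand) pairs; minus-strand windows are reversed so every
    window runs upstream -> downstream. Depth is scaled by the mean of the whole array.
    TSSs whose window would run off the sequence are skipped.
    """
    genome_mean = mean(depth)
    profile = [0.0] * (2 * flank + 1)
    used = 0
    for tss, strand in tss_list:
        if tss - flank < 0 or tss + flank >= len(depth):
            continue
        window = depth[tss - flank: tss + flank + 1]
        if strand == "-":
            window = window[::-1]
        for i, d in enumerate(window):
            profile[i] += d / genome_mean
        used += 1
    return [p / used for p in profile] if used else profile


# Toy usage: two samples, one with a single-base dip at position 50.
a = [2] * 100
a[50] = 0
b = [2] * 100
print(tss_profile(merge_depths([a, b]), [(50, "+")], flank=2))
```

Summing raw depths weights deeper-sequenced samples more heavily; normalizing each sample before merging would be an alternative design, and this page does not say which approach was taken.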
