Assessment of RNA-Seq Sample Preparation Methodology
The goal of this study was two-fold: to determine whether common purification techniques have any effect on downstream differential expression analysis, and to evaluate combinations of alignment and differential expression software for reliability. To this end, blood was collected from three individuals and pooled during and after extraction. After pooling and mixing were completed, samples were divided into three aliquots for testing. The first aliquot served as a control and was not purified or concentrated; it was diluted to produce samples of varying concentrations for testing. The second aliquot was diluted to 20 ng/μL and used to test six different variations on the AMPure XP bead purification procedure. The third aliquot was diluted to concentrations of 60, 30, and 9.6 ng/μL and purified using MinElute columns. Samples were submitted for total RNA-Seq library preparation and sequencing. Library preparation was performed using the TruSeq Stranded Total RNA with Ribo-Zero Globin kit (20020612, Illumina Inc.), and 2x150 bp paired-end (PE) sequencing was done on an Illumina NovaSeq 6000 with the S4 reagent kit.
Comparisons were made between methods (AMPure vs. unpurified, AMPure vs. MinElute, MinElute vs. unpurified) to assess the effects of purification method on downstream differential expression. Comparisons were also made within methods, using the varying concentrations tested for unpurified samples and for MinElute-purified samples, to assess the effects of concentration on differential expression. The variations of the AMPure procedure were also compared with the unmodified procedure to assess their effectiveness.
A subset of samples was selected for comparing alignment and differential expression packages. Unpurified high-concentration samples eluted in RNase-free water were compared to unpurified high-concentration samples eluted in BR5, a buffer from the PAXgene blood miRNA kit, with the expectation that there should be no or very few differentially expressed genes. Unpurified high-concentration samples eluted in RNase-free water were also compared to unpurified low-concentration samples eluted in RNase-free water, with possibly a small number of differentially expressed genes anticipated. A third dataset of simulated RNA-seq data was created with a known rate of differential expression.
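Because the simulated dataset has a known set of differentially expressed genes, calls from any pipeline can in principle be scored against that truth. The following is a minimal sketch of such scoring, not the study's actual evaluation code; the function name `score_calls` and the gene identifiers are hypothetical and chosen only for illustration.

```python
def score_calls(called, truly_de):
    """Score a set of DE calls against a known truth set (simulated data).

    called   -- gene IDs a pipeline reported as differentially expressed
    truly_de -- gene IDs simulated as differentially expressed
    Returns sensitivity and false discovery rate (FDR).
    """
    called, truly_de = set(called), set(truly_de)
    tp = len(called & truly_de)   # correctly called DE genes
    fp = len(called - truly_de)   # called DE, but simulated as unchanged
    fn = len(truly_de - called)   # simulated as DE, but missed
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    fdr = fp / (tp + fp) if (tp + fp) else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "sensitivity": sensitivity, "fdr": fdr}


# Hypothetical example: four genes simulated as DE, four genes called.
truth = {"g1", "g2", "g3", "g4"}
calls = {"g1", "g2", "g3", "g9"}
result = score_calls(calls, truth)
```

For the two real-data comparisons, where few or no differentially expressed genes are expected, the same idea reduces to counting how many genes a pipeline calls at all.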
Reads from these three datasets were aligned using Bowtie2, HISAT2, kallisto, RSEM, Rsubread, Salmon, and STAR. The results were then analyzed for differential expression using ALDEx2, baySeq, DEGseq, DESeq2, edgeR, limma, NOISeq, PoissonSeq, and SAMseq. Differential expression results from all three comparisons were evaluated to determine which combinations provided the most reliable results for both real and simulated data.
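To give a sense of the size of this comparison, the sketch below enumerates every pairing of the seven alignment tools with the nine differential expression packages named above. It assumes that every pairing was tested, which the description does not state explicitly; the names `ALIGNERS`, `DE_TOOLS`, and `pipeline_grid` are illustrative only.

```python
from itertools import product

# Tool names as listed in the study description.
ALIGNERS = ["Bowtie2", "HISAT2", "kallisto", "RSEM",
            "Rsubread", "Salmon", "STAR"]
DE_TOOLS = ["ALDEx2", "baySeq", "DEGseq", "DESeq2", "edgeR",
            "limma", "NOISeq", "PoissonSeq", "SAMseq"]


def pipeline_grid(aligners=ALIGNERS, de_tools=DE_TOOLS):
    """Return every (aligner, DE tool) pairing as a list of tuples."""
    return list(product(aligners, de_tools))


grid = pipeline_grid()
print(len(grid))  # 7 aligners x 9 DE tools = 63 pairings
```

Under that assumption, each of the three comparisons would be run through 63 pipelines, so a full evaluation covers 189 result sets.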
- Type: Case-Control
- Archiver: The database of Genotypes and Phenotypes (dbGaP)