DAC for data acquired during the Down Syndrome acute lymphoblastic leukemia project. The project was performed with clinical sam

DAC ID: EGAC00001000644
Contact Person: Gianni Cazzaniga
Email: gianni [dot] cazzaniga [at] asst-monza [dot] it
Access Information:

This DAC controls 2 datasets:

Dataset ID: EGAD00001003280
Description:
Technology: NextSeq 550; ILLUMINA
Samples: 16

Dataset ID: EGAD00001003275
Description: Targeted resequencing of samples was done with the TruSeq Custom Amplicon Low Input kit (TSCA-LI, Illumina). The oligo capture probes were designed to include a prefix of 8 random nucleotides at the 5' end of each probe. The assay is designed such that each targeted locus is annealed with two probes, resulting in amplicons tagged with unique molecular identifiers (UMIs) (22) of 16 bases. Raw FASTQ sequencing files were processed as follows:
(a) The first 8 bases were trimmed from each read and recorded, together with the corresponding base quality scores (BQ), in the attribute field.
(b) Reads were aligned with BWA.
(c) A first round of PCR duplicate removal was performed with Picard MarkDuplicates using the parameters BARCODE_TAG=BC TAGGING_POLICY=All REMOVE_DUPLICATES=true.
(d) Since the previous step removed only duplicate reads with identical UMIs, a second pass of filtering was done. Reads with identical mapping were considered unique only if their corresponding UMIs differed in at least 3 positions (i.e., UMI edit distance > 2).
(e) Overlapping portions of paired-end read pairs were clipped with bamUtil clipOverlap to avoid overestimating the sequencing coverage.
Technology: NextSeq 550; ILLUMINA
Samples: 74
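
Steps (a) and (d) of the processing described for EGAD00001003275 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used for the dataset: the function names, the tuple used as a mapping key, and the greedy "first read seen is kept" order are all hypothetical, and "differ in at least 3 positions" is interpreted here as a position-wise (Hamming) comparison of equal-length UMIs.

```python
from collections import defaultdict


def trim_umi(seq, qual, umi_len=8):
    """Step (a), sketched: split the first umi_len bases and their
    quality scores off a read. Returns (umi, umi_qual, seq, qual)."""
    return seq[:umi_len], qual[:umi_len], seq[umi_len:], qual[umi_len:]


def hamming(a, b):
    """Number of positions at which two equal-length UMIs differ."""
    if len(a) != len(b):
        raise ValueError("UMIs must have equal length")
    return sum(x != y for x, y in zip(a, b))


def filter_by_umi(reads, min_diff=3):
    """Step (d), sketched: among reads with an identical mapping key,
    keep a read only if its UMI differs from every UMI already kept
    for that key in at least min_diff positions.

    reads is an iterable of (mapping_key, umi) pairs. Which read of a
    near-duplicate group survives is an assumption (first seen wins).
    """
    kept_umis = defaultdict(list)  # mapping_key -> UMIs kept so far
    kept = []
    for mapping_key, umi in reads:
        if all(hamming(umi, k) >= min_diff for k in kept_umis[mapping_key]):
            kept_umis[mapping_key].append(umi)
            kept.append((mapping_key, umi))
    return kept


# Hypothetical example: three read pairs share one mapping, one does not.
example_reads = [
    (("chr21", 100, 250), "AAAAAAAACCCCCCCC"),
    (("chr21", 100, 250), "AAAAAAAACCCCCCGG"),  # 2 mismatches: dropped
    (("chr21", 100, 250), "TTTAAAAACCCCCCCC"),  # 3 mismatches: kept
    (("chr21", 300, 450), "AAAAAAAACCCCCCCC"),  # different mapping: kept
]
unique_reads = filter_by_umi(example_reads)
```

The greedy comparison is order-dependent, so a real implementation would need a defined rule for choosing the representative read of each group (for example, by base quality), which the description above does not specify.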