CaBagE: a Cas9-Based Background Elimination Strategy for Targeted, Long-Read DNA Sequencing
While repeat-expansion polymorphisms are known to underlie several developmental and neurological disorders, technological limitations have hindered accurate genotyping of these variants. A pertinent example is the (CCCCGG)n repeat expansion in C9orf72 that segregates with up to 40% of familial amyotrophic lateral sclerosis (ALS) cases. There is a clear relationship between this repeat expansion and neurodegeneration. However, widely used short-read sequencing is ill-suited to characterizing the C9orf72 repeat expansion because of polymerase slippage over low-complexity sequences and the inherently limited mapping and haplotype resolution of redundant short sequences. Consequently, researchers and physicians must rely on crude variant characterization methods (e.g., Southern blots), which limit direct genotype-phenotype associations and complicate diagnosis in patients with atypical clinical presentation.
In contrast, ultra-long (e.g., 10-150 kb) sequencing reads generated by Oxford Nanopore Technologies (ONT) enable direct measurement of loci containing complex structures without being subject to amplification or alignment bias. We therefore developed a molecular and computational framework for targeted sequencing and quantification of pathogenic repeat expansions by combining Cas9 and amplification-free sequence capture methods with Nanopore long-read sequencing technology. First, extracted high molecular weight DNA is bound by Cas9 on either side of a DNA target. Bound Cas9 protects the target DNA from challenge by processive exonucleases. Unprotected (off-target) DNA is degraded with exonucleases, and the remaining sample is sequenced using the ONT MinION sequencer. Then, base-called reads are aligned to the target locus, and repeat copy number genotypes are estimated using a Gaussian Mixture Model.
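The genotype-estimation step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how per-read repeat counts are derived or how the mixture is configured. The sketch assumes that each on-target read yields the length in bp of the sequence spanning the repeat tract, that this length is divided by the 6 bp repeat unit, and that a two-component mixture (one component per allele of a diploid sample) is fitted by expectation-maximization. The function names, the simulated read lengths, and all parameter values are hypothetical.

```python
import math
import random

REPEAT_UNIT_BP = 6  # hexanucleotide repeat unit, as in (CCCCGG)n


def repeat_count_from_span(span_bp, unit_bp=REPEAT_UNIT_BP):
    """Convert the read length spanning the repeat tract (bp) to a repeat count."""
    return span_bp / unit_bp


def _log_normal_pdf(x, mean, var):
    """Log density of a univariate normal distribution."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def fit_gmm_1d(values, k=2, n_iter=500, tol=1e-9, min_var=0.25):
    """Fit a k-component 1D Gaussian mixture by EM.

    Returns (weights, means, variances). min_var is an assumed floor that
    keeps a component from collapsing onto a single read.
    """
    xs = sorted(values)
    n = len(xs)
    # Initialise the means at evenly spaced quantiles of the data.
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    grand_mean = sum(xs) / n
    grand_var = max(sum((x - grand_mean) ** 2 for x in xs) / n, min_var)
    variances = [grand_var] * k
    weights = [1.0 / k] * k
    prev_ll = -math.inf
    for _ in range(n_iter):
        # E-step: per-read responsibilities, computed in log space.
        resp = []
        ll = 0.0
        for x in xs:
            logs = [math.log(w) + _log_normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            top = max(logs)
            log_total = top + math.log(sum(math.exp(l - top) for l in logs))
            ll += log_total
            resp.append([math.exp(l - log_total) for l in logs])
        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-9:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, min_var)
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return weights, means, variances


# Simulated (not real) spanning lengths: one short and one expanded allele.
rng = random.Random(0)
span_lengths_bp = ([rng.gauss(48, 6) for _ in range(30)]
                   + [rng.gauss(4800, 300) for _ in range(30)])
counts = [repeat_count_from_span(s) for s in span_lengths_bp]

weights, means, variances = fit_gmm_1d(counts, k=2)
allele_estimates = sorted(means)  # estimated repeat count for each allele
```

In this sketch the fitted component means serve as the per-allele repeat-count estimates, and the component variances absorb read-to-read length noise. For real data, an established implementation such as scikit-learn's `GaussianMixture` would be a reasonable substitute for the hand-written EM loop.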
We applied this molecular and computational framework to two ALS patients with C9orf72 repeat expansions whose biospecimens are available from the NINDS collection at Coriell. Genotype estimates produced with CaBagE were consistent with genotypes derived from repeat-primed PCR for each individual.