Targeted de novo phasing and long-range assembly by template mutagenesis

Long-range sequencing with a low error rate has been challenging. Sequence assembly and phasing usually require a high-quality reference genome for mapping, so highly variable genomic regions and regions with no reference genome information are difficult to work with. In this study, we describe novel bench protocols and algorithms that solve both problems at once, producing ultra-low-error-rate, haplotype-phased sequence assemblies of regions 10 kb in length on a short-read sequencing platform. We accomplish this by imprinting each template strand from a target region with a dense and unique mutation pattern. The mutation process randomly and independently converts ~50% of cytosines to uracils. Short-read sequencing libraries are made from both mutated and unmutated templates. A conservative de Bruijn graph approach seeds an assembly of the mutated templates, which we then extend by mapping paired-end reads. After using the unmutated sequence library to recover almost all of the mutated bases, we partition the template assemblies into two or more haplotypes. The final haplotype is assembled and corrected for residual template mutations and PCR errors. We obtain per-base error rates below 10^-9. We apply this method to a human family, correctly assembling and phasing three genomic intervals, including the highly polymorphic HLA-B gene.
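The imprinting step described above can be illustrated with a small simulation. This is a minimal sketch, not the study's pipeline: the function names (`mutate_template`, `as_sequenced`, `mutation_pattern`), the random 2 kb region, and the fixed seed are all hypothetical choices made for illustration. It only assumes what the abstract states, namely that each cytosine is converted to uracil independently with probability of about 0.5, plus the general fact that uracil is read as thymine after amplification and sequencing.

```python
import random


def mutate_template(seq, rate=0.5, rng=None):
    """Convert each C to U independently with probability `rate` (assumed ~0.5)."""
    if rng is None:
        rng = random.Random()
    return "".join(
        "U" if base == "C" and rng.random() < rate else base for base in seq
    )


def as_sequenced(template):
    """Uracil is read as thymine once the template is amplified and sequenced."""
    return template.replace("U", "T")


def mutation_pattern(original, mutated):
    """Positions where a C in the original became a U in the mutated copy."""
    return tuple(
        i
        for i, (a, b) in enumerate(zip(original, mutated))
        if a == "C" and b == "U"
    )


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the illustration is repeatable
    # Hypothetical 2 kb target region; the study targets regions of about 10 kb.
    region = "".join(rng.choice("ACGT") for _ in range(2000))
    templates = [mutate_template(region, rng=rng) for _ in range(20)]
    patterns = {mutation_pattern(region, t) for t in templates}
    print("distinct patterns:", len(patterns), "of", len(templates))
    print("first 60 bases as sequenced:", as_sequenced(templates[0])[:60])
```

Because a region of this size contains hundreds of cytosines, two templates sharing the same pattern by chance is vanishingly unlikely, which is what lets short reads be assigned back to the template they came from.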

Request Access

To gain access to this dataset, please provide details of your organization and project.

Studies are experimental investigations of a particular phenomenon, e.g., case-control studies on a particular trait or cancer research projects reporting matching cancer normal genomes from patients.

Study ID         Study Title  Study Type
EGAS00001005899               Population Genomics

This table displays only public information pertaining to the files in the dataset. If you wish to access this dataset, please submit a request. If you already have access to these data files, please consult the download documentation.

ID               File Type  Size      Located in
EGAF00005745448  fastq.gz   30.3 MB
EGAF00005745449  fastq.gz   31.6 MB
EGAF00005745450  fastq.gz   35.9 MB
EGAF00005745451  fastq.gz   37.1 MB
EGAF00005745452  fastq.gz   78.0 MB
EGAF00005745453  fastq.gz   83.7 MB
EGAF00005745454  fastq.gz   148.4 MB
EGAF00005745455  fastq.gz   155.6 MB
EGAF00005745456  fastq.gz   257.7 MB
EGAF00005745457  fastq.gz   269.9 MB
EGAF00005745458  fastq.gz   698.3 MB
EGAF00005745459  fastq.gz   740.7 MB
EGAF00005745460  fastq.gz   40.8 MB
EGAF00005745461  fastq.gz   42.7 MB
EGAF00005745462  fastq.gz   282.2 MB
EGAF00005745463  fastq.gz   295.8 MB
EGAF00005745464  fastq.gz   51.9 MB
EGAF00005745465  fastq.gz   54.0 MB
EGAF00005745466  fastq.gz   73.9 MB
EGAF00005745467  fastq.gz   80.1 MB
EGAF00005745468  fastq.gz   212.1 MB
EGAF00005745469  fastq.gz   228.6 MB
EGAF00005745470  fastq.gz   138.6 MB
EGAF00005745471  fastq.gz   145.1 MB
EGAF00005745472  fastq.gz   56.7 MB
EGAF00005745473  fastq.gz   58.9 MB
EGAF00005745474  fastq.gz   45.4 MB
EGAF00005745475  fastq.gz   47.0 MB
EGAF00005745476  fastq.gz   112.1 MB
EGAF00005745477  fastq.gz   124.5 MB
EGAF00005745478  fastq.gz   83.5 MB
EGAF00005745479  fastq.gz   92.0 MB
EGAF00005745480  fastq.gz   126.5 MB
EGAF00005745481  fastq.gz   136.9 MB
EGAF00005745482  fastq.gz   53.8 MB
EGAF00005745483  fastq.gz   56.1 MB
EGAF00005745484  fastq.gz   49.7 MB
EGAF00005745485  fastq.gz   51.5 MB
EGAF00005745486  fastq.gz   212.5 MB
EGAF00005745487  fastq.gz   223.4 MB
EGAF00005745488  fastq.gz   464.2 MB
EGAF00005745489  fastq.gz   480.2 MB
EGAF00005745490  fastq.gz   171.1 MB
EGAF00005745491  fastq.gz   182.7 MB
EGAF00005745492  fastq.gz   84.5 MB
EGAF00005745493  fastq.gz   93.3 MB
EGAF00005745494  fastq.gz   80.3 MB
EGAF00005745495  fastq.gz   88.1 MB
48 Files (7.4 GB)