Ultra-long sequencing for contiguous haplotype resolution of the human immunoglobulin heavy chain locus

Genetic diversity within the human immunoglobulin heavy chain (IGH) locus influences the expressed antibody repertoire and susceptibility to infectious and autoimmune diseases. However, repetitive sequences and complex structural variation pose significant challenges for large-scale characterization. Here, we introduce a method that combines Oxford Nanopore ultra-long sequencing and adaptive sampling with a bioinformatic pipeline to produce haplotype-resolved, annotated IGH assemblies. Notably, our strategy overcomes prior limitations in phasing resolution, enabling single-contig haplotype assemblies that span the entire IGH locus. We applied this method to four individuals and validated the accuracy of the IGH assemblies using Pacific Biosciences HiFi reads, demonstrating near-complete sequence congruence; the only high-confidence discrepancies detected were indels. Moreover, when applied to the reference material HG002, our pipeline revealed no base differences and a limited number of indels compared with the Telomere-to-Telomere genome benchmark across the IGH region. Importantly, in the four individuals, our approach uncovered 30 novel alleles and previously uncharacterized large structural variants, including a 120 kb duplication spanning IGHE to IGHA1 and an expanded haplotype carrying seven copies of the IGHV3-23 gene. These findings underscore the power of our method to resolve the full complexity of the IGH locus and uncover previously unrecognized variants that may affect immune function and disease susceptibility. Thus, our method provides a strong basis for future immunological research and translational applications.

Click on a Dataset ID in the table below to learn more and to find out whom to contact about access to these data.

Dataset ID | Description | Technology | Samples
EGAD50000001517 | | PromethION | 4