CNA calls generated using the MuTect-Battenberg-PhyloWGS pipeline from the CPC-GENE Subclonal Heterogeneity study.
Microfluidic direct library preparation (DLP) single-cell whole-genome BAM files for near-diploid immortalized lymphoblastoid cell line GM18507.
SNV calls generated using the MuTect-Battenberg-PhyloWGS pipeline from the CPC-GENE Subclonal Heterogeneity study.
Whole-exome sequencing BAM files for samples from the BRIDGE Consortium with pathogenic or likely pathogenic variants in genes linked to bleeding or platelet disorders.
The cohort used in this study includes 36 African American (17 female, 19 male) and 36 White American (19 female, 17 male) individuals. The total cohort has a median age of 32 years, with a range of 20-64 years. The African American cohort has a median age of 34 years, with a range of 21-52 years. The White American cohort has a median age of 31 years, with a range of 20-64 years. Primary dermal fibroblast lines were derived from skin biopsies obtained from adult individuals at the NIEHS under institutional review board-approved human subjects protocol 10-E-0063, "Sample Collection Registry for Quality Control of Biological and Environmental Specimens and Assay Development and Testing protocol" (ClinicalTrials.gov #NCT01087307). All participants gave written informed consent for tissue donation. Donor sex, age, and ancestry were voluntarily self-reported. Fibroblast cells were reprogrammed via lentiviral transduction using six transcription factors contained in three plasmids (ADDGENE/PSIN4-EF2-N2L, ADDGENE/PSIN4-EF2-O2S and ADDGENE/PSIN4-CMV-K2M). Reprogramming efficiency was determined by alkaline phosphatase staining of triplicate 10 cm dishes of reprogrammed colonies, which were then scanned. The saved images were analyzed with ImageJ 1.51h (Wayne Rasband, National Institutes of Health, USA) to count colonies, using color-threshold-adjusted, binary-converted images of each dish. Triplicate plates were averaged and reported as colony counts or as percent reprogramming efficiency ((number of colonies / 250,000) × 100). The African American cohort had reprogramming efficiencies ranging from 0.06-1.37%, with a median of 0.655%. The White American cohort had reprogramming efficiencies ranging from 0.02-1.13%, with a median of 0.455%. Our goal was to define transcriptomic heterogeneity that could contribute to differences in reprogramming efficiency between individuals and between groups. Total RNA was obtained from dermal fibroblasts and matched iPSCs.
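The efficiency formula stated above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the study's analysis code: the function name and the example colony counts are hypothetical, and the sketch assumes that 250,000 is the number of cells plated per dish, which the description implies but does not state outright.

```python
from statistics import mean

# Denominator taken from the formula in the description; assumed here to be
# the number of cells plated per dish.
CELLS_PLATED = 250_000


def reprogramming_efficiency(colony_counts, cells_plated=CELLS_PLATED):
    """Average replicate colony counts and return percent efficiency.

    colony_counts: colony counts from replicate dishes (triplicates in the text).
    """
    if not colony_counts:
        raise ValueError("at least one colony count is required")
    if cells_plated <= 0:
        raise ValueError("cells_plated must be positive")
    return mean(colony_counts) / cells_plated * 100


# Hypothetical triplicate counts, chosen only to illustrate the calculation.
example_counts = [1600, 1650, 1700]
print(f"{reprogramming_efficiency(example_counts):.3f}%")
```

Under this formula, the reported medians of 0.655% and 0.455% would correspond to averages of roughly 1,640 and 1,140 colonies per dish.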
For each sample, 500 ng of total RNA was used as input for preparation of whole-transcriptome, rRNA-depleted libraries. An adapter-ligated library was prepared with the KAPA HyperPrep Kit (KAPA Biosystems, Wilmington, MA) using Bioo Scientific NEXTflex™ DNA Barcoded Adapters (Bioo Scientific, Austin, TX, USA) according to the KAPA-provided protocol. Sequencing was performed using an Illumina HiSeq 2500 following Illumina-provided protocols for 2x150 bp paired-end sequencing. Each transcriptome was sequenced to a target depth of 125 million reads. The following mean raw reads were obtained: African American dermal fibroblasts = 129,571,450; White American dermal fibroblasts = 131,939,505; African American iPSCs = 132,501,335; White American iPSCs = 134,394,164. Raw reads were aligned to hg19 using the STAR alignment tool (https://github.com/alexdobin/STAR). The following mean aligned reads were obtained: African American dermal fibroblasts = 123,315,343; White American dermal fibroblasts = 125,178,035; African American iPSCs = 123,148,312; White American iPSCs = 123,886,992. Reprinted from L. C. Mackey et al., Epigenetic Enzymes, Age, and Ancestry Regulate the Efficiency of Human iPSC Reprogramming. Stem Cells 36, 1697-1708 (2018), with permission from Wiley. Reprinted from L. S. Bisogno et al., Ancestry-dependent gene expression correlates with reprogramming to pluripotency and multiple dynamic biological processes. Science Advances 6 (47) (2020) (PMID: 33219026), with permission from AAAS.
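As a quick sanity check on the RNA-seq read counts reported above, the ratio of mean aligned reads to mean raw reads can be computed per group. The sketch below is illustrative only: the dictionary layout and function name are hypothetical, and a ratio of group means is not the same as the mean of per-sample alignment rates, which this description does not report.

```python
# Mean raw and mean aligned read counts per group, as reported in the description.
MEAN_READS = {
    "African American dermal fibroblasts": (129_571_450, 123_315_343),
    "White American dermal fibroblasts": (131_939_505, 125_178_035),
    "African American iPSCs": (132_501_335, 123_148_312),
    "White American iPSCs": (134_394_164, 123_886_992),
}


def aligned_fraction(raw_reads, aligned_reads):
    """Return aligned reads as a fraction of raw reads."""
    if raw_reads <= 0:
        raise ValueError("raw_reads must be positive")
    return aligned_reads / raw_reads


# Ratio of the reported group means (not a per-sample alignment rate).
fractions = {
    group: aligned_fraction(raw, aligned)
    for group, (raw, aligned) in MEAN_READS.items()
}

for group, fraction in fractions.items():
    print(f"{group}: {fraction:.1%} of mean raw reads aligned")
```

All four ratios fall between 92% and 96%, with the fibroblast groups slightly higher than the iPSC groups.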
Genetic diversity within the human immunoglobulin heavy chain (IGH) locus influences the expressed antibody repertoire and susceptibility to infectious and autoimmune diseases. However, repetitive sequences and complex structural variation pose significant challenges for large-scale characterization. Here, we introduce a method that combines Oxford Nanopore ultra-long sequencing and adaptive sampling with a bioinformatic pipeline to produce haplotype-resolved, annotated IGH assemblies. Notably, our strategy overcomes prior limitations in phasing resolution, enabling single-contig haplotype assemblies that span the entire IGH locus. We applied this method to four individuals and validated the accuracy of the IGH assemblies using Pacific Biosciences HiFi reads, demonstrating near-complete sequence congruence, detecting only indels with a high degree of confidence. Moreover, when applied to the reference material HG002, our pipeline revealed no base differences and a limited number of indels compared with the Telomere-to-Telomere genome benchmark across the IGH region. Importantly, in the four individuals, our approach uncovered 30 novel alleles and previously uncharacterized large structural variants, including a 120 kb duplication spanning IGHE to IGHA1 and an expanded seven-copy IGHV3-23 gene haplotype. These findings underscore the power of our method to resolve the full complexity of the IGH locus and uncover previously unrecognized variants that may affect immune function and disease susceptibility. Thus, our method provides a strong basis for future immunological research and translational applications.
WGBS data of whole blood samples from smoking and non-smoking mothers and their children at gestation/birth and follow-up years.
NanoString raw data for a neoadjuvant combination PD-L1 plus CTLA-4 blockade trial in patients with cisplatin-ineligible operable urothelial carcinoma. All samples were FFPE tumor samples. Raw probe count data (.RCC files) were generated with the nCounter Digital Analyzer (4.0.0.3).
Genotype data obtained using the coreExome Illumina SNP chip array for all the individuals included in the study of gene expression regulation in human primary regulatory CD4+ T cells (Tregs)
The compressed file contains PLINK-format files for the Affymetrix Human Origins SNP array data of 260 individuals generated and analyzed in the Liu et al. 2020 study of 22 ethnolinguistic groups in Vietnam.