Datasets used in the article "The genetic and linguistic admixture histories of the islands of Cabo Verde" by Laurent R et al., eLife 2023 (DOI: https://doi.org/10.7554/eLife.79827; URL: https://elifesciences.org/articles/79827). File name: "eGAdeposit_233CaboVerde_SampleInfo_FINAL_01062022.txt". Column 1 gives the individual alphanumeric codes, as used in the "eGAdeposit_233CaboVerde_GenotypeFile_FINAL_01062022.vcf" genotype file. Column 2 gives the individual's biological sex as per genetic inference. Column 3 gives the individual's self-reported age in years. Column 4 gives the individual's self-reported cumulative number of years spent in academic or professional education.
244 BAM files from infected single alveolar cells, 48 BAM files from empty wells, and 52 amplicon RNA sequencing samples (4 SARS-CoV-2 variants across 12 batches, plus 4 pooled viral-variant samples). The 244 alveolar single cells were captured over 12 experimental batches; the experimental condition of each cell is recorded in the metadata file "infected_cells_final_revision.csv" on GitHub (https://github.com/twkim-0510/SARS-CoV-2_viral_competition). Each BAM file name corresponds to the sample_name column of the metadata.
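Since each BAM file name corresponds to the sample_name column of the metadata, the two can be joined with a few lines of Python. The following is a minimal sketch, not the depositors' own code: it assumes the metadata is a comma-separated file with a header row, and that a BAM file is named either exactly as its sample_name or as sample_name plus a ".bam" suffix. The function name, the "condition" column, and the sample names in the demo are hypothetical.

```python
import csv
import io
from pathlib import Path


def match_bams_to_metadata(bam_paths, metadata_file):
    """Pair each BAM file with its metadata row via the sample_name column.

    Returns (matched, unmatched): a dict of file name -> metadata row,
    and a list of file names with no corresponding row.
    """
    rows = {row["sample_name"]: row for row in csv.DictReader(metadata_file)}
    matched, unmatched = {}, []
    for path in bam_paths:
        name = Path(path).name
        # Assumption: sample_name may or may not include the ".bam" suffix.
        stem = name[:-4] if name.endswith(".bam") else name
        if stem in rows:
            matched[name] = rows[stem]
        elif name in rows:
            matched[name] = rows[name]
        else:
            unmatched.append(name)
    return matched, unmatched


# Demo with made-up rows; a real run would pass
# open("infected_cells_final_revision.csv", newline="") instead.
demo_csv = io.StringIO(
    "sample_name,condition\n"
    "cell_001,variant_A\n"
    "cell_002,variant_B\n"
)
matched, unmatched = match_bams_to_metadata(
    ["bams/cell_001.bam", "bams/cell_999.bam"], demo_csv
)
```

Reporting the unmatched file names separately makes it easy to spot files, such as empty-well controls, that may not have a row in the metadata.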
Whole genome sequencing data from four affected and one unaffected individuals from two families with familial adult myoclonic epilepsy, one of Sri Lankan origin and one of Indian origin. BAM files aligned to hg19 reference genome.
We developed a method to measure the origination rates of target mutations of choice and applied it to the HbS and nearby mutations in the human beta-globin (HBB) gene, as well as to the equivalent mutations in the nearly identical delta-globin (HBD) gene, in sperm cells from African and European donors. Specifically, after extracting DNA from the sperm of the donors, each sample is enriched for the HbS mutation and nearby mutations by the restriction enzyme Bsu36I, which cleaves the wild-type (WT) sequence CCTGAGG at positions 16–22 of HBB and the homologous positions of HBD while leaving the HbS mutant and other mutants in these positions intact. Next, unique barcodes are attached to the DNA fragments from these Bsu36I-treated samples in order to reduce error by consensus sequencing of copies originating from the same original fragment. To determine how many target WT sequences have been removed (scanned) by Bsu36I digestion, and hence to be able to calculate the de novo mutation rate, we run a second reaction in parallel for each donor. This reaction involves the same steps as the former reaction except for digestion by Bsu36I (Bsu36I-untreated). Known amounts of mock HBB-like DNA molecules that are fully resistant to Bsu36I digestion are spiked into both the Bsu36I-treated and untreated samples. At the sequence analysis stage, the change in the frequency of these molecules between the two treatments yields the Bsu36I enrichment score, which in turn is used to calculate the number of scanned WT sequences. Our results show significant correspondence between de novo mutation rates and past observations of alleles in carriers, showing that mutation rates vary substantially in a mutation-specific manner that contributes to the site frequency spectrum. We also found that the overall point mutation rate is significantly higher in Africans than in Europeans in the HBB region studied.
Finally, the rate of the 20A>T mutation, called the “HbS mutation” when it appears in HBB, is significantly higher than expected from the genome-wide average for this mutation type. Nine instances were observed in the African HBB region of interest (ROI), where it is of adaptive significance, representing at least three independent originations, and no instances were observed in the European HBB ROI or in the European or African HBD ROI. 12 consented subjects; samples from two subjects were pooled.
Leprosy GWAS (1): Leprosy is a chronic infectious disease caused by Mycobacterium leprae (M. leprae). Because M. leprae has a narrow host range and cannot be cultured in vitro, the biological investigation of this disease has been difficult. Host genetic factors have been suggested to play an important role in disease development, but few have been identified. In this study, we attempted to identify the host genetic factors by performing a two-stage genome-wide association study (GWAS) in a Chinese population. The initial genome-wide scan was done by genotyping 706 patients and 1225 controls using the Illumina HumanHap610 BeadChip, and the follow-up study was performed by genotyping 93 SNPs in three independent samples consisting of 3254 cases and 5955 controls. We identified significant association (P < 10^-10) within six genes: CCDC122 (13q14), C13orf31 (13q14), NOD2 (16q12), TNFSF15 (9q32), HLA-DR (6p21) and RIPK2 (8q21), and suggestive association (P = 5.68 × 10^-6) within LRRK2 (12q12). We also found suggestive evidence that C13orf31, LRRK2, NOD2 and RIPK2 show stronger association in the multibacillary form than in the paucibacillary form of leprosy. Our findings highlight the importance of the innate immune response, particularly NOD2-mediated signaling, in leprosy and suggest a new therapeutic target for leprosy.
DHS GWAS: Dapsone (DDS), as both an antibiotic and an anti-inflammatory agent, has been widely used for preventing and treating pathogen-caused infectious diseases and chronic inflammatory diseases. Currently, about 0.5-3.6% of individuals treated with DDS develop severe dapsone hypersensitivity syndrome (DHS), and the mortality rate is up to about 11-13%. However, until now, no tests have been available to predict the risk of DHS. To identify the genetic risk factors of DHS, we performed a two-stage GWAS in a Chinese population. In this study, the initial genome-wide scan was done by genotyping 39 DHS cases and 833 controls using Illumina Human 660W-Quad BeadChips, with imputation of human leukocyte antigen (HLA) molecules. Validation was performed by genotyping 24 SNPs on the Sequenom MassARRAY platform in an additional 31 DHS cases and 1,089 controls, and by typing 32 four-digit HLA-B alleles through Roche 454 sequencing in an independent series of 37 DHS cases and 201 controls. We identified significant association (OR = 6.18, P = 3.84 × 10^-13) with SNP rs2844573, which is located between the HLA-B and MICA loci. HLA-B*13:01 was confirmed to be a strong risk factor for DHS (OR = 20.53, P = 6.84 × 10^-25) and responsible for the association at rs2844573. The presence of HLA-B*13:01 had a sensitivity of 85.5% and a specificity of 85.7% as a predictor for DHS, and its absence can reduce the risk 7-fold (from 1.4% to 0.2%). HLA-B*13:01 is strongly associated with the development of DHS and can be used as a risk predictor of DHS in individuals of Chinese and other Asian populations.
Leprosy GWAS (2): This is a three-stage GWAS of leprosy in the Chinese population. The genome-wide discovery analysis involved two independent data sets: a new, unpublished data set obtained by genotyping 842 leprosy patients and 925 controls from northern and southern China using Human660W-Quad BeadChips, and a previously published GWAS data set of 706 leprosy cases, 1,225 healthy controls and 4,362 individuals with immune-related diseases as population controls, all of Chinese Han descent from northern China. Further validation studies were performed in two stages in a total of 6,765 cases and 9,505 controls of Chinese Han and minority descent from multiple areas of China, similar to the discovery stage. From this study, we discovered six new susceptibility loci with a combined P-value from the discovery and validation stages surpassing genome-wide significance: BATF3 at 1q32.3, CDH18 at 5p14.3, DEC1 at 9q32, EGR2 at 10q21.3, CCDC88B at 11q13.1 and CIITA at 16p13.13. Our current study has advanced the genetic understanding of leprosy by substantially increasing the number of confirmed genetic susceptibility loci.
Leprosy GWAS (3): To discover additional leprosy susceptibility loci, we carried out a large-scale three-stage GWAS analysis of leprosy in a Chinese population. The genome-wide discovery analysis (Stage 1) involved a new GWAS data set of 1,197 leprosy cases and 1,426 controls from northern and southern China, genotyped using Illumina Omni Zhonghua chips with 900,015 single-nucleotide polymorphisms (SNPs), and two published GWAS data sets (the GWAS2 data set, including 706 leprosy cases and 1,225 healthy controls genotyped with Illumina HumanHap610 BeadChips, and the GWAS3 data set, including 842 leprosy patients and 925 controls from northern and southern China genotyped with Human660W-Quad BeadChips). Further validation studies were performed in two stages in a total of 5,413 cases and 9,771 controls of Chinese Han descent from multiple areas of China. In this study, we identified four novel associations at genome-wide significance (P < 5 × 10^-8). For each of these, differential gene expression and eQTL analyses indicate a candidate gene within the susceptibility locus: SYN2 (3p25.2), BBS9 (7p14.3), CTSB (8p23.1) and MED30 (8q24.11). Altogether, these findings have provided new insight and significantly expanded our understanding of the genetic basis of leprosy.
Leprosy GWAS (4): In this study, we attempted to systematically investigate the contribution of protein-coding variants to leprosy susceptibility by performing a three-stage genome-wide association study (GWAS) of protein-coding variants in a Chinese population. The initial genome-wide scan was done by genotyping 1,670 persons affected by leprosy and 2,321 controls using Illumina Infinium Human Exome BeadChips (v1.0). The validation study was performed by genotyping 39 SNPs in an additional 3,169 leprosy patients and 9,814 healthy controls from the northern region of China, and the replication study was performed by genotyping eight SNPs in three independent samples from the southern regions of China consisting of 2,231 cases and 2,266 controls. We identified significant association (P < 1.23 × 10^-6) within seven genes: FLG (Gene ID: 2312), IL23R (Gene ID: 149233), CARD9 (Gene ID: 64170), NCKIPSD (Gene ID: 51517), TYK2 (Gene ID: 7297), SLC29A3 (Gene ID: 55315) and IL27 (Gene ID: 246778). Our findings reveal a novel involvement of the skin barrier and of endocytosis/phagocytosis/autophagy, in addition to the known roles of innate and adaptive immunity, in the pathogenesis of leprosy, and highlight the merits of protein-coding variant studies for complex diseases.
The summary statistics from the five genome-wide association analyses are published here.
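The DHS risk figures quoted in this record can be checked with Bayes' rule. The sketch below is an illustrative calculation, not the authors' analysis: it assumes that the 1.4% figure is the baseline risk of DHS among dapsone-treated individuals, and it combines that with the reported sensitivity (85.5%) and specificity (85.7%) of HLA-B*13:01 to estimate the risk in people who do not carry the allele. The function name is hypothetical.

```python
def risk_if_allele_absent(baseline_risk, sensitivity, specificity):
    """Probability of disease given a negative test, by Bayes' rule.

    baseline_risk: prior probability of disease (assumed here to be 1.4%)
    sensitivity:   P(allele present | disease)
    specificity:   P(allele absent | no disease)
    """
    # Cases that lack the allele (false negatives of the predictor).
    false_negative = (1.0 - sensitivity) * baseline_risk
    # Non-cases that lack the allele (true negatives of the predictor).
    true_negative = specificity * (1.0 - baseline_risk)
    return false_negative / (false_negative + true_negative)


risk = risk_if_allele_absent(baseline_risk=0.014, sensitivity=0.855, specificity=0.857)
print(f"Risk of DHS without HLA-B*13:01: {risk:.2%}")
```

Under these assumptions the risk for non-carriers comes out at about 0.24%, which rounds to the reported 0.2%.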
The transcription factor FOXR2 is the universal driver of childhood central nervous system neuroblastoma, FOXR2 activated (NB-FOXR2). NB-FOXR2 tumors arise exclusively in the brain hemispheres, and despite morphological similarities to other pediatric brain tumors, they are a molecularly distinct entity based on DNA methylation profiling. The cell-of-origin is unknown. Here, we profiled a cohort of rare NB-FOXR2 tumors by bulk and single-cell transcriptomics. Through systematic comparative analyses, we delineate tumor transcriptional states and candidate cell-of-origin. More broadly, we demonstrate systematic molecular profiling of childhood cancers to orient oncogenic targeting for in vivo modeling, a critical resource for the study of rare tumors and development of therapeutics.
To perform genetic and gene expression analyses on mucinous ovarian tumours to determine a progression model, cell of origin and novel therapeutic targets.
There are no effective medical therapies for meningiomas, partly due to our limited understanding of their cellular origin and composition. To address this, we generated data for a comprehensive reference single-cell and spatial transcriptomic atlas of human fetal meninges. Samples of meninges from human embryos of post-conceptional week (PCW) 5.0-13.0 were dissected and subjected to 10x Genomics Chromium single-cell RNA sequencing, with the purpose of identifying cell types and gene expression programs of normal meninges development. Furthermore, this dataset should help discover cells of origin for meningiomas.
Neuroblastoma is a pediatric tumor of the developing sympathetic nervous system. However, the cellular origin of neuroblastoma remains to be defined. Here, we study single-cell transcriptomes of neuroblastomas and normal human developing adrenal glands at various stages of embryonic and fetal development. We define normal differentiation trajectories during adrenal medullary development and identify the cellular origin of neuroblastoma. Importantly, adrenergic and mesenchymal neuroblastomas with varying clinical phenotypes match different temporal states along normal differentiation trajectories, with the degree of differentiation corresponding to clinical prognosis.
The GenSalt study aims to identify genes which interact with dietary sodium and potassium intake to influence blood pressure in Han Chinese participants from rural north China. Whole genome sequencing will be conducted among 1,860 participants of the Genetic Epidemiology Network of Salt Sensitivity (GenSalt) Study. We will work in collaboration with participating TOPMed studies to identify novel common, low-frequency and rare variants associated with an array of cardiometabolic phenotypes. In addition, we will explore the relation of low-frequency and rare variants with salt-sensitivity among GenSalt study participants.