We obtained bulk RNA-seq data from CRC-PDX tumors and performed molecular subtype classification.
scRNA-seq raw data
This dataset includes whole-transcriptome data from stimulated and cultured human CD4+ Treg cells (39 samples).
Paired-end RNA-seq of follicular T cell lymphoma for the discovery of fusion transcripts
Variants called from RNA-seq data of meningioma tumors.
The AURORA US Metastatic Breast Cancer project is funded by the Breast Cancer Research Foundation (BCRF) Evelyn H. Lauder Founder's Fund for Metastatic Breast Cancer Research. This multi-center effort, conducted within the Translational Breast Cancer Research Consortium (TBCRC) together with cancer researchers, aims to better understand the metastatic process through the study of both primary and metastatic tissue. In the retrospective phase, TBCRC sites submitted matched primary and metastatic tissues and blood from previous collections to pilot the process. Samples were profiled using whole-genome DNA sequencing, whole-exome DNA sequencing, DNA methylation arrays, and RNA sequencing. The final freeze set of samples with successful nucleic acid extraction and molecular assays included 55 patients, with 31 primary tissues and 102 metastases. Twenty patients had tissue collected at autopsy, including 19 with more than one metastasis. Metastases came from 20 different tissue sites, the most common being liver, lung, lymph node, and brain. The median age at primary diagnosis was 49 years. The clinical subtypes of the primary tumors were 34% triple-negative, 30% ER+HER2-, 11% ER+HER2+, and 7% ER-HER2+. Patients received an average of 3 lines of therapy in the metastatic setting. The median overall survival from primary diagnosis was 4.5 years, and overall survival from metastatic diagnosis was 2 years.
To identify novel causes of hereditary thrombocytopenia, we performed a genetic association analysis of whole-genome sequencing (WGS) data from 13,037 individuals enrolled in the NIHR BioResource, including 233 cases with isolated thrombocytopenia. We found an association between rare variants in the transcription factor (TF)-encoding gene IKZF5 and thrombocytopenia. We report five causal missense variants in or near IKZF5 zinc fingers (Znfs), of which two occurred de novo and three co-segregated in three pedigrees. A canonical DNA-Znf binding model predicts that three of the variants alter DNA recognition. Expression studies showed that chromatin binding was disrupted in mutant compared to wild-type (WT) IKZF5 and electron microscopy (EM) revealed a reduced quantity of alpha granules in normally sized platelets. Proplatelet formation (PPF) was reduced in megakaryocytes (MKs) from seven cases relative to six controls. Comparison of RNA-seq data from platelets, monocytes, neutrophils and CD4+ T-cells from three cases and 14 healthy controls showed 1,194 differentially expressed genes (DEGs) in platelets but only four DEGs in each of the other blood cell types. In conclusion, IKZF5 is a novel transcriptional regulator of megakaryopoiesis and the eighth transcription factor associated with dominant thrombocytopenia in humans.
The Demographically Diverse Substance Use Disorder Cohorts of Dr. Stanley H. Weiss, which constitute the Epidemiology of the Weiss Cohort Projects, are a series of interconnected projects. They build on cohort studies of various groups, mainly drug users from medication-assisted treatment programs, that Dr. Weiss first developed in the 1980s, plus several newer initiatives, each with its own array of collaborators. Beginning in the 1980s, Dr. Weiss started several long-term studies of persons who inject drugs (PWID) across the United States, ultimately enrolling over 10,000 participants through the early 1990s, who were on average in their 30s at the time. About a quarter were enrolled from sites in New Jersey (NJ). These studies included the first testing of PWID for the human immunodeficiency virus (HIV) and the human T-cell lymphotropic viruses (HTLV-I and HTLV-II). Cumulative past support for these cohort studies (from initiation through about 1999) included about $20 million in intramural resources from the National Cancer Institute (NCI) and the National Institute on Drug Abuse (NIDA), plus multiple grants and in-kind support from the New Jersey Department of Health (NJDOH) totaling about $1 million. The Weiss Cohort Projects include the first large AIDS-era cohorts to include women at high risk for HIV. A high percentage of subjects in these studies are Black or Latino, so this is an ethnically diverse US cohort that also includes a high proportion of women. These subjects are at high risk of parenteral and sexual infection from both drug use and sexual practices. Samples from other studies conducted by Dr. Weiss, in which detailed interviews were conducted, are included as controls (persons we documented as having no history of opioid drug use). Because one of our subject groups includes many persons of Haitian ancestry, we specifically included some Haitians who had never used opioids as controls. Our documentation records such ancestry.
These cohorts demonstrated high rates of HIV and HTLV-II infection among PWID, beginning with a study initiated in 1981 and confirmed in the later cohorts. Among the numerous publications from the first two decades of these studies was the first study showing a very high rate of hepatitis C infection among PWID. The studies' long time horizon proved essential: tests to determine whether a person had ever been infected with hepatitis C virus (HCV), and how much HCV was in each person's blood, became available only many years after the specimens were collected. This allowed blood HCV levels to be compared between subjects who had died of liver disease early in the study and those who survived. A sequence of published papers then culminated in demonstrating, using a nested case-control design, that a high baseline HCV titer predicted early progression to death from end-stage liver failure. Outcomes related to HCV (end-stage liver disease and hepatocellular carcinoma) remain under study. In the original cohort studies, the mean age at enrollment was about 33 years, so those still alive in 2022 are now mainly about 60 to 75 years old. Many participants have already died. The tincture of time has brought subjects to ages at which many more are dying from a wide array of causes, including many chronic diseases (including cancer), infectious agents (especially HIV and HCV), and drug overdose. Renewed collaboration with local drug treatment programs has led to new field-based studies, including examination of some currently evolving problems among drug users. Dr. Weiss joined the NIDA Genetics Consortium (NGC) in 2017 and, through the NIDA project officer, has had access to NGC contract resources (see below). NIH Certificate of Confidentiality CC-DA-16-214 (attached) protects these studies.
Past arrangements related to data on our subjects lead to restrictions on the use of data emanating from our study, such as on potential commercialization and on who may access and use these data. NIDA Genetics Consortium (NGC) resources further support these endeavors and will be used as part of the NGC analyses studying the genetics of substance use. Study participants signed informed consent for the information collected from them to be used with no time limit and for biologic specimens collected from them to be used without restriction in future research. Serum samples were collected from participants, and from many also plasma, white blood cells and/or urine samples. About 100,000 vials were stored. All specimens have been continuously preserved at temperatures cold enough to prevent deterioration, and for many subjects, separated white blood cells were processed and frozen so as to maintain viability. Detailed data on the participants have been accumulated over time, and in general, linkage has been retained in each sub-study in accordance with the consent forms and protocols. For some participants, specimens were collected at multiple times (that is, sequential specimens). Multiple specimens from a single person exist in this database, and de-duplication efforts are ongoing. Investigators who require unique individuals should contact Dr. Weiss, because:
• Multiple phases of enrollment occurred, and as our prospective follow-up continues, Dr. Weiss may identify new instances of multiple enrollment.
• Some persons are related to each other.
• In general, only a single specimen/record from a given person is included in this dbGaP dataset.
Advances in laboratory testing techniques now permit innovative new uses for our linked research biospecimen repository.
An interdisciplinary research program based on these cohorts focuses on relating subjects' diseases, behaviors, medical history, and outcomes to biological and exposure markers. Participants' use of various substances was ascertained at study enrollment, in many cases serially over time. Quantitative frequency-of-use data, also sometimes sequential over time, were ascertained. Active ascertainment of outcomes is under way, including matching to mortality and cancer databases. Investigators interested in collaborating on specific outcomes (outcome data are not part of this dbGaP dataset) or in using our stored specimens are encouraged to contact the principal investigator, Dr. Weiss. The genomic data were processed in conjunction with NIDA, in accordance with longstanding data cleaning steps used by NIDA in the NIDA Genetics Consortium (NGC), a group to which we will contribute these data for collaborative analyses. Because these steps could introduce certain types of bias, we summarize them here. Under contract from NIDA, cryopreserved sera or plasma (-80 °C) or cells (in liquid nitrogen) were used, most of which had been stored for 30 to 40 years in our biorepository. For serum or plasma, in which largely only cell-free DNA fragments were available, DNA was extracted and restored prior to amplification. Industry-standard DNA amplification was performed on all samples prior to genotyping, following established NIDA Genetics Consortium protocols. Our genotype data were generated and processed on the Illumina Infinium OmniExpress_v_1.3 array. This array, designed many years ago, has 714,238 SNPs. The array includes 628 SNPs that do not correspond to any chromosomal position; these were removed.
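The removal of SNPs without a chromosomal position can be sketched as a filter over a PLINK-style `.bim` variant table, in which PLINK codes unplaced variants with chromosome `0`. This is a minimal Python sketch assuming that encoding; the function name `drop_unplaced_variants` and the toy records are hypothetical, and it is not the NIDA pipeline itself.

```python
from typing import Iterable, List


def drop_unplaced_variants(bim_lines: Iterable[str]) -> List[str]:
    """Keep only .bim records whose chromosome code is not '0' (unplaced).

    Assumed .bim layout: six whitespace-separated fields, namely chromosome,
    variant ID, genetic distance (cM), bp position, allele 1, allele 2.
    """
    kept: List[str] = []
    for line in bim_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[0] != "0":  # chromosome code 0 = no chromosomal position
            kept.append(line)
    return kept


# Toy example with hypothetical variant IDs.
toy_bim = ["1 rsA 0 1000 A G", "0 rsB 0 0 C T", "X rsC 0 5000 A C"]
print(drop_unplaced_variants(toy_bim))
```

Applied to the full array, dropping the 628 unplaced SNPs from 714,238 would leave 713,610 SNPs for the QC steps that follow.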
Genotype data were submitted by NIDA's contracted genotyping laboratory in six batches over time to NIDA's contracted dbGaP data management group, which conducted quality control (QC) analyses. QC analysis included an assessment of batch effects for five of the six batches. (One batch, with only 12 samples, was too small for QC analysis of batch effects.) Standard NIDA Genetics Consortium cleaning was performed. Samples with a call rate < 0.85 were removed, and only one sample per person was retained: when more than one specimen was genotyped from a subject, only the sample with the highest call rate was kept (provided, of course, that its call rate was ≥ 0.85). We have retained some people we know are related, including some found to be related through genotyping; the pedigree file describes those relationships. In summary, key cleaning steps include:
1. Using PLINK to check sex discrepancies.
2. Using PREST-PLUS and KING (Kinship-based Inference for GWAS) to check relatedness.
3. Using PEDCHECK and PLINK to check and zero out Mendelian errors.
4. Using PLINK to perform sample QC and SNP QC, along with KING to perform chromosome X and chromosome Y QC.
5. SNP QC, batch effects: the five batches large enough for testing (the sixth, with few samples, was not tested) were compared with each other in all ten possible pairs, examining SNP allele-frequency discrepancies by population (from GRAF) with Fisher's exact allelic test; SNPs with p < 5e-8 were removed.
6. SNP QC, discordant SNPs in QC duplicates: 25 QC duplicate samples with call rate > 0.95 were compared, and SNPs with 3 or more discordant calls were removed.
7. The 1,056 monomorphic SNPs were retained so they can be included in analyses that combine our dbGaP data with those from other cohorts, in which those SNPs may not be monomorphic.
The final cleaned dataset submitted has 8,898 samples and 606,793 SNPs.
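The sample-selection rule and the two SNP-level filters (pairwise batch comparison and duplicate discordance) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the NIDA/dbGaP pipeline: all function names and input layouts are hypothetical, genotypes are assumed to be encoded as allele dosages (0/1/2), the batch test ignores the per-population stratification by GRAF described above, and `fisher_exact_two_sided` is a small standard-library reimplementation (in practice one might call `scipy.stats.fisher_exact` instead).

```python
import math
from itertools import combinations
from typing import Dict, List, Set, Tuple

CALL_RATE_MIN = 0.85   # sample call-rate threshold stated in the text
BATCH_P_MAX = 5e-8     # Fisher's exact p-value removal threshold stated in the text
MIN_DISCORDANT = 3     # "3 or more discordant calls" in QC duplicates


def select_samples(records: List[Tuple[str, str, float]]) -> Dict[str, str]:
    """records are (sample_id, subject_id, call_rate); returns {subject_id: sample_id},
    keeping for each subject the passing sample with the highest call rate."""
    best: Dict[str, Tuple[float, str]] = {}
    for sample_id, subject_id, call_rate in records:
        if call_rate < CALL_RATE_MIN:
            continue  # fails the sample call-rate filter
        if subject_id not in best or call_rate > best[subject_id][0]:
            best[subject_id] = (call_rate, sample_id)
    return {subj: sample for subj, (_, sample) in best.items()}


def _log_comb(n: int, k: int) -> float:
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c

    def log_p(x: int) -> float:  # hypergeometric log-probability of top-left cell = x
        return _log_comb(row1, x) + _log_comb(row2, col1 - x) - _log_comb(row1 + row2, col1)

    log_obs = log_p(a)
    p = 0.0
    # Sum probabilities of all tables with the same margins that are no more likely
    # than the observed one; the 1e-7 slack absorbs floating-point error on ties.
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        lp = log_p(x)
        if lp <= log_obs + 1e-7:
            p += math.exp(lp)
    return min(p, 1.0)


def flag_batch_effect_snps(counts: Dict[str, Dict[str, Tuple[int, int]]]) -> Set[str]:
    """counts[batch][snp] = (allele_1_count, allele_2_count). Flags SNPs whose allele
    counts differ significantly in any pair of batches (5 batches -> 10 pairs)."""
    flagged: Set[str] = set()
    for b1, b2 in combinations(sorted(counts), 2):
        for snp in counts[b1].keys() & counts[b2].keys():
            a, b = counts[b1][snp]
            c, d = counts[b2][snp]
            if fisher_exact_two_sided(a, b, c, d) < BATCH_P_MAX:
                flagged.add(snp)
    return flagged


def flag_discordant_snps(pairs: List[Tuple[Dict[str, int], Dict[str, int]]]) -> Set[str]:
    """Each pair holds two {snp: dosage} dicts for the same DNA (missing calls absent).
    Flags SNPs with MIN_DISCORDANT or more discordant calls across all pairs."""
    discordant: Dict[str, int] = {}
    for g1, g2 in pairs:
        for snp in g1.keys() & g2.keys():
            if g1[snp] != g2[snp]:
                discordant[snp] = discordant.get(snp, 0) + 1
    return {snp for snp, n in discordant.items() if n >= MIN_DISCORDANT}
```

Using log-gamma terms rather than exact binomial coefficients keeps the test fast for allele counts in the thousands, at the cost of floating-point rather than exact arithmetic.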
Genomic DNA was obtained from the M116 peripheral blood sample and used for targeted deep sequencing (TDS). Barcoded libraries were prepared according to the manufacturer's instructions using a probe-based panel (KAPA HyperCap, Roche®) targeting frequently mutated regions of 50 myeloid-related genes. Samples were run on a MiSeq (Illumina®) sequencer for paired-end 2x75 bp reads, with a mean coverage of 1,000X.
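Mean coverage follows from simple arithmetic: the number of on-target sequenced bases divided by the size of the targeted region. Because the panel's target size is not stated here, the sketch below takes it as a parameter; the function names and the on-target fraction are illustrative assumptions, and the calculation ignores PCR duplicates, overlapping mates, and uneven capture.

```python
import math


def mean_coverage(read_pairs: int, read_len: int, target_bp: int,
                  on_target_fraction: float = 1.0) -> float:
    """Mean depth = on-target sequenced bases / target size (2 reads per pair)."""
    bases = read_pairs * 2 * read_len * on_target_fraction
    return bases / target_bp


def read_pairs_needed(coverage: float, read_len: int, target_bp: int,
                      on_target_fraction: float = 1.0) -> int:
    """Inverse of mean_coverage: read pairs needed to reach a given mean depth."""
    return math.ceil(coverage * target_bp / (2 * read_len * on_target_fraction))


# Hypothetical 250 kb target with 60% of bases on target, at 2x75 bp and 1,000X.
print(read_pairs_needed(1000, 75, 250_000, 0.6))
```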
Isolated populations have unique population genetics characteristics that can help boost power in genetic association studies for complex traits. Leveraging these advantageous characteristics requires an in-depth understanding of parameters that have shaped sequence variation in isolates. This study performs a comprehensive investigation of these parameters using low-depth whole genome sequencing (WGS) across multiple isolates.
This dataset consists of SOLiD small RNA-seq data from 250 colorectal samples: 100 tumor tissue samples, 100 normal tissue samples (adjacent to tumor sites), and 50 matched control samples from healthy individuals. CSFASTA and QUAL files were converted to single FASTQ files prior to upload.
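The CSFASTA/QUAL-to-FASTQ conversion can be sketched as pairing records by read ID and encoding each color-call quality as a Phred+33 character. This is a minimal sketch, not the converter used for this upload: it assumes one color string per record, `#`-prefixed header comments, and SOLiD's `-1` placeholder for missing qualities (clamped to 0 here). Colorspace FASTQ conventions differ between tools in how they treat the leading primer base, so the hypothetical `keep_primer` flag makes that choice explicit.

```python
from typing import Iterator, List, Optional, TextIO, Tuple


def read_records(handle: TextIO, sep: str) -> Iterator[Tuple[str, str]]:
    """Yield (read_id, body) from a FASTA-style file, skipping '#' comment lines.
    Body lines are joined with `sep` ("" for colors, " " for quality values)."""
    read_id: Optional[str] = None
    body: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            if read_id is not None:
                yield read_id, sep.join(body)
            read_id, body = line[1:], []
        else:
            body.append(line)
    if read_id is not None:
        yield read_id, sep.join(body)


def csfasta_qual_to_fastq(csfasta: TextIO, qual: TextIO, out: TextIO,
                          keep_primer: bool = True) -> int:
    """Merge paired .csfasta/.qual records into colorspace FASTQ; returns records written."""
    written = 0
    # zip() stops at the shorter input; a real converter should also check both end together.
    for (cs_id, colors), (q_id, qual_text) in zip(read_records(csfasta, ""),
                                                  read_records(qual, " ")):
        if cs_id != q_id:
            raise ValueError(f"read IDs out of sync: {cs_id!r} vs {q_id!r}")
        # SOLiD writes -1 for missing calls; clamp to 0 before Phred+33 encoding.
        quals = "".join(chr(max(0, int(v)) + 33) for v in qual_text.split())
        seq = colors if keep_primer else colors[1:]  # colors[0] is the primer base
        out.write(f"@{cs_id}\n{seq}\n+\n{quals}\n")
        written += 1
    return written
```

With `keep_primer=False`, the sequence and quality strings have equal length (one quality value per color), which strict FASTQ parsers require.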
43 low-coverage genomes derived from fresh-frozen glioblastoma tumor samples. These genomes were produced for validation purposes and match the corresponding RRBS and RNA-seq profiles, in that DNA and RNA were extracted from the same tumor samples.
37 transcriptomes derived from fresh-frozen glioblastoma tumor samples. These transcriptomes were produced for validation purposes and match the corresponding RRBS and WGS profiles, in that DNA and RNA were extracted from the same tumor samples.
Whole-genome sequencing data from parental blood samples. Single-cell full-length RNA-seq and PBAT-seq data from human embryos cultured in vitro from day 6 (D6) to day 14 (D14).
RNA-seq and ATAC-seq of iPSC-derived neurons under baseline and KCl stimulation conditions from 10 distinct donors, including 5 healthy controls and 5 individuals with schizophrenia. scATAC-seq of human post-mortem prefrontal cortex from 4 adults, including 2 neurotypical individuals and 2 individuals with schizophrenia.
180502: RNA-sequencing data of cocultured matched CRC patient (P4)-derived normal fibroblasts (NFs), cancer-associated fibroblasts (CAFs), and tumor spheroids.
200503_coculture: RNA-sequencing data of cocultured CRC patient-derived normal fibroblasts (NFs) or cancer-associated fibroblasts (CAFs) (P16, P19, P22, P32, P41, P42) and tumor spheroids (HT29).
200503_il1b: RNA-sequencing data of IL-1β-stimulated fibroblasts (NFs and CAFs).
Cole: scRNA-sequencing of matched CRC tumour samples and normal tissue counterparts derived from 3 patients.
220501: RNA-sequencing of FACS-sorted IL1R1-high and IL1R1-low CT5.3 CAFs.