The Demographically Diverse Substance Use Disorder Cohorts of Dr. Stanley H. Weiss, which constitute the Epidemiology of the Weiss Cohort Projects, consist of a series of interconnected projects. They build on a set of cohort studies of various groups, mainly drug users from medication-assisted treatment programs, that Dr. Stanley H. Weiss first developed in the 1980s, plus several newer initiatives, each with an array of collaborators. Beginning in the 1980s, Dr. Weiss started several long-term studies of persons who inject drugs (PWID) across the United States, ultimately enrolling over 10,000 participants through the early 1990s, whose average age at the time was in the 30s. About a quarter were enrolled from sites in New Jersey (NJ). These studies included the first testing of PWID for the human immunodeficiency virus (HIV) and the human T-cell lymphotropic viruses (HTLV-I and HTLV-II). Cumulative past support for these cohort studies (from initiation through approximately 1999) included approximately $20 million in intramural resources from the National Cancer Institute (NCI) and the National Institute on Drug Abuse (NIDA), plus multiple grants and in-kind support from the New Jersey Department of Health (NJDOH) totaling approximately $1 million. The Weiss Cohort Projects include the first large AIDS-era cohorts to include women at high risk for HIV. A high percentage of subjects in these studies are Black or Latino. Thus, this is an ethnically diverse US cohort with a high proportion of women. These subjects are at high risk of parenteral and sexual infection from both drug use and sexual practices. Samples from other studies conducted by Dr. Weiss, in which detailed interviews were conducted, are included as controls (persons we documented as having no history of opioid drug use). Because one of our groups of subjects includes many persons of Haitian ancestry, we specifically included as controls some Haitians who had never used opioids. Our documentation records such ancestry.
These cohorts demonstrated high rates of HIV and HTLV-II infection in PWID, including in one study initiated in 1981, with confirmation in the later cohorts. In the first two decades of these studies, the numerous publications included the first study showing a very high rate of hepatitis C infection among PWID. The value of the studies' long time horizon is illustrated by hepatitis C virus (HCV): tests of whether a person had ever been infected with HCV, and of how much HCV was in each person's blood, first became possible many years after the specimens were collected. This allowed blood HCV levels to be compared between subjects who had died of liver disease early in the study and those who survived. A sequence of published papers then culminated in demonstrating, using a nested case-control design, that a high baseline HCV titer predicted early progression to death from end-stage liver failure. Outcomes related to HCV (end-stage liver disease and hepatocellular carcinoma) remain under study. In the original cohort studies, the mean age at enrollment was approximately 33 years, so those still alive in 2022 are mainly now approximately 60 to 75 years old. Many participants have already died. With the passage of time, subjects have reached ages at which many more are dying from a wide array of causes, including many chronic diseases (including cancer), infectious agents (especially HIV and HCV), and drug overdose. Renewed collaboration with local drug treatment programs has led to new field-based studies, including examination of some currently evolving problems among drug users. Dr. Weiss joined the NIDA Genetics Consortium (NGC) in 2017 and, through the NIDA project officer, has had access to NGC contract resources (see below). These studies are protected by NIH Certificate of Confidentiality CC-DA-16-214 (attached).
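The nested case-control design mentioned above can be illustrated with incidence-density (risk-set) sampling: for each case, controls are drawn from cohort members still alive and under follow-up at the case's event time, and stored baseline specimens are then assayed only for cases and their matched controls. The sketch below is a minimal Python illustration under assumed inputs; the `Subject` record, `risk_set_sample`, the follow-up times, and the choice of 2 controls per case are all hypothetical and do not describe the matching actually used in the published HCV analyses.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Subject:
    # Hypothetical cohort record: follow-up time in years from enrollment, and
    # whether follow-up ended with the outcome of interest (the "case" event).
    subject_id: str
    followup_years: float
    is_case: bool


def risk_set_sample(cohort: List[Subject], controls_per_case: int,
                    seed: Optional[int] = 0) -> Dict[str, List[str]]:
    """Incidence-density sampling: for each case, draw controls among subjects
    still under follow-up (at risk) at the case's event time."""
    rng = random.Random(seed)
    matched: Dict[str, List[str]] = {}
    for case in (s for s in cohort if s.is_case):
        # Risk set: everyone except the case whose follow-up extends past the
        # case's event time (a later case is still an eligible control here).
        risk_set = [s.subject_id for s in cohort
                    if s.subject_id != case.subject_id
                    and s.followup_years > case.followup_years]
        k = min(controls_per_case, len(risk_set))
        matched[case.subject_id] = rng.sample(risk_set, k)
    return matched


cohort = [
    Subject("A", 2.0, True),
    Subject("B", 5.0, False),
    Subject("C", 8.0, False),
    Subject("D", 6.0, True),
    Subject("E", 1.0, False),  # left follow-up before any case occurred
]
print(risk_set_sample(cohort, controls_per_case=2))
```

Sampling controls from the risk set at each event time, rather than from survivors at the end of follow-up, is what lets the matched odds ratio estimate the rate ratio; as the example shows, a subject (here "D") can serve as a control for an earlier case and later become a case.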
Past arrangements related to data on our subjects lead to restrictions on the use of data emanating from our study, such as on potential commercialization and on who may access and use these data. NIDA Genetics Consortium (NGC) resources further support these endeavors, and the data will be used as part of the NGC analyses studying the genetics of substance use. Study participants signed informed consent for the information collected from them to be used with no time limit and for their biological specimens to be used without restriction in future research. Serum samples were collected from participants and, from many, also plasma, white blood cells and/or urine samples. About 100,000 vials were stored. All specimens have been continuously preserved at temperatures cold enough to prevent deterioration, and for many subjects, separated white blood cells were processed and frozen in a way that maintains viability. Detailed data from the participants have been accumulated over time and, in general, linkage has been retained in each sub-study in accordance with the consent forms and protocols. For some participants, specimens were collected at multiple times (that is, sequential specimens). Multiple specimens from a single person exist in this database, and efforts at de-duplication remain ongoing. Dr. Weiss should be contacted if an investigator requires unique individuals, because:
• Multiple phases of enrollment occurred and, as our prospective follow-up continues, Dr. Weiss may identify new instances of multiple enrollment.
• Some persons are related to each other.
• In general, only a single specimen/record from a given person is included in this dataset for dbGaP.
Advances in laboratory testing techniques now permit innovative new uses for our linked research biospecimen repository.
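Where several specimens from the same person were genotyped, the genotype QC notes for this dataset state that only the one with the higher call rate was kept, and only if that call rate was at least 0.85. A minimal Python sketch of that selection rule, assuming a hypothetical list of `(person_id, sample_id, call_rate)` tuples rather than any actual NIDA or dbGaP file layout; `select_one_sample_per_person` is an illustrative name.

```python
from typing import Dict, Iterable, List, Tuple

MIN_CALL_RATE = 0.85  # sample call-rate threshold stated in the QC notes


def select_one_sample_per_person(
        samples: Iterable[Tuple[str, str, float]]) -> Dict[str, str]:
    """Return {person_id: sample_id}, keeping the highest-call-rate sample
    per person and dropping samples below MIN_CALL_RATE (ties keep the first)."""
    best: Dict[str, Tuple[str, float]] = {}
    for person_id, sample_id, call_rate in samples:
        if call_rate < MIN_CALL_RATE:
            continue  # fails the sample-level call-rate filter
        current = best.get(person_id)
        if current is None or call_rate > current[1]:
            best[person_id] = (sample_id, call_rate)
    return {pid: sid for pid, (sid, _) in best.items()}


genotyped: List[Tuple[str, str, float]] = [
    ("P1", "S1a", 0.97), ("P1", "S1b", 0.99),  # two specimens from one person
    ("P2", "S2a", 0.80),                        # below threshold: dropped
    ("P3", "S3a", 0.86),
]
print(select_one_sample_per_person(genotyped))  # {'P1': 'S1b', 'P3': 'S3a'}
```

The rule only works for people already known to be the same individual; a person enrolled twice under different identifiers would pass through as two records, which is why investigators needing unique individuals are asked to contact Dr. Weiss.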
The ongoing focus of an interdisciplinary research program based on these cohorts is to relate subjects' diseases, behaviors, medical history, and outcomes to biological and exposure markers. Participants' use of various substances was ascertained at study enrollment and, for many, serially over time. Quantitative frequency-of-use data, also sometimes collected sequentially over time, were ascertained. Active ascertainment of outcomes is being conducted, including matching to mortality and cancer databases. Investigators interested in collaborations on specific outcomes (which are not part of this dbGaP dataset) or in the use of our stored specimens are encouraged to contact the principal investigator, Dr. Weiss.
The genomic data were processed in conjunction with NIDA, in accordance with longstanding data-cleaning steps used by NIDA in the NIDA Genetics Consortium (NGC), a group to which we will contribute these data for collaborative analyses. Because these steps could introduce certain types of bias, we summarize them here. Under contract from NIDA, cryopreserved sera or plasma (-80 °C) or cells (in liquid nitrogen) were used, most having been stored for 30 to 40 years in our biorepository. For serum or plasma, in which largely only cell-free DNA fragments were available, DNA was extracted and restored prior to amplification. Industry-standard DNA amplification techniques were applied to all samples prior to genotyping, in accord with established protocols of the NIDA Genetics Consortium. Our genotype data were run and processed on the Illumina Infinium OmniExpress_v_1.3 array. This array, designed many years ago, has 714,238 SNPs. The 628 SNPs on the array that do not correspond to any chromosome position were removed.
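Removing array SNPs that lack a chromosome position amounts to a filter on the marker map. A minimal Python sketch, assuming the markers are held in a PLINK-style `.bim` layout (chromosome, SNP ID, genetic distance, base-pair position, allele 1, allele 2) in which chromosome code `0` marks an unplaced variant; treating base-pair position `0` as unmapped, the helper name `split_unplaced`, and the example rows are assumptions, not the NIDA pipeline itself.

```python
import io
from typing import List, TextIO, Tuple


def split_unplaced(bim: TextIO) -> Tuple[List[str], List[str]]:
    """Split BIM-style lines into (kept, removed) SNP IDs, removing markers whose
    chromosome code is '0' or whose base-pair position is 0 (no genomic location)."""
    kept: List[str] = []
    removed: List[str] = []
    for line in bim:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        chrom, snp_id, bp = fields[0], fields[1], int(fields[3])
        if chrom == "0" or bp == 0:
            removed.append(snp_id)
        else:
            kept.append(snp_id)
    return kept, removed


example = io.StringIO(
    "1\trs1\t0\t10500\tA\tG\n"
    "0\trs2\t0\t0\tC\tT\n"       # unplaced marker
    "X\trs3\t0\t204000\tG\tA\n"
)
kept, removed = split_unplaced(example)
print(kept, removed)  # ['rs1', 'rs3'] ['rs2']
```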
Genotype data were submitted by NIDA's contracted genotyping laboratory, in six batches over time, to NIDA's contracted dbGaP data management group, which conducted quality control (QC) analyses. QC analysis included an assessment of batch effects for five of the six batches. (One of the batches, with only 12 samples, was too small for QC analysis of batch effects.) Standard NIDA Genetics Consortium cleaning was performed. Samples with a call rate < 0.85 were removed. Only one sample per person was retained: when more than one specimen was genotyped from a subject, only the sample with the higher call rate was retained (provided that its call rate was ≥ 0.85). We have retained some people we know are related, including some found through genotyping to be related; the pedigree file describes those relationships. In summary, key cleaning steps include:
1. Using PLINK to check sex discrepancy.
2. Using PREST-PLUS and KING (Kinship-based Inference for GWAS) to check relatedness.
3. Using PEDCHECK and PLINK to check for and zero out Mendelian errors.
4. Using PLINK to perform sample QC and SNP QC, along with KING to perform chromosome X and chromosome Y QC.
5. SNP QC, batch effect: the five batches (excluding the one small batch) were compared to each other in all ten possible pairs, one batch vs. another, examining SNP allele frequency discrepancies by population (from GRAF) with the Fisher exact allelic test; SNPs with p < 5e-8 were removed.
6. SNP QC, discordant SNPs in QC duplicates: 25 QC duplicate samples with call rate > 0.95 were compared, and SNPs with 3 or more discordances were removed.
7. The 1,056 monomorphic SNPs were retained so they can be included in analyses in which our dbGaP data are combined with those from other cohorts (in which those SNPs may not be monomorphic).
The final cleaned dataset submitted has 8,898 samples and 606,793 SNPs.
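Steps 5 and 6 are per-SNP filters. The sketch below shows one way to express them in plain Python, assuming allele counts per batch have already been tabulated within one GRAF-inferred population and that duplicate-pair genotypes are strings in a consistent allele order, with `None` for a missing call; the data structures and function names are hypothetical. The two-sided Fisher exact test used here sums the hypergeometric probabilities of all 2×2 tables with the observed margins that are no more probable than the observed table.

```python
from itertools import combinations
from math import exp, lgamma
from typing import Dict, List, Optional, Tuple

P_THRESHOLD = 5e-8         # step 5: remove SNP if any batch pair has p < 5e-8
MIN_DISCORDANT_PAIRS = 3   # step 6: remove SNP with 3+ discordant duplicate pairs


def _log_comb(n: int, k: int) -> float:
    """Log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test on the 2x2 table [[a, b], [c, d]]
    (rows = two batches, columns = allele 1 and allele 2 counts)."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def log_pmf(x: int) -> float:
        # Hypergeometric log-probability that the top-left cell equals x,
        # holding all row and column totals fixed.
        return (_log_comb(col1, x) + _log_comb(n - col1, row1 - x)
                - _log_comb(n, row1))

    observed = log_pmf(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Sum every table no more probable than the observed one; the small
    # tolerance keeps floating-point ties from being dropped.
    p = sum(exp(lp) for lp in (log_pmf(x) for x in range(lo, hi + 1))
            if lp <= observed + 1e-7)
    return min(p, 1.0)


def batch_effect_snps(counts: Dict[str, Dict[str, Tuple[int, int]]]) -> List[str]:
    """counts[snp][batch] = (allele1_count, allele2_count) within one population.
    A SNP is flagged if any pair of batches differs at p < P_THRESHOLD."""
    flagged = []
    for snp, by_batch in counts.items():
        # combinations() yields every unordered pair: 10 pairs for 5 batches.
        for b1, b2 in combinations(sorted(by_batch), 2):
            (a, b), (c, d) = by_batch[b1], by_batch[b2]
            if fisher_exact_two_sided(a, b, c, d) < P_THRESHOLD:
                flagged.append(snp)
                break
    return flagged


Call = Optional[str]  # genotype string such as "AG"; None means no call


def discordant_snps(dup_pairs: List[Tuple[Dict[str, Call], Dict[str, Call]]]) -> List[str]:
    """Count, per SNP, the duplicate pairs whose two calls disagree (missing
    calls are ignored) and return SNPs at or above MIN_DISCORDANT_PAIRS."""
    counts: Dict[str, int] = {}
    for g1, g2 in dup_pairs:
        for snp, call in g1.items():
            other = g2.get(snp)
            if call is not None and other is not None and call != other:
                counts[snp] = counts.get(snp, 0) + 1
    return sorted(s for s, k in counts.items() if k >= MIN_DISCORDANT_PAIRS)


print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # 0.4857
```

Here a SNP is dropped as soon as any one of the pairwise batch comparisons crosses the genome-wide threshold; whether the NIDA pipeline stratified and combined the comparisons in exactly this way is an assumption.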
This postmortem study, conducted by the DIRP NIMH Human Brain Collection Core (HBCC), examines molecular, genetic and epigenetic signatures in the brains of hundreds of subjects with or without mental disorders. The brain tissues are obtained under protocols approved by the CNS IRB (NCT00001260), with the permission of the next-of-kin (NOK), through the Offices of the Chief Medical Examiners (MEOs) in the District of Columbia, Northern Virginia and Central Virginia. Additional samples were obtained from the University of Maryland Brain and Tissue Bank (contracts NO1-HD-4-3368 and NO1-HD-4-3383; http://www.medschool.umaryland.edu/btbank/) and the Stanley Medical Research Institute (http://www.stanleyresearch.org/brain-research/). Clinical characterization, neuropathological screening, toxicological analyses, and dissections of various brain regions were performed as previously described (Lipska et al. 2006; PMID: 16997002). All patients met DSM-IV criteria for a lifetime Axis I diagnosis of psychiatric disorders, including schizophrenia or schizoaffective disorder, bipolar disorder and major depression. Controls had no history of psychiatric diagnoses or addictions.
SNP array: Array-based genotyping was performed on most samples included in this collection. The number of SNPs assayed via Illumina chips varied between 650,000 and 5 million. Cerebellar tissue was generally used for genotyping studies.

# | Diagnosis | SNP Array
1 | Anxiety Disorder | 1
2 | Autism Spectrum Disorder | 13
3 | Bipolar Disorder | 114
4 | Control | 387
5 | Eating Disorder (ED) | 2
6 | Major Depressive Disorder (MDD) | 186
7 | Obsessive Compulsive Disorder (OCD) | 5
8 | Post-Traumatic Stress Disorder (PTSD) | 0
9 | Schizophrenia | 220
10 | Other | 7
11 | Tic Disorder | 3
12 | Undetermined | 1
13 | Williams Syndrome | 2
Table: Numbers of samples in each diagnostic category.

DNA extraction: 45-80 mg of cerebellar tissue was pulverized for DNA extraction. The QIAamp DNA Mini Kit (Qiagen) was used for tissue DNA extraction.
The tissue was initially lysed using the TissueLyser (Qiagen), and extractions were completed according to the manufacturer's protocol. The DNA was eluted in 500 µL of elution buffer. Concentrations were measured using Thermo Scientific's NanoDrop 1000/NanoDrop One. The mean yield was 128.85 µg (± 79.48), the mean 260/280 ratio was 1.87 (± 0.105), and the mean 260/230 ratio was 2.48 (± 1.75).
Genotyping methods: Three types of Illumina BeadArray chips were used: HumanHap650Y, Human1M-Duo, and HumanOmni5M-Quad (Illumina, San Diego, California). Genotyping was done according to the manufacturer's protocol (Illumina Proprietary, Catalog # WG-901-5003, Part # 15025910 Rev. A, June 2011). Approximately 400 ng of DNA was used, and each DNA sample was QC-tested for 260/280 ratio by NanoDrop and for DNA band intactness on a 2% agarose gel. Briefly, the samples were whole-genome amplified, fragmented, precipitated and resuspended in the appropriate hybridization buffer. Denatured samples were hybridized on prepared BeadChips. After hybridization, the BeadChip oligonucleotides were extended by a single fluorescently labeled base, which was detected by fluorescence imaging with an Illumina BeadArray Reader (iScan). Normalized bead intensity data for each sample were loaded into Illumina GenomeStudio (v.2.0.3) with cluster position files provided by Illumina, and fluorescence intensities were converted into SNP genotypes.
Microarray: We generated RNA expression data using array technology for psychiatric subjects, with non-psychiatric subjects as controls. We used tissues from three regions, i.e., hippocampus, dorsolateral prefrontal cortex (DLPFC), and dura mater, for a large cohort of individuals (552 subjects for hippocampus, 800 for DLPFC and 146 for dura). Total RNA was extracted from ~100 mg of tissue using the RNeasy kit (Qiagen) according to the manufacturer's protocol.
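NanoDrop quantification, used in this collection for both the cerebellar DNA and the RNA samples, rests on the standard conversion of absorbance at 260 nm to concentration: an A260 of 1.0 over a 1 cm path corresponds to about 50 µg/mL of double-stranded DNA or about 40 µg/mL of RNA, and total yield is that concentration times the eluate volume. A minimal Python sketch; the absorbance readings and the helper name `nucleic_acid_yield` are hypothetical, and path-length normalization by the instrument is assumed to have already been applied.

```python
from typing import Tuple

# Standard A260 conversion factors (ug/mL per 1.0 absorbance unit, 1 cm path).
CONVERSION_UG_PER_ML = {"dsDNA": 50.0, "RNA": 40.0}


def nucleic_acid_yield(a260: float, a280: float, elution_ul: float,
                       kind: str = "dsDNA",
                       dilution: float = 1.0) -> Tuple[float, float, float]:
    """Return (concentration in ng/uL, total yield in ug, 260/280 ratio)."""
    conc_ug_per_ml = a260 * CONVERSION_UG_PER_ML[kind] * dilution
    conc_ng_per_ul = conc_ug_per_ml              # 1 ug/mL equals 1 ng/uL
    total_ug = conc_ug_per_ml * elution_ul / 1000.0  # uL -> mL
    return conc_ng_per_ul, total_ug, a260 / a280


# Hypothetical reading: A260 = 1.0, A280 = 0.535, eluted in 500 uL.
conc, total, ratio = nucleic_acid_yield(1.0, 0.535, 500.0)
print(f"{conc:.1f} ng/uL, {total:.2f} ug, 260/280 = {ratio:.2f}")
# 50.0 ng/uL, 25.00 ug, 260/280 = 1.87
```

Run in reverse, and assuming the whole 500 µL eluate was quantified, the reported mean DNA yield of 128.85 µg corresponds to a mean concentration of about 258 ng/µL.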
RNA quality and quantity were examined using the Bioanalyzer (Agilent, Inc.) and NanoDrop (Thermo Scientific, Inc.), respectively. Samples with RIN

# | Diagnosis | DLPFC | Hippo | Dura
1 | Anxiety Disorder | 1 | 0 | 0
2 | Autism Spectrum Disorder | 14 | 6 | 0
3 | Bipolar Disorder | 90 | 49 | 0
4 | Control | 336 | 270 | 75
5 | Eating Disorder (ED) | 2 | 1 | 0
6 | Major Depressive Disorder (MDD) | 144 | 87 | 0
7 | Obsessive Compulsive Disorder (OCD) | 5 | 3 | 0
8 | Post-Traumatic Stress Disorder (PTSD) | 6 | 0 | 0
9 | Schizophrenia | 192 | 125 | 71
10 | Other | 5 | 6 | 0
11 | Tic Disorder | 3 | 3 | 0
12 | Undetermined | 1 | 1 | 0
13 | Williams Syndrome | 2 | 1 | 0
Table: Numbers of samples in each diagnostic category.

RNA-Seq of dorsolateral prefrontal cortex: All brains were collected and the dorsolateral prefrontal cortex (DLPFC) samples dissected at the HBCC, DIRP, NIMH. DLPFC specimens were dissected from the right or left hemisphere of frozen coronal slabs. The study was funded by the DIRP, NIMH under contract (#HHSN 271201400099C) with Icahn School of Medicine at Mount Sinai, 1106402 One Gustave L. Levy Place, Box 3500, New York NY 10029-6574. RNA extraction, library preparation and sequencing were performed under contract at Icahn School of Medicine. The CommonMind Consortium (CMC) provided project management support.
RNA isolation: Total RNA from 468 HBCC samples was isolated from approximately 100 mg of homogenized tissue per sample by TRIzol/chloroform extraction and purification with the Qiagen RNeasy kit (Cat# 74106) according to the manufacturer's protocol. Samples were processed in randomized batches of 12. The order of extraction for schizophrenia, bipolar, MDD and control samples was assigned randomly with respect to diagnosis and all other sample characteristics. The mean total RNA yield was 24.2 µg (± 9.0). The RNA Integrity Number (RIN) was determined with the 4200 Agilent TapeStation System. Samples with RIN
DLPFC RNA-Seq quantified expression data are provided for 364 samples.
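Assigning samples to processing batches at random with respect to diagnosis, as described for the DLPFC RNA extraction (batches of 12) and for library preparation (batches of 94 for RNA-seq and 95 for WGS), keeps diagnosis from being confounded with batch. A minimal Python sketch of seeded shuffle-and-chunk randomization; the sample IDs, the seed, and the helper `randomized_batches` are hypothetical, and the source does not describe the exact procedure the contracting laboratories used.

```python
import random
from typing import List, Sequence


def randomized_batches(sample_ids: Sequence[str], batch_size: int,
                       seed: int) -> List[List[str]]:
    """Shuffle samples with a fixed seed (so the assignment is reproducible),
    then split them into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    order = list(sample_ids)
    random.Random(seed).shuffle(order)
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


samples = [f"HBCC_{i:03d}" for i in range(30)]  # hypothetical IDs
batches = randomized_batches(samples, batch_size=12, seed=2024)
print([len(b) for b in batches])  # [12, 12, 6]
```

With the 468 DLPFC samples and a batch size of 12, this yields exactly 39 batches. A plain shuffle balances diagnoses across batches only on average; stratifying by diagnosis before chunking is a common alternative when group sizes are unequal.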
Data were generated, QC'd, processed and quantified as follows:
RNA library preparation and sequencing: All samples submitted to the New York Genome Center for RNA-seq were prepared for sequencing in randomized batches of 94. The sequencing libraries were prepared using the KAPA Stranded RNA-Seq Kit with RiboErase (KAPA Biosystems). rRNA was depleted from 1 µg of RNA using the KAPA RiboErase protocol, which is integrated into the KAPA Stranded RNA-Seq Kit. The insert size and DNA concentration of the sequencing library were determined on the Fragment Analyzer Automated CE System (Advanced Analytical) and with Quant-iT PicoGreen (ThermoFisher), respectively.

Schizophrenia | Bipolar | Control
89 | 65 | 210
Table: Numbers of samples in each diagnostic category.

RNA-Seq of subgenual anterior cingulate cortex (sgACC): All 200 postmortem brain samples (61 controls; 39 bipolar disorder; 46 schizophrenia; 54 major depressive disorder) were collected by the HBCC, DIRP, NIMH.
RNA extraction and quality assessment: Tissue from sgACC was pulverized and stored at -80 °C. Total RNA was extracted from 50-80 mg of tissue using the QIAGEN RNeasy Lipid Tissue Mini Kit (QIAGEN, Cat. # 74804) with DNase treatment (QIAGEN, Cat. # 79254). The RNA Integrity Number (RIN) for each sample was assessed by high-resolution capillary electrophoresis on the Agilent Bioanalyzer 2100 (Agilent Technologies, Palo Alto, California). RNA concentration and 260/280 ratio (2.1 ± 0.032 SD) were determined with NanoDrop (Thermo Scientific).
RNA sequencing: Stranded RNA-Seq libraries were constructed after rRNA depletion using Ribo-Zero GOLD (Illumina). RNA sequencing was performed at the National Institutes of Health Intramural Sequencing Center (NISC).

Schizophrenia | Bipolar | Control | MDD
46 | 39 | 61 | 54
Table: Numbers of samples in each diagnostic category.

Whole Genome Sequencing: All brains were collected and dissected at the HBCC, DIRP, NIMH.
This study generates whole genome sequencing data from DNA of the dorsolateral prefrontal cortex (DLPFC), anterior cingulate cortex (ACC) or cerebellum of 443 individuals with schizophrenia, bipolar disorder or major depressive disorder, and of non-psychiatric controls. The study was funded by the DIRP, NIMH under contract (#HHSN 271201400099C) with Icahn School of Medicine at Mount Sinai, 1106402 One Gustave L. Levy Place, Box 3500, New York NY 10029-6574. DNA extraction, library preparation and sequencing were performed under contract at Icahn School of Medicine. The CommonMind Consortium (CMC) provided project management support. All specimens were dissected from the right or left hemisphere of frozen coronal slabs.
DNA library preparation and sequencing: All samples submitted to the New York Genome Center for WGS were prepared for sequencing in randomized batches of 95. The sequencing libraries were prepared using the Illumina PCR-free DNA sample preparation kit. The insert size and DNA concentration of the sequencing library were determined on the Fragment Analyzer Automated CE System (Advanced Analytical) and with Quant-iT PicoGreen (ThermoFisher), respectively. A quantitative PCR assay (KAPA), with primers specific to the adapter sequence, was used to determine the yield and efficiency of the adapter ligation process. Sequencing was performed on the Illumina HiSeq X at 30X coverage.

Schizophrenia | Bipolar | Control
115 | 78 | 230
Table: Numbers of samples in each diagnostic category.

ChIP-Seq: All brains were collected and the dorsolateral prefrontal cortex (DLPFC) samples dissected at the HBCC, DIRP, NIMH. This study generates epigenetic data by sequencing DNA after chromatin immunoprecipitation (ChIP-Seq) for the histone marks H3K4me3 and H3K27ac in the DLPFC. DLPFC specimens were dissected from the right or left hemisphere of frozen coronal slabs.
The study was funded by the DIRP, NIMH under contract (#HHSN 271201400099C) with Icahn School of Medicine at Mount Sinai, 1106402 One Gustave L. Levy Place, Box 3500, New York NY 10029-6574. Chromatin immunoprecipitation, library preparation and sequencing were performed under contract at Icahn School of Medicine. The CommonMind Consortium (CMC) provided project management support. Chromatin immunoprecipitation (ChIP) assays for the histone marks H3K4me3 and H3K27ac were carried out using native ChIP. Micrococcal nuclease (MNase) (Sigma, N3755) treatment was used to digest chromatin into mononucleosomes. The following antibodies were used for chromatin pull-down: anti-H3K4me3 (Cell Signaling, Cat# 9751BC, Lot 7) and anti-H3K27ac (Active Motif, Cat# 39133, Lot # 31814008). Genomic DNA fragments enriched for the histone modification were recovered using Protein A/G magnetic beads (Thermo Scientific, 88803-88938 or Millipore 16-663), then washed, eluted, and treated with RNase A and proteinase K. Final ChIP DNA products were isolated by phenol-chloroform extraction followed by ethanol precipitation. The efficiency of each ChIP assay was validated using Qubit concentration measurement and qPCR for positive (GRIN2B, DARPP32) and negative (HBB) control genomic regions. Only ChIP assays that passed quality control were further processed for library preparation and sequencing; these included ChIP DNA that was not detectable on Qubit but showed a good signal and the expected enrichment pattern in qPCR.

Diagnosis | H3K27ac | H3K4me3 | Input
Bipolar | 56 | 4 | 7
Control | 158 | 11 | 24
Schizophrenia | 79 | 11 | 12
Table: Numbers of individuals in each assay, grouped by histone mark or input.

Long-Read Whole-Genome Sequencing (WGS)
Cohort Description: Brain specimens were obtained from the Human Brain Collection Core (HBCC), part of the NIH NeuroBioBank. Samples were collected under protocols approved by the NIH CNS Institutional Review Board (IRB) (NCT03092687), with informed consent from next-of-kin (NOK).
Collection was coordinated through the Offices of the Chief Medical Examiners (MEOs) in Washington, D.C., Northern Virginia, and Central Virginia. Clinical metadata and documentation are publicly available via the NIMH Data Archive (NDA) (Collection #3151): https://nda.nih.gov/edit_collection.html?id=3151
Eligibility Criteria:
• No clinical diagnosis of major neuropsychiatric or neurodegenerative disease
• No diagnosis of cognitive impairment during life
• All individuals were confirmed to be neurologically normal at time of death
Demographics:
• Initial cohort size: 155 individuals
• Ancestry: All individuals self-identified as African or African-admixed
• Mean age at death: 44.2 years (range: 18–85 years)
• Sex distribution: 36.4% female
Sample Processing: Frozen frontal cortex tissue was dissected and processed according to the public protocol: https://www.protocols.io/view/processing-human-frontal-cortex-brain-tissue-for-p-kxygxzmmov8j/v2. High-molecular-weight DNA was extracted, and libraries were prepared using the Oxford Nanopore Technologies (ONT) LSK-114 kit. Sequencing was performed using ONT PromethION flow cells (R10.4.1 chemistry).
Data Processing and Quality Control:
• Basecalling: Conducted using Guppy v6.38
• Read Alignment: Reads were aligned to the GRCh38 reference genome using minimap2
• Sample Identity Verification: Sample identity was validated by comparing ONT-derived SNV calls with matched short-read WGS genotypes to ensure concordance and prevent sample swaps; no sample switches were found
Variant Calling and Phasing: The napu pipeline (https://github.com/nanoporegenomics/napu_wf) produced haplotype-resolved assemblies, joint small-variant (SNV/indel) calls, and multi-caller structural-variant sets, all reported on GRCh38 and phased where possible.
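The sample-identity check compares genotypes called from the ONT reads with the same individual's short-read WGS genotypes; a swap shows up as low concordance with the expected donor (and usually high concordance with some other sample). A minimal Python sketch, assuming genotypes have been reduced to unphased alternate-allele dosages (0, 1, 2) at shared SNV sites; the dictionaries, donor names, helper functions, and the 0.9 cutoff are hypothetical and are not parameters of the napu pipeline.

```python
from typing import Dict, Optional

Genotypes = Dict[str, int]  # site ID -> alternate-allele dosage (0, 1 or 2)


def concordance(a: Genotypes, b: Genotypes) -> Optional[float]:
    """Fraction of sites called in both samples where the dosages agree."""
    shared = a.keys() & b.keys()
    if not shared:
        return None  # no overlapping sites: concordance undefined
    return sum(a[s] == b[s] for s in shared) / len(shared)


def best_match(long_read: Genotypes, short_read: Dict[str, Genotypes]) -> str:
    """Return the short-read sample whose genotypes best match long_read."""
    scores = {name: concordance(long_read, g) or 0.0
              for name, g in short_read.items()}
    return max(scores, key=lambda name: scores[name])


MIN_CONCORDANCE = 0.9  # hypothetical cutoff

short_read = {
    "donor1": {"s1": 0, "s2": 1, "s3": 2, "s4": 1},
    "donor2": {"s1": 2, "s2": 0, "s3": 1, "s4": 0},
}
ont_donor1 = {"s1": 0, "s2": 1, "s3": 2, "s4": 1}
match = best_match(ont_donor1, short_read)
ok = (match == "donor1"
      and concordance(ont_donor1, short_read["donor1"]) >= MIN_CONCORDANCE)
print(match, ok)  # donor1 True
```

In real data, unrelated individuals still agree at many sites (for example, shared homozygous-reference calls), so any cutoff has to be calibrated against the distribution of non-matching pairs; restricting the comparison to sites that are non-reference in at least one sample is one common refinement.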
Raw signal data were basecalled to obtain 5-methylcytosine (5mC) status; methylation tags were added to the phased BAM files. Genome-wide methylation summaries are provided in BED format.
Dataset Filtering and Exclusions:
• All 155 samples underwent sequencing and SNP-based ancestry inference
• 8 samples were excluded due to ancestry inconsistent with African or African-admixed background
• 1 sample was excluded due to insufficient sequencing quality
Final Sample Set: 146 high-quality samples from individuals of African or African-admixed ancestry were retained for downstream analyses. See PMID: 39764002 for further analysis details.

Diagnosis | # Samples
Control | 155
Table: Diagnostic Summary.

Note: The data derived from HBCC resources were removed from dbGaP and are now available in the NIMH Data Archive (NDA). They include genotypes, short-read whole genome sequencing (WGS), epigenetics (DNA methylation, ChIP-seq for histones), and RNA expression (qPCR, microarray, RNA-seq, single-nucleus RNA-seq) of various brain regions in cases with schizophrenia, bipolar disorder, major depression, substance use disorders and normative controls. Please see our NDA collection (https://nda.nih.gov/edit_collection.html?id=3151) for further detail.