The electronic Medical Records and Genomics (eMERGE) Network is a consortium of ten participating sites (Cincinnati Children's Hospital Medical Center/Boston Children's Hospital, Children's Hospital of Philadelphia, Essentia Institute of Rural Health, Marshfield Clinic Research Foundation and Pennsylvania State University, Geisinger Clinic, Group Health Cooperative/University of Washington, Mayo Clinic, Icahn School of Medicine at Mount Sinai, Northwestern University, Vanderbilt University Medical Center) funded by the NHGRI to investigate the use of electronic medical record (EMR) systems for genomic research. The goal of eMERGE is to conduct genome-wide association studies in approximately 55,000 individuals using EMR-derived phenotypes and DNA from linked Biorepositories. Using electronic phenotyping methods, the consortium used DNA samples from all participating sites to explore the genetic determinants of over forty phenotypes, including Abdominal aortic aneurysm; Ace-Inhibitor/Cough; Attention Deficit Hyperactivity Disorder; Age-related macular disease; Appendicitis; Asthma; Atopic Dermatitis; Autism; Benign Prostatic Hyperplasia; Carotid artery disease as a Quantitative Measure; caMRSA; Cataract; Clostridium difficile colitis; Extreme Obesity; Chronic Kidney Disease; Chronic Kidney Disease and Type 2 Diabetes; Chronic Kidney Disease, Type 2 Diabetes and Hypertension; Colon Polyps; Cardiorespiratory Fitness; Dementia; Diverticulosis; Diabetic retinopathy; Gastroesophageal Reflux Disease; Glaucoma; Height; Heart failure; Hypothyroidism; Lipids; Ocular hypertension; Peripheral Arterial Disease; QRS duration; Red blood cell indices; Remission of Diabetes after ROUX-EN-Y gastric bypass surgery; Resistant hypertension; MACE while on Statins; Type 2 Diabetes; Venous Thromboembolism; White blood cell indices; and Zoster virus infection, as well as using the phenome-wide association study (PheWAS) paradigm to replicate and discover relationships between targeted 
genotypes and multiple phenotypes. Sites and participants include: Children's Hospital of Philadelphia (CHOP): The Center for Applied Genomics (CAG) at the Children's Hospital of Philadelphia (CHOP) is a high-throughput, highly automated genotyping and sequencing facility equipped with state-of-the-art genotyping and sequencing platforms. Children who are treated at the Children's Hospital Healthcare Network and their parents may be eligible to take part in a major initiative to collect more than 100,000 blood samples, covering a wide range of pediatric diseases. A large majority of participants consenting to prospective genomic analyses also consent to analysis of their de-identified electronic medical records (EMRs). EMRs are longitudinal, with a mean duration of 6.5 years. Cincinnati Children's Hospital Medical Center/Boston Children's Hospital (CCHMC/BCH): Cincinnati Children's Hospital Medical Center (CCHMC) and Boston Children's Hospital (BCH) are pediatric institutions dedicated to improving the health and welfare of children and to the shared purpose of discovery and practical application of new genomic information to the ordinary care of children. The CCHMC/BCH site has been built on a five-year history of collaboration, particularly in electronic medical record (EMR)-related informatics, the basis of much of eMERGE II. CCHMC and BCH together bring to eMERGE II an extraordinary faculty committed to diseases that afflict children, specifically to phenotypes that focus on diseases of children in ways that will leverage the available eMERGE adult GWAS and EMRs to discover meaningful use results. CCHMC/BCH plans to demonstrate real-time execution of phenotypic selection across their two distinct pediatric institutions as a model for ensuring phenotypic standardization and for national scalability. 
They will also look carefully at parents' responses to, and use of, their children's research results and seek to better understand the factors that influence parents' decisions about learning incidental findings. In addition to patient and parent perceptions, CCHMC/BCH will also explore clinician perceptions of pharmacogenetic research results after EMR integration. Geisinger Health System: A research cohort of adult Geisinger Clinic patients was enrolled from community-based primary care clinics of the Geisinger Health System. Patients were eligible for enrollment if they were a primary care patient of a Geisinger Clinic physician and were scheduled for a non-emergent clinic visit. All participants provided written informed consent and HIPAA authorization. Consenting patients agreed to provide blood samples for broad biomedical research use, and permission to access data in their Geisinger electronic medical record for research. The enrollment rate was 90% of patients approached. The demographics of the cohort approximate those of the Geisinger Clinic outpatient population. Research blood samples were collected during an outpatient clinical phlebotomy encounter. Research blood samples are coded and stored in a central biorepository. Samples are linkable to clinical data in a de-identified manner for research via an IRB-approved data broker process. For genomic analysis, DNA is extracted from EDTA-anticoagulated whole blood. Group Health (GH)/University of Washington (UW): GH participants for the PGx project were enrolled in the eMERGE Network through the Northwest Institute of Genetic Medicine (NWIGM) biorepository, and provided the appropriate consent to receive clinically relevant genetic results (N~6300). Participants were eligible if aged 50-65 years at the time of their enrollment into the NWIGM repository, living, enrolled in GH's integrated group practice, and had completed an online Health Risk Appraisal. 
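The Group Health eligibility criteria just listed can be expressed as a simple filter. The following is only a minimal sketch: the `Candidate` record and its field names are hypothetical, and treating the 50-65 age range as inclusive at both ends is an assumption, since the text does not say how the boundaries were handled.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical record; field names are illustrative, not from the study.
    age_at_enrollment: int    # age at enrollment into the NWIGM repository
    living: bool              # participant is alive
    in_group_practice: bool   # enrolled in GH's integrated group practice
    completed_hra: bool       # completed the online Health Risk Appraisal


def is_eligible(c: Candidate) -> bool:
    """Apply the four stated eligibility criteria (age bounds assumed inclusive)."""
    return (
        50 <= c.age_at_enrollment <= 65
        and c.living
        and c.in_group_practice
        and c.completed_hra
    )
```

This covers eligibility only; the enrichment and exclusion steps of the selection algorithm are separate and are not modeled here.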
The selection algorithm was based on several data sources from the EHR at Group Health: 1. Demographics - participants with self-reported race as Asian or African ancestry were prioritized and selected to enrich for non-European ancestry; 2. Diagnosis and procedure codes - participants were selected if found to have a history of hypertension, atrial fibrillation (AF), or congestive heart failure (CHF). Participants with a history of arrhythmia were added if the entire selection algorithm did not generate 900 individuals. We also enriched for participants with EHR evidence of actionable indications related to PGRNSeq genes. Participants were selected if found to have an ICD9 code for malignant hyperthermia, hypertension, atrial fibrillation, congestive heart failure or long QT syndrome (LQTS); 3. Laboratory values - if participants had any laboratory event of creatine kinase (CK) >1000, and were dispensed statins within 6 months of the event, then they were selected; and 4. Medications - participants were excluded if ever on carbamazepine or had a current regimen of warfarin. Essentia Institute of Rural Health, Marshfield Clinic, Pennsylvania State University (Marshfield): The Marshfield Clinic Personalized Medicine Research Project is a population-based biobank in central Wisconsin with more than 20,000 adult subjects who provided written, informed consent to access their medical records and provided a blood sample from which DNA was extracted and plasma and serum stored. In addition to an average of 30 years of medical history data, a questionnaire about environmental exposures, including a detailed food frequency questionnaire, is available to facilitate gene/environment studies. Mayo Clinic: The Mayo biobank is a disease-specific biobank for vascular diseases including peripheral arterial disease (PAD). PAD patients were identified from individuals referred to the non-invasive vascular laboratory for lower extremity arterial evaluation. 
Since 1997, laboratory findings have been recorded in an electronic database employing an in-house software package for data archiving and retrieval; these data become part of the Mayo EMR. Patients referred to the center with suspected PAD undergo a comprehensive non-invasive evaluation including the ankle-brachial index (ABI), the ratio of systolic blood pressure measured at the ankles to systolic blood pressure measured in the upper arms. Control subjects are identified from patients referred to the Cardiovascular Health Clinic for stress ECG. The prevalence of PAD in patients with normal exercise capacity who do not have inducible ischemia on the stress ECG was <1%. Data regarding risk factors for atherosclerosis such as diabetes, dyslipidemia, hypertension, and smoking are ascertained from the EMR. Icahn School of Medicine at Mount Sinai (Mt. Sinai): The Institute for Personalized Medicine (IPM) Biobank Project is a consented, EMR-linked medical care setting biorepository of the Mount Sinai Medical Center (MSMC) drawing from a population of over 70,000 inpatients and 800,000 outpatient visits annually. MSMC serves diverse local communities of upper Manhattan, including Central Harlem (86% African American), East Harlem (88% Hispanic Latino), and Upper East Side (88% Caucasian/white) with broad health disparities. IPM Biobank populations include 28% African American (AA), 38% Hispanic Latino (HL) predominantly of Caribbean origin, and 23% Caucasian/White (CW). IPM Biobank disease burden is reflective of health disparities with broad public health impact: average body mass index of 28.9 and frequencies of hypertension (55%), hypercholesterolemia (32%), diabetes (30%), coronary artery disease (25%), chronic kidney disease (23%), among others. 
Biobank operations are fully integrated in clinical care processes, including direct recruitment from clinical sites, waiting areas and phlebotomy stations by dedicated Biobank recruiters independent of clinical care providers, prior to or following a clinician standard of care visit. Recruitment currently occurs at a broad spectrum of over 30 clinical care sites. Northwestern University: The NUgene Project is a repository with longitudinal medical information from participating patients at affiliated hospitals and outpatient clinics from the Northwestern University Medical Center. Participants' DNA samples are coupled with data from a self-reported questionnaire and continuously updated data from our Electronic Medical Record (EMR) representing actual clinical care events. Northwestern has a state-of-the-art, comprehensive inpatient and outpatient EMR system of over 2 million patients. NUgene has broad access to participant data for all outpatient visits as well as inpatient data via a consolidated data warehouse. NUgene participants consent to distribution and use of their coded DNA samples and data for a broad range of genetic research by third-party investigators. Vanderbilt University: BioVU, Vanderbilt's DNA databank, is an enabling resource for exploration of the relationships among genetic variation, disease susceptibility, and variable drug responses, and represents a key first step in moving the emerging sciences of genomics and pharmacogenomics from research tools to clinical practice. BioVU acquires DNA from discarded blood samples collected from routine patient care. The biobank is linked to de-identified clinical data extracted from Vanderbilt's EMR, which forms the basis for phenotype definitions used in genotype-phenotype correlations.
Questions regarding GAW16 should be directed to Vanessa Olmo at vanessa@business-endeavors.com.
Problem 2: Description of the Framingham Heart Study
In GAW16, we use data drawn from the Framingham Heart Study. The Framingham Heart Study, under the direction of the National Heart, Lung, and Blood Institute (NHLBI), began in 1948 with the recruitment of adults from the town of Framingham, Massachusetts. At the time, little was known about the general causes of heart disease and stroke, but the death rates for cardiovascular disease (CVD) had been increasing steadily since the beginning of the 20th century and had become an American epidemic. The Framingham Heart Study is now conducted in collaboration with Boston University. The objective of the Framingham Heart Study was to identify the common factors or characteristics that contribute to CVD by following its development over a long period of time in a large group of participants who had not yet developed overt symptoms of CVD or suffered a heart attack or stroke. Between 1948 and 1953 the researchers recruited 5,209 subjects (2,336 men and 2,873 women) between the ages of 29 and 62 from the town of Framingham, Massachusetts and began the first round of extensive physical examinations and lifestyle interviews that they would later analyze for common patterns related to CVD development. Subjects were recruited from lists of addresses recorded for the town. Two out of every three households were approached for participation in the study. While there was no intention to recruit families for family studies, the plan was to recruit all household members aged 30-60 within each household that was selected for study. 
Hence, many biologically related individuals were recruited, including 1,644 spouse pairs. Since 1948, these participants have returned to the study every two years for a detailed medical history, physical examination, and laboratory tests. Now in 2008, at 60 years of follow-up, there remain about 500 participants from this cohort. Between 1971 and 1975 the study enrolled a second-generation group (5,124 of the original participants' children and the spouses of these children) to participate in similar examinations. Of these, 2,616 subjects are offspring of the original spouse pairs and 34 are stepchildren. A total of 898 offspring are children of cohort members where only one parent was a study participant, and 1,576 are spouses of the offspring. The Offspring Cohort has been followed every four years through 2001 (except between Exams 1 and 2, with an intervening 8 years) using protocols similar to those used for study of the Original Cohort. Between 2002 and 2005 the study enrolled the third generation (Gen3) of the Framingham Heart Study: 4,095 offspring of the second generation. None of their spouses were recruited. An additional 103 parents of this third generation, who were not recruited between 1971 and 1975, were also recruited at this time. The latter group is not included in the GAW16 data. With the recruitment of this third generation, the study has increasingly focused on genetic factors associated with the development of cardiovascular disease and its associated risk factors. To date, there is only one examination of this generation of participants. A description of the recruitment of this third generation and comparison with the earlier generations at their initial recruitment is presented in Splansky GL et al., 2007. Further information on the Study can be found at http://www.nhlbi.nih.gov/about/framingham/index.html.
Genome-wide Dense SNP Scan in Framingham Heart Study
Genetic studies did not begin in the FHS until the 1990s. 
In the late 1980s and through the 1990s DNA was extracted from blood samples of surviving FHS participants. In 2007, the FHS entered a new phase with the conduct of genotyping for the FHS SHARe (SNP Health Association Resource) project, for which dense SNP genotyping was performed using approximately 550,000 SNPs (GeneChip® Human Mapping 500K Array Set and the 50K Human Gene Focused Panel) in 10,775 samples (some duplicates) from the three generations of subjects (including over 900 pedigrees). Affymetrix conducted all genotyping for the FHS SHARe project, using the 250K Sty, 250K Nsp, and the supplemental 50K platforms. Eighty-nine percent of the DNA samples were collected during the 1990s. To maximize the power of the study, we also extracted DNA from 1133 blood samples, drawn from subjects who had no DNA, to include in the SHARe project. These samples had been sitting in our refrigerators for some time, a few as far back as the 1970s. We refer to these DNA samples as the legacy samples. These samples had a higher failure rate in the genotyping process (40%) than the other eighty-nine percent (3%). Affymetrix invoked its own criteria for a sample to succeed in genotyping. All non-legacy samples must succeed on all three platforms, while legacy samples needed to pass on at least one platform. When a sample failed, additional attempts were made. Samples that repeatedly failed 2-4 times were called failures. Other samples failed due to issues of genotyped sex identification not matching our records or low SNP concordance among SNPs common across arrays or contamination. Eighty-nine percent of the legacy samples for which genotyping results are available passed all three platforms. The genotyping data from the 10,043 samples from 9354 subjects that passed the Affymetrix criteria were additionally checked for gender consistency and consistency with family structure, resulting in genotyping data for 9,274 participants in FHS SHARe. 
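The platform pass rule described above (non-legacy samples had to succeed on all three platforms, legacy samples on at least one) can be sketched as a small predicate. This is an illustrative sketch only: the platform labels and function name are hypothetical, and the additional failure reasons mentioned in the text (sex mismatch, low SNP concordance across arrays, contamination) are deliberately not modeled.

```python
from typing import Mapping

# Hypothetical labels for the three genotyping platforms named in the text.
PLATFORMS = ("250K_Sty", "250K_Nsp", "50K")


def passes_platform_rule(passed: Mapping[str, bool], is_legacy: bool) -> bool:
    """Return True if a sample meets the per-platform success rule.

    `passed` maps a platform label to whether genotyping succeeded on it;
    a platform missing from the mapping is treated as a failure.
    """
    n_passed = sum(bool(passed.get(p, False)) for p in PLATFORMS)
    if is_legacy:
        # Legacy samples needed to pass on at least one platform.
        return n_passed >= 1
    # Non-legacy samples needed to pass on all three platforms.
    return n_passed == len(PLATFORMS)
```

For example, a legacy sample that succeeded only on the 50K panel satisfies the rule, while a non-legacy sample with the same results does not.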
Genotype calls were made with the BRLMM algorithm. The SHARe database is housed at the National Center for Biotechnology Information database of genotypes and phenotypes (NCBI dbGaP) and contains all ~550,000 SNPs. This genome-wide dense SNP scan and a subset of phenotypes from the Framingham Heart Study are the focus of the Genetic Analysis Workshop 16. Further information on the specific variables in the Problem 2 dataset can be found by clicking on the Documents tab at the top of the page.
Problem 3: Description of the Simulated Data Set
The focus of this simulation is gene discovery in genome-wide association scans (GWAS). The Framingham Heart Study data set (distributed as "Problem 2") is the basis for the FHS* simulated data. The pedigree structures are derived from the data distributed for Problem 2, and we distribute an accompanying triplet file (triplet_sim) containing person ID, father ID, mother ID, to ensure the identical subjects, pedigrees, and singletons are used in the simulated data analysis. Consistent with standard practice, founders and singletons are designated as subjects with both fshare and mshare equal to zero (missing). The simulated data include a total of 6,479 subjects with both phenotype and genotype data, in 942 pedigrees distributed among 3 generations and 188 singletons. Data inclusion is consistent with the subjects' consent for use by both for-profit and not-for-profit researchers. The genotypes for all Problem 3 replicates are fixed as measured and distributed for Problem 2 for both the genome-wide scan and the additional candidate gene SNPs, for a total of approximately 550,000 SNPs (GeneChip® Human Mapping 500K Array Set and the 50K Human Gene Focused Panel). Thus, to analyze the Problem 3 simulated data you also will need to download the Problem 2 genotypes. Note that there are slight discrepancies in counts between Problems 2 and 3 due to a change in consents between the two datasets. 
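The founder and singleton convention described above (both fshare and mshare equal to zero) can be applied directly when reading a triplet file. The sketch below assumes a comma-separated file with a header row naming the columns shareid, fshare, and mshare; the actual layout of triplet_sim is not given here, so that layout and the toy rows are assumptions for illustration only.

```python
import csv
import io

# Toy stand-in for the triplet file; the real layout may differ.
TOY_TRIPLETS = """shareid,fshare,mshare
1,0,0
2,0,0
3,1,2
4,0,0
"""


def load_triplets(handle):
    """Read (person, father, mother) triplets from a CSV handle."""
    return [
        {
            "shareid": int(row["shareid"]),
            "fshare": int(row["fshare"]),
            "mshare": int(row["mshare"]),
        }
        for row in csv.DictReader(handle)
    ]


def founders_and_singletons(triplets):
    """IDs whose father and mother IDs are both zero (missing)."""
    return {
        t["shareid"]
        for t in triplets
        if t["fshare"] == 0 and t["mshare"] == 0
    }


triplets = load_triplets(io.StringIO(TOY_TRIPLETS))
print(sorted(founders_and_singletons(triplets)))  # [1, 2, 4]
```

The parent IDs alone do not distinguish founders from singletons; separating the two requires checking whether a subject appears as a parent or has other pedigree links, which this sketch does not attempt.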
Several phenotypes that contribute to coronary heart disease (CHD) were simulated for all individuals with genotypes across three different time points, 10 years apart. All genotyped individuals have complete data; the effects of missing values can be investigated by user-specified missing value patterns. There are 200 longitudinal datasets created, based on the generating model, and each replication is found in a separate dataset. We suggest that if only one replication is to be analyzed, that it be replication 1 to enable more precise comparisons among analytical approaches. The 'shareid' will allow you to merge the simulated phenotype data with the Problem 2 genotype data, and reconstruction of the pedigrees using the distributed 'triplet_sim' file or for larger families' relationships with the triplet distributed with the Problem 2. The simulated data problem is further described in the associated readme file, and a data dictionary is provided defining all the variables. For disclosure of the generating model for these data, please contact Jean MacCluer at jean@sfbrgenetics.org.
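Merging on 'shareid' as described above amounts to a key-based join of the simulated phenotype rows with the Problem 2 genotype rows. The following is a minimal sketch using plain dictionaries; the column names other than shareid (`trait_visit1`, `snp_a`) and the toy values are hypothetical, and the real files should be read according to the distributed data dictionary.

```python
def merge_on_shareid(phenotype_rows, genotype_rows):
    """Inner-join two lists of dict rows on the 'shareid' key."""
    genotype_by_id = {row["shareid"]: row for row in genotype_rows}
    merged = []
    for pheno in phenotype_rows:
        geno = genotype_by_id.get(pheno["shareid"])
        if geno is not None:
            # Keep only subjects present in both sources.
            merged.append({**geno, **pheno})
    return merged


# Toy rows for illustration; real column names come from the data dictionary.
phenotypes = [
    {"shareid": 1, "trait_visit1": 52.0},
    {"shareid": 2, "trait_visit1": 61.5},
    {"shareid": 9, "trait_visit1": 47.3},
]
genotypes = [
    {"shareid": 1, "snp_a": "AG"},
    {"shareid": 2, "snp_a": "GG"},
]
merged = merge_on_shareid(phenotypes, genotypes)
```

An inner join is used so that any subject missing from either source, for example because of the consent differences between Problems 2 and 3, is dropped rather than carried forward with empty fields.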
The study was conducted under the auspices of the Transdisciplinary Research In Cancer of the Lung (TRICL) Research Team, which is a part of the Genetic Associations and MEchanisms in ONcology (GAME-ON) consortium, and associated with the International Lung Cancer Consortium (ILCCO). Ethics: All participants provided written informed consent. All studies were reviewed and approved by institutional ethics review committees at the involved institutions. Sequencing data are derived from four substudies. The substudies that contributed include Harvard, Liverpool, Toronto, and IARC. The Alpha-Tocopherol, Beta-Carotene Cancer Prevention (ATBC) Study is a randomized primary prevention trial including 29,133 male smokers enrolled in Finland between 1985 and 1993. Participants ranged between ages of 50 to 69 at enrollment and were randomized in a factorial design to take either 50 milligrams of d-alpha tocopheryl acetate (Vitamin E), 20 mg of all-trans-beta-carotene, both or placebo. The study continued to monitor cancer incidence through 2012 and total mortality through December 2013. The CAncer de PUlmon en Asturias Study (CAPUA) is a hospital-based case-control study conducted in Asturias, Spain by the University of Oviedo. Lung cancer cases were recruited in three main hospitals of Asturias, following an identical protocol from 2002 to 2012. Eligible cases were incident cases of histologically confirmed lung cancer between 30 and 85 years of age and residents in the geographical area of each participating hospital. Controls were selected from patients admitted to those hospitals with diagnoses unrelated to the exposures of interest and individually matched by ethnicity, gender, age (± 5 years) and hospital. Epidemiologic data were collected personally through computer-assisted questionnaires by trained interviewers during the first hospital admission. 
Structured questionnaires collected information on sociodemographic characteristics, recent and prior tobacco use, environmental exposure (air pollution and passive smoking), diet, personal and family history of cancer, and occupational history from each participant. Peripheral blood samples (or mouthwash samples when they refused to donate blood) were collected from all participants. Coding of histology was based on 2001 WHO/IASLC. Genomic DNA was extracted based on standard protocol. The Canadian Screening Study includes the nested case-control samples from 3 screening programs: IELCAP-Toronto: Ever smokers of more than 10 pack-years age 50 and above were eligible for the I-ELCAP screening program since 2003, and a total of 4782 individuals have been enrolled in the Greater Toronto Area. Participants were administered a LDCT scan along with a standard study questionnaire at baseline. Blood samples were systematically collected at baseline since 2006. Participants who had an abnormality in a CT scan were followed up every 1 to 2 years. The screening program was organized by the Princess Margaret Hospital. PanCan: Ever smokers between the ages of 50-75 with no previous history of invasive cancer are eligible to participate in the study. The study was carried out across Canada in Vancouver, Calgary, Hamilton, Toronto, Ottawa, Quebec, Halifax, and St. John's. A total of 2537 smokers have been screened from 2008 to 2011. All study participants completed a detailed questionnaire, spirometry, collection of blood specimens for biomarker measurement and LDCT at baseline. All participants are followed for a minimum of 3 years. On yearly follow up, an updated shorter questionnaire is administered, blood is collected and CT scans are performed. Blood samples are available from all 2537 individuals. BCCA Screening Program: From 1990 to 2007, 4274 smokers above 40 years old who had smoked 20 pack-years or more were enrolled at BCCA. 
Upon enrollment, subjects completed a questionnaire on their lifestyle and medical history. Baseline spirometry was conducted using a flow-sensitive spirometer in accordance with the American Thoracic Society recommendations. Since 2000, an LDCT was obtained in 2440 individuals. The participants were followed prospectively to determine whether they developed lung cancer. A total of 9759 individuals participated in the CT screening program in Canada from these 3 programs. The samples included in this project are a subset of nested lung cancer cases and controls selected at a 1:2 ratio. The Carotene and Retinol Efficacy Trial (CARET) was a randomized, double-blind, placebo-controlled trial of the cancer prevention efficacy and safety of a daily combination of 30 mg of beta-carotene and 25,000 IU of retinyl palmitate in 18,314 persons at high risk for lung cancer. CARET began in 1985, and the intervention was halted in January 1996, 21 months ahead of schedule, with the twin conclusions of definitive evidence of no benefit and substantial evidence of a harmful effect of the intervention on both lung cancer incidence and total mortality. CARET continued to follow and collect endpoints on its participants through 2005. Pathology reports and medical records were reviewed to confirm cancer endpoints, and death certificates were obtained to capture cause of death. During the active intervention phase of CARET, serum, plasma, whole blood, and lung tissue specimens were collected from participants. These biospecimens make up the CARET Biorepository. For the OncoArray Project, CARET provided DNA extracted from whole blood of lung cancer cases and controls matched on age at baseline (± 4 years), sex, race, baseline smoking status, history of occupational asbestos exposure (asbestos vs heavy smoker), and year of enrollment (2-year intervals). 
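The CARET matching factors listed above can be illustrated as a pairwise check between a case and a candidate control. This is a rough sketch under stated assumptions, not the trial's actual matching procedure: the `Participant` record and its field names are hypothetical, and interpreting "2-year intervals" as fixed two-year bins counted from 1985 is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    # Hypothetical record; field names are illustrative only.
    age_at_baseline: float
    sex: str
    race: str
    smoking_status: str    # baseline smoking status
    exposure_group: str    # "asbestos" or "heavy smoker"
    enrollment_year: int


def enrollment_bin(year: int, first_year: int = 1985, width: int = 2) -> int:
    """Assign an enrollment year to a fixed-width bin (assumed binning)."""
    return (year - first_year) // width


def is_matched(case: Participant, control: Participant,
               age_tolerance: float = 4.0) -> bool:
    """Check whether a control satisfies every listed matching factor."""
    return (
        abs(case.age_at_baseline - control.age_at_baseline) <= age_tolerance
        and case.sex == control.sex
        and case.race == control.race
        and case.smoking_status == control.smoking_status
        and case.exposure_group == control.exposure_group
        and enrollment_bin(case.enrollment_year)
        == enrollment_bin(control.enrollment_year)
    )
```

Under this binning, participants enrolled in 1985 and 1986 share a bin, while those enrolled in 1986 and 1987 do not; a sliding two-year window would be an equally plausible reading of the text.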
The European Prospective Investigation into Cancer and Nutrition (EPIC) study is a multi-center cohort study involving 521,000 study participants from 10 European countries. The current study involved EPIC participants from 7 countries (Greece, Netherlands, UK, France, Germany, Spain, and Italy), including 1223 incident lung cancer cases and 1249 smoking-matched controls. The Kentucky Lung Cancer Research Initiative is a study conducted by the Markey Cancer Center and the University of Kentucky using a population-based, case-control framework to study the extraordinarily high rates of lung cancer in Southeastern, Appalachian Kentucky. Cancer cases were recruited from the Kentucky Cancer Registry at the time of diagnosis and controls were recruited through a random digit dialing process from the same region. Study accrual began on January 5, 2012 and was completed on September 5, 2014; 520 subjects were recruited in a 4:1 ratio of controls to cases from Appalachian Kentucky. Of the 520 subjects recruited, 231 are included in the OncoArray analysis, including all 93 cancer cases, and 123 controls. Newly diagnosed lung cancer cases and controls underwent blood, toenail (for trace element analysis), urine, buffy coat, water, soil, and radon collection, residence GPS mapping, as well as an extensive epidemiologic, occupational, and health history questionnaire (ClinicalTrials.gov Identifier: NCT01648166). The Harvard Lung Cancer Study (HLCS) is a case-control study conducted at Massachusetts General Hospital (MGH) in Boston, Massachusetts from 1992 to 2004. Details of the study were described previously. Briefly, eligible cases included any person over the age of 18 years with a diagnosis of primary lung cancer that was further confirmed by an MGH lung pathologist. Controls were recruited from the friends or spouses of cancer patients or the friends or spouses of other surgery patients in the same hospital. 
Potential controls were excluded from participation if they had a diagnosis of any cancer (other than non-melanoma skin cancer). Interviewer-administered questionnaires, a modified version of the standardized American Thoracic Society respiratory questionnaire, collected information on demographics, medical history, family history of cancer, smoking history, and a detailed work history, including job titles and tasks. Genome-wide genotype data were first generated using Illumina Human 610-Quad BeadChips and then imputed by MACH against the 1000 Genomes Project dataset (http://browser.1000genomes.org/index.html). The Institutional Review Board of MGH and the Human Subjects Committee of the Harvard School of Public Health approved the study. The Israel study (NICCC-LCA) is an ongoing case-control study of newly diagnosed lung cancer cases of any histology and population age/sex/ethnicity-matched "healthy" controls. All participants undergo face-to-face interviews and provide a venous blood sample (separated into DNA, sera, and lymphocytes) after signing an IRB-approved form. Histology reports, FFPE blocks and clinical follow-up are available for most cancer cases. The MD Anderson Cancer Center (MDACC) Study. Lung cancer cases and frequency-matched controls were ascertained from a large ongoing case-control study conducted at the University of Texas MD Anderson Cancer Center (UTMDACC) since 1991. A detailed study description was provided previously (Spitz et al 2007). In brief, cases were newly diagnosed and histologically confirmed lung cancer patients recruited from UTMDACC. Controls were healthy individuals without a history of cancer (except for nonmelanoma skin cancer) and were recruited from the Kelsey-Seybold Clinics, the largest private multispecialty physician group in the Houston metropolitan area. Controls were frequency-matched to cases on age (±5 years), sex, and race/ethnicity. 
After providing written informed consent, each study participant completed an in-person interview with staff interviewers to collect information on demographics, smoking status, etc. Blood samples were also drawn from all the study participants. This study was approved by the institutional review boards of UTMDACC and Kelsey-Seybold Clinics. The Malmö Diet and Cancer Study (MDCS) is a population-based prospective cohort study that recruited men and women aged 44 to 74 years living in Malmö, Sweden between 1991 and 1996. The main goal of the MDCS is to study the impact of diet on cancer incidence and mortality. It consists of a baseline examination including dietary assessment, a self-administered questionnaire, anthropometric measurements and collection of blood samples. A total of 165 incident lung cancer cases and 174 individually smoking-matched controls were available for this analysis. The Multiethnic Cohort (MEC) Study includes 215,251 men and women aged 45-74 years at recruitment, primarily from five ethnic/racial groups: African Americans and Latinos (mostly recruited from CA, mainly from Los Angeles County) and Japanese Americans, Native Hawaiians and whites (mostly recruited from HI). The cohort was assembled in 1993-1996 by mailing a self-administered questionnaire to persons identified primarily through driver's license files. The baseline questionnaire obtained information on demographics, anthropometry, smoking history, medical and reproductive histories, family history of cancer, diet and physical activity. Incident cancer cases are identified by regular linkage with the State of California Cancer Registry and the Hawaii Tumor Registry, both members of the SEER Program of the NCI. In 2001-2006, a prospective biorepository was assembled by collecting a pre-diagnostic blood specimen from 67,594 surviving MEC members. 
At the time of blood collection a short questionnaire was administered that included information on smoking during the previous 15 days. For this study, cases were all lung cancer cases incident to blood draw and diagnosed before December 2012. For each case, a control was selected among unaffected MEC participants who were alive at time of the case's diagnosis and matched on study site, sex, race/ethnicity, age (age at diagnosis for cases; age at blood collection for controls), and date of blood collection. The Mount-Sinai Hospital-Princess Margaret Study (MSH-PMH) was conducted in the greater Toronto area from 2008 to 2013. Lung cancer cases were recruited at the hospitals in the network of the University of Toronto. Controls were selected randomly from individuals registered in the family medicine clinics databases and were frequency matched with cases on age and sex. All subjects were interviewed, and information on lifestyle risk factors, occupational history and medical and family history was collected using a standard questionnaire. Tumors were centrally reviewed by the reference pathologist, a member of the International Association for the Study of Lung Cancer (IASLC) committee, and a second pathologist in the University Health Network. If the reviews conflicted, a consensus was arrived at after discussion. Coding of histology was based on 2001 WHO/IASLC. Genomic DNA was extracted based on standard protocol. The New England Lung Cancer Study (NELCS) is a population-based case-control study of lung cancer among residents of Northern and Central New Hampshire counties and the bordering region of Vermont. Cases with histologically confirmed primary incident lung cancer were identified from 2005 to 2007 using the New Hampshire State Cancer Registry and the Dartmouth-Hitchcock Medical Center (DHMC) Tumor Registry. Control participants were identified using a commercial database and matched to lung cancer cases within 5-year age groups, sex and county. 
Genomic DNA was isolated from blood or buccal specimens provided by consenting participants. The study complied with the requirements of Dartmouth College's Committee for Protection of Human Subjects. The Nijmegen Lung Cancer Study, the Netherlands. Patients with lung cancer were identified through the population-based cancer registry of the Netherlands Comprehensive Cancer Organisation in Nijmegen, the Netherlands. Patients who were diagnosed in one of three hospitals (Radboud University Medical Center, Canisius Wilhelmina Hospital in Nijmegen, and Rijnstate Hospital in Arnhem) since 1989 and who were still alive on April 15, 2008 were recruited for a study on gene-environment interactions in lung cancer. A total of 458 patients gave informed consent and donated a blood sample. This case series was expanded with 94 patients to a total of 552 by linking three other studies to the population-based cancer registry in order to identify new occurrences of lung cancer among the participants of these other studies. All three other studies (i.e., POLYGENE, the Nijmegen Biomedical Study, and the Radboudumc Urology Outpatient Clinic Epidemiology Study) were initiated to study genetic risk factors for disease, and participants in these studies gave general informed consent for DNA-related research and linkage with disease registries. Information on histology, stage of disease, and age at diagnosis was obtained through the cancer registry. Lifestyle information was collected through a structured questionnaire and whole blood for DNA isolation was collected by the regional thrombosis services. The cancer-free controls (46% males) were selected from participants of the "Nijmegen Biomedical Study" (NBS), an age- and sex-stratified random sample of the general population of the municipality of Nijmegen, the Netherlands. All participants provided extensive lifestyle information by structured questionnaires and blood samples for DNA isolation, serum and plasma. 
All controls are of self-reported European descent. The study protocols of the NBS were approved by the Institutional Review Board of the Radboudumc and all study subjects signed a written informed consent form. The Northern Sweden Health and Disease Study (NSHDS) encompasses several prospective cohorts. The current study involves participants from the Västerbotten Intervention Project (VIP), a sub-cohort within NSHDS. VIP is an ongoing prospective cohort and intervention study intended for health promotion of the general population of Västerbotten County in northern Sweden. VIP was initiated in 1985 and all residents of Västerbotten County were invited to participate by attending a health check-up at 40, 50 and 60 years of age. Participants were asked to complete a self-administered questionnaire covering various demographic factors such as education, smoking habits, physical activity and diet. In addition, height and weight were measured and participants were asked to donate a fasting blood sample for future research. A total of 243 incident lung cancer cases and 266 individually smoking-matched controls were available for this analysis. Norway National Institute of Occupational Health Study. Early-stage NSCLC cases and controls who were healthy at the time of enrollment were Caucasians of Norwegian origin and were recruited from the same geographical region (Western Norway). Patients were enrolled in the study, whenever practically feasible, from among patients admitted for lung cancer at the Haukeland University Hospital in Bergen, Norway. Written informed consent covering analysis of molecular and genetic markers was signed by the patients prior to surgery. Only patients with histologically confirmed early-stage NSCLC were included in our study. The subjects included in this project are a subgroup recruited into the project "lung cancer genetics" at NIOH. 
The controls were recruited from the same geographical region of Western Norway and frequency-matched with cases on cumulative smoking dose (pack-years). Pack-years smoked [(cigarettes per day / 20) x years smoked] were calculated to indicate the cumulative smoking dose. Cases and controls were interviewed using similar questionnaires and were categorized as never smokers, ex-smokers or current smokers. Never smokers were subjects who reported having smoked fewer than 100 cigarettes in their lifetime. Ex-smokers were defined as those who had quit at least 1 year before sampling, and current smokers were those who reported that they were smokers at the time of sampling. The project has been approved by the Regional Committee for Medical and Health Research Ethics in Southern Norway in accordance with the WMA Declaration of Helsinki. The ethical approval covered access to the NSCLC databank. The Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial (PLCO) Study, a randomized trial aimed at evaluating the efficacy of screening in reducing cancer mortality, recruited approximately 155,000 men and women aged 55 to 74 years from 1992 to 2001. Screening for lung cancer among participants in the intervention arm included a chest x-ray at baseline followed by either three annual x-rays (for current or former smokers at enrollment) or two annual x-rays (for never smokers); participants in the control arm received routine health care. Screening-arm participants provided data on sociodemographic factors, smoking behavior, anthropometric characteristics, medical history, and family history of cancer, as well as blood samples annually for the first 6 years of the study (baseline T0 and T1 through T5). Lung cancers were ascertained through annual questionnaires mailed to the participants, and positive reports were followed up by abstracting medical records or death certificates. Follow-up in the trial as of July 2009 was 96.7%. 
Patients were excluded because of a missing baseline questionnaire, previous history of any cancer, diagnosis of multiple cancers during follow-up, missing smoking information at baseline, missing consent for utilization of biologic specimens for etiologic studies, or unavailability/insufficient quantity of serum or DNA specimens. The Resource for the Study of Lung Cancer Epidemiology in North Trent (ReSoLuCENT) is an ongoing study conducted in Sheffield from 2006 and due to complete recruitment in 2016. The study recruited pathologically confirmed lung cancer cases diagnosed at age 60 years or younger and family-matched controls. Lung cancer cases diagnosed at ages older than 60 years were recruited if they reported a family history of lung cancer. The cases and matched controls were recruited through several major cancer treatment centers; however, the majority were recruited in North Trent. All participants completed a detailed lifestyle questionnaire which included questions about occupational exposures, education, medical history and family history of cancer and lung disease. Participants also donated blood samples for DNA extraction. The ReSoLuCENT study has been funded by the Sheffield Hospitals Charity, Sheffield ECMC and Weston Park Hospital Cancer Charity. First-degree relatives were removed from the sample deposited to dbGaP. The Roy Castle Lung Study of the Liverpool Lung Project (LLP) is a case-control and cohort study which has recruited over 11,500 individuals since 1996 from the Liverpool region in the UK. Detailed epidemiological and clinical data are collected with associated specimens (i.e. tumor tissue, blood, plasma, sputum, bronchial lavage and oral brushings). 
The participants completed a detailed lifestyle questionnaire at recruitment, with repeat questionnaires at intervals; updated data on clinical outcome and hospital events are collected through the Health and Social Care Information Center (including Office of National Statistics mortality data, Cancer Registry and Health Episode Statistics). The project is registered on the UK National Institute for Health Research (NIHR) lung cancer portfolio and has all the required ethical approvals and sponsorship arrangements in place. The lung tumors were reviewed by the reference pathologist. The Seoul Bundang Lung Cancer Study was conducted between 2005 and 2010 to discover genetic and environmental factors related to lung cancer development. Lung cancer cases were recruited at the Seoul National University Hospital in Bundang. Controls were selected randomly from individuals who participated in a health check-up program and were frequency matched with cases on age and sex. All subjects were interviewed, and information on lifestyle risk factors, occupational history and medical and family history was collected using a standard questionnaire. Tumors were reviewed by the pathologists in the hospital. If the reviews conflicted, a consensus was arrived at after discussion. Coding of histology was based on 2001 WHO/IASLC. Genomic DNA was extracted based on standard protocol. The Shanghai Cohort Study (SCS) consisted of 18,244 men in Shanghai, China, who were 45-64 years old at the time of enrollment during 1986-1989. Approximately 80% of eligible men participated in the study. At the time of recruitment, each cohort subject was interviewed in person by a trained nurse interviewer using a structured questionnaire that included background information, history of tobacco and alcohol use, current diet, and medical history. At the completion of the interview, the nurse collected a 10 ml blood sample and a single-void urine specimen from the study participant. 
Buccal cell samples were collected from all surviving cohort members (~15,000) in the 2001-2002 follow-up interviews. The cohort has been followed for the occurrence of cancer and death through routine ascertainment of new cases from the population-based Shanghai Cancer Registry and Shanghai Vital Statistics Units. To maximize cancer ascertainment and minimize loss to follow-up, we contacted each surviving cohort member annually. Retired nurses visited the last known address of each living cohort member and recorded details of the interim health history of the cohort member. As of December 31, 2014, cumulatively 612 (3.4%) original subjects were lost to follow-up, and 574 (3.1%) declined our continued follow-up interviews. A nested case-control study of incident lung cancer cases within the Shanghai Cohort Study was used to examine the association between serum levels of vitamin B6 and other compounds in the one-carbon metabolism pathway and risk of lung cancer. Briefly, 516 lung cancer cases were identified among cohort participants with available serum samples as of December 31, 2006. For each case, we randomly selected one control subject from all cohort members who were free of cancer and alive at the time of cancer diagnosis of the index case. Controls were matched to the index case by age at enrollment (±2 years), date of biospecimen collection (±1 month), neighborhood of residence at recruitment, and smoking status (current, former and never smokers), as established previously for other studies. For former smokers, cases and controls were further matched by years since quitting smoking (<10 vs ≥10 years). One serum vial per subject was retrieved from the biorepository and all serum samples were sent to the laboratory (B-vital) for measurements. DNA samples of 250 lung cancer cases and 250 matched controls were available for the present study. 
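The individual matching described for the Shanghai Cohort Study nested case-control analysis can be illustrated with a minimal sketch. It is not the study's actual procedure: the `Subject` fields, the function names, and the 31-day approximation of "±1 month" are assumptions made for illustration, and the candidate pool is assumed to already contain only cohort members who were alive and free of cancer at the index case's diagnosis.

```python
import random
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Subject:
    # Hypothetical record layout; field names are illustrative only.
    subject_id: str
    age_at_enrollment: int
    specimen_date: date
    neighborhood: str
    smoking_status: str                       # "current", "former" or "never"
    years_since_quit: Optional[float] = None  # meaningful for former smokers only


def quit_stratum(s: Subject) -> Optional[str]:
    """Return the <10 / >=10 years-since-quitting stratum for former smokers."""
    if s.smoking_status != "former" or s.years_since_quit is None:
        return None
    return "<10" if s.years_since_quit < 10 else ">=10"


def is_match(case: Subject, ctrl: Subject,
             max_age_diff: int = 2, max_days: int = 31) -> bool:
    """Apply the matching criteria; max_days=31 approximates "±1 month"."""
    return (
        abs(case.age_at_enrollment - ctrl.age_at_enrollment) <= max_age_diff
        and abs((case.specimen_date - ctrl.specimen_date).days) <= max_days
        and case.neighborhood == ctrl.neighborhood
        and case.smoking_status == ctrl.smoking_status
        and quit_stratum(case) == quit_stratum(ctrl)
    )


def pick_control(case: Subject, pool: List[Subject],
                 rng: random.Random) -> Optional[Subject]:
    """Randomly select one eligible control, or None if no candidate matches."""
    candidates = [c for c in pool if is_match(case, c)]
    return rng.choice(candidates) if candidates else None
```

Passing a seeded `random.Random` instance keeps the random selection reproducible; a case with no eligible candidate returns `None` and would need to be handled separately.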
The Singapore Chinese Health Study (SCHS) cohort consisted of 63,257 Chinese men and women in Singapore who were 45-74 years old at the time of enrollment between April 1993 and December 1998. At recruitment, each study subject was interviewed in person by a trained interviewer using a structured questionnaire that emphasized current diet assessed via a validated, 165-item food frequency questionnaire. The questionnaire also requested information on demographics, lifetime use of tobacco, incense use, current physical activity, usual sleep duration, reproductive history (women only), occupational exposure, medical history, and family history of cancer. Blood or buccal cell samples and spot urine samples were collected first from a random 3% sample of cohort participants in April 1994, and collection was extended to all surviving cohort participants starting in January 2000. Overall, approximately 60% of eligible cohort participants donated biospecimens. The cohort has been passively followed for death and cancer occurrence through regular record linkage with the population-based Singapore Cancer Registry and the Singapore Registry of Births and Deaths. Migration out of Singapore, especially among housing estate residents, was negligible. As of the latest update, only 55 individuals from this cohort were known to be lost to follow-up due to migration and other reasons. A nested case-control study of incident lung cancer cases within the Singapore Chinese Health Study was used to examine the association between serum levels of vitamin B6 and other compounds in the one-carbon metabolism pathway and risk of lung cancer. As of December 31, 2011, 422 lung cancer cases were identified among cohort participants with available prediagnostic plasma samples. For each case, one control subject was randomly selected from all eligible cohort members who were alive and free of cancer on the date of cancer diagnosis of the index case. 
The control subject was individually matched to the index case by gender, dialect group (Hokkien, Cantonese), age at enrollment (±3 years), date of baseline interview (±2 years), date of biospecimen collection (±6 months), and smoking status (current, former, and never smokers). For current smokers, cases and controls were further matched by number of cigarettes per day (<15, ≥15 cigarettes/day). For former smokers, cases and controls were further matched by years since quitting smoking (<10, ≥10 years). One plasma aliquot per subject was retrieved from the biorepository and all plasma samples were sent to the laboratory (B-vital) for measurements; one aliquot of DNA per subject was retrieved for the present study. The International Agency for Research on Cancer (IARC) L2 Study. Lung cancer cases and controls were recruited through a multicentric case-control study coordinated by the IARC in Russia, Poland, Serbia, Czech Republic, and Romania from 2005 to 2013. Cases were incident cancer patients collected from general hospitals. Controls were recruited from individuals visiting general hospitals and out-patient clinics for disorders unrelated to lung cancer and/or its associated risk factors, or from the general population. Information on lifestyle risk factors, medical and family history was collected from subjects by interview using a standard questionnaire. All study participants provided written informed consent. The current study included 1,133 lung cancer cases and 1,117 controls genotyped on the Oncoarray. The Washington State University Lung Cancer Study is a hospital-based case-control study of 511 subjects with newly-diagnosed (within 1 year of diagnosis) lung cancer and 820 race-, sex- and age-matched controls. Lung cancer cases were recruited from lung cancer clinics within the H. Lee Moffitt Cancer Center while controls were recruited from the Lifetime Cancer Screening Center, an H. Lee Moffitt Cancer Center affiliate. 
None of the controls were diagnosed with any form of cancer at the time of screening. Detailed questionnaire data and oral buccal cells were collected for all subjects. The Total Lung Cancer (TLC) Study is a hospital-based study that included 458 lung cancer patients recruited for Moffitt Cancer Center's Total Cancer Care™ protocol between April 2006 and August 2010. Total Cancer Care™ is a multi-institutional observational study of cancer patients that prospectively collects self-reported demographic and clinical data, medical record information and blood samples for research purposes. All patients used in this cohort were recruited from the Thoracic Oncology Clinic at the Moffitt Cancer Center. The Vanderbilt Lung Cancer Study (BioVU) is a case-control study nested within the Vanderbilt University Medical Center biobank, BioVU. BioVU is a biorepository of DNA extracted from blood drawn from patients seeking routine clinical care at Vanderbilt University Medical Center and linked to de-identified electronic health records for research purposes. Lung cancer cases and controls were identified from BioVU participants in February 2014. Lung cancer cases were identified from the Vanderbilt tumor registry. All specimens undergo pathologic review for determination of morphology. Coding of histology was based on SEER Program Coding Guidelines. Controls were randomly selected from BioVU participants, excluding cancer patients, and were matched to cases on age (± 5 years), sex, and race. Relevant covariates were identified from electronic health records using natural language processing. Genomic DNA was extracted based on a standard protocol.
This postmortem study, conducted by the DIRP NIMH Human Brain Collection Core (HBCC), examines molecular, genetic and epigenetic signatures in the brains of hundreds of subjects with or without mental disorders. The brain tissues are obtained under protocols approved by the CNS IRB (NCT00001260), with the permission of the next-of-kin (NOK) through the Offices of the Chief Medical Examiners (MEOs) in the District of Columbia, Northern Virginia and Central Virginia. Additional samples were obtained from the University of Maryland Brain and Tissue Bank (contracts NO1-HD-4-3368 and NO1-HD-4-3383; http://www.medschool.umaryland.edu/btbank/) and the Stanley Medical Research Institute (http://www.stanleyresearch.org/brain-research/). Clinical characterization, neuropathological screening, toxicological analyses, and dissections of various brain regions were performed as previously described (Lipska et al. 2006; PMID: 16997002). All patients met DSM-IV criteria for a lifetime Axis I diagnosis of psychiatric disorders including schizophrenia or schizoaffective disorder, bipolar disorder and major depression. Controls had no history of psychiatric diagnoses or addictions.

SNP array: Array-based genotyping was performed on most samples published in this collection. The number of SNPs assayed via Illumina chips varied between 650,000 and 5 million. Cerebellar tissue was generally used for genotyping studies.

#    Diagnosis                              SNP Array
1    Anxiety Disorder                       1
2    Autism Spectrum Disorder               13
3    Bipolar Disorder                       114
4    Control                                387
5    Eating Disorder (ED)                   2
6    Major Depressive Disorder (MDD)        186
7    Obsessive Compulsive Disorder (OCD)    5
8    Post-Traumatic Stress Disorder (PTSD)  0
9    Schizophrenia                          220
10   Other                                  7
11   Tic Disorder                           3
12   Undetermined                           1
13   Williams Syndrome                      2
Table: Numbers of samples in each diagnostic category.

DNA extraction: 45-80 mg of cerebellar tissue was pulverized for DNA extractions. The QIAamp DNA mini Kit (Qiagen) method was employed for tissue DNA extraction. 
The tissue was initially lysed using Tissue Lyser (Qiagen) and extractions were performed according to the manufacturer's protocol. The DNA was captured in 500 uL of elution buffer. The concentrations were measured using Thermo Scientific's NanoDrop 1000/NanoDrop ONE. The mean yield was 128.85 ug (+/- 79.48), the mean ratio of 260/280 was 1.87 (+/- 0.105), and the mean ratio of 260/230 was 2.48 (+/- 1.75).

Genotyping methods: Three types of Illumina BeadArray chips were used: HumanHap650Y, Human1M-Duo, and HumanOmni5M-Quad (San Diego, California). The genotyping was done according to the manufacturer's protocol (Illumina Proprietary, Catalog # WG-901-5003, Part # 15025910 Rev.A, June 2011). Approximately 400 ng of DNA was used, and each DNA sample was QC tested for 260/280 ratio by NanoDrop and for DNA band intactness on a 2% agarose gel. Briefly, the samples were whole-genome amplified, fragmented, precipitated and resuspended in the appropriate hybridization buffer. Denatured samples were hybridized on prepared BeadArray chips. After hybridization, the BeadChip oligonucleotides were extended by a single fluorescently labeled base, which was detected by fluorescence imaging with an Illumina BeadArray Reader, iScan. Normalized bead intensity data obtained for each sample were loaded into Illumina GenomeStudio (Illumina, v.2.0.3) with cluster position files provided by Illumina, and fluorescence intensities were converted into SNP genotypes.

Microarray: We generated RNA expression data using array technology for psychiatric subjects compared to non-psychiatric subjects as controls. We used tissues from three different regions, i.e., hippocampus, dorsolateral prefrontal cortex (DLPFC), and dura mater, for a large cohort of individuals (552 subjects for hippocampus, 800 for DLPFC and 146 for dura). Total RNA was extracted from ~100 mg of tissue using the RNeasy kit (Qiagen) according to the manufacturer's protocol. 
RNA quality and quantity were examined using the Bioanalyzer (Agilent, Inc) and NanoDrop (Thermo Scientific, Inc), respectively. Samples with RNA integrity number (RIN)

#    Diagnosis                              DLPFC  Hippo  Dura
1    Anxiety Disorder                       1      0      0
2    Autism Spectrum Disorder               14     6      0
3    Bipolar Disorder                       90     49     0
4    Control                                336    270    75
5    Eating Disorder (ED)                   2      1      0
6    Major Depressive Disorder (MDD)        144    87     0
7    Obsessive Compulsive Disorder (OCD)    5      3      0
8    Post-Traumatic Stress Disorder (PTSD)  6      0      0
9    Schizophrenia                          192    125    71
10   Other                                  5      6      0
11   Tic Disorder                           3      3      0
12   Undetermined                           1      1      0
13   Williams Syndrome                      2      1      0
Table: Numbers of samples in each diagnostic category.

RNA-Seq of dorsolateral prefrontal cortex (DLPFC): All brains were collected and the DLPFC samples dissected at the HBCC, DIRP, NIMH. DLPFC specimens were dissected from the right or left hemisphere of frozen coronal slabs. The study was funded by the DIRP, NIMH under contract (#HHSN 271201400099C) with Icahn School of Medicine at Mount Sinai, 1106402 One Gustave L. Levy Place, Box 3500, New York NY 10029-6574. RNA extraction, library preparation and sequencing were performed under contract at Icahn School of Medicine. The Common Mind Consortium (CMC) provided project management support.

RNA isolation: Total RNA from 468 HBCC samples was isolated from approximately 100 mg of homogenized tissue from each sample by TRIzol/chloroform extraction and purification with the Qiagen RNeasy kit (Cat#74106) according to the manufacturer's protocol. Samples were processed in randomized batches of 12. The order of extraction for schizophrenia, bipolar disorder, MDD and control samples was assigned randomly with respect to diagnosis and all other sample characteristics. The mean total RNA yield was 24.2 ug (+/- 9.0). The RNA Integrity Number (RIN) was determined with the Agilent 4200 TapeStation System. Samples with RIN

DLPFC RNA-Seq quantified expression data are provided for 364 samples. 
Data were generated, QC'd, processed and quantified as follows.

RNA library preparation and sequencing: All samples submitted to the New York Genome Center for RNAseq were prepared for sequencing in randomized batches of 94. The sequencing libraries were prepared using the KAPA Stranded RNAseq Kit with RiboErase (KAPA Biosystems). rRNA was depleted from 1 ug of RNA using the KAPA RiboErase protocol that is integrated into the KAPA Stranded RNAseq Kit. The insert size and DNA concentration of the sequencing library were determined on the Fragment Analyzer Automated CE System (Advanced Analytical) and with Quant-iT PicoGreen (ThermoFisher), respectively.

Diagnosis       Samples
Schizophrenia   89
Bipolar         65
Control         210
Table: Numbers of samples in each diagnostic category.

RNA-Seq of subgenual anterior cingulate cortex (sgACC): All 200 post-mortem brain samples (61 controls; 39 bipolar disorder; 46 schizophrenia; 54 major depressive disorder) were collected by the HBCC, DIRP, NIMH.

RNA Extraction and Quality Assessment: Tissue from sgACC was pulverized and stored at -80°C. Total RNA was extracted from 50-80 mg of the tissue using the QIAGEN RNeasy Lipid Tissue Mini Kit (QIAGEN, Cat. # 74804) with DNase treatment (QIAGEN, Cat. # 79254). The RNA Integrity Number (RIN) for each sample was assessed with high-resolution capillary electrophoresis on the Agilent Bioanalyzer 2100 (Agilent Technologies, Palo Alto, California). The RNA concentration and 260/280 ratio (2.1 +/- 0.032 SD) were determined with NanoDrop (Thermo Scientific).

RNA sequencing: Stranded RNA-Seq libraries were constructed after rRNA depletion using Ribo-Zero GOLD (Illumina). RNA sequencing was performed at the National Institutes of Health Intramural Sequencing Center (NISC).

Diagnosis       Samples
Schizophrenia   46
Bipolar         39
Control         61
MDD             54
Table: Numbers of samples in each diagnostic category.

Whole Genome Sequencing: All brains were collected and dissected at the HBCC, DIRP, NIMH. 
This study generates whole genome sequencing data from DNA of the dorsolateral prefrontal cortex (DLPFC), anterior cingulate cortex (ACC) or cerebellum of 443 individuals with schizophrenia, bipolar disorder or major depressive disorder and non-psychiatric controls. The study was funded by the DIRP, NIMH under contract (#HHSN 271201400099C) with Icahn School of Medicine at Mount Sinai, 1106402 One Gustave L. Levy Place, Box 3500, New York NY 10029-6574. DNA extraction, library preparation and sequencing were performed under contract at Icahn School of Medicine. The Common Mind Consortium (CMC) provided project management support. All specimens were dissected from the right or left hemisphere of frozen coronal slabs.

DNA Library Preparation and Sequencing: All samples submitted to the New York Genome Center for WGS were prepared for sequencing in randomized batches of 95. The sequencing libraries were prepared using the Illumina PCR-free DNA sample preparation Kit. The insert size and DNA concentration of the sequencing library were determined on the Fragment Analyzer Automated CE System (Advanced Analytical) and with Quant-iT PicoGreen (ThermoFisher), respectively. A quantitative PCR assay (KAPA), with primers specific to the adapter sequence, was used to determine the yield and efficiency of the adaptor ligation process. Sequencing was performed on the Illumina HiSeqX with 30X coverage.

Diagnosis       Samples
Schizophrenia   115
Bipolar         78
Control         230
Table: Numbers of samples in each diagnostic category.

ChIP-Seq: All brains were collected and the dorsolateral prefrontal cortical (DLPFC) samples dissected at the HBCC, DIRP, NIMH. This study generates epigenetic data using sequencing of DNA after chromatin immunoprecipitation (ChIP-Seq) for the marks H3K4me3 and H3K27ac in the DLPFC. DLPFC specimens were dissected from the right or left hemisphere of frozen coronal slabs. 
The study was funded by the DIRP, NIMH under contract (#HHSN 271201400099C) with Icahn School of Medicine at Mount Sinai, 1106402 One Gustave L. Levy Place, Box 3500, New York NY 10029-6574. Chromatin precipitation, library preparation and sequencing were performed under contract at Icahn School of Medicine. The Common Mind Consortium (CMC) provided project management support. Chromatin immunoprecipitation (ChIP) assays for histone marks H3K4me3 and H3K27ac were carried out using Native ChIP. Micrococcal Nuclease (MNase) (Sigma, N3755) treatment was used to digest chromatin into mononucleosomes. The following antibodies were used for chromatin pull-down: anti-H3K4me3 (Cell Signaling, Cat# 9751BC, lot 7) and anti-H3K27ac (Active Motif, Cat# 39133, Lot # 31814008). Histone modification-enriched genomic DNA fragments were recovered using Protein A/G magnetic beads (Thermo Scientific, 88803-88938 or Millipore 16-663), and then washed, eluted, and treated with RNAse A and proteinase K. Final ChIP DNA products were isolated using phenol-chloroform extraction followed by ethanol precipitation. The efficiency of each ChIP assay was validated using Qubit concentration measurement and qPCR for positive (GRIN2B, DARPP32) and negative (HBB) control genomic regions. Only ChIP assays that passed quality control were further processed for library preparation and sequencing; this included ChIP DNA that was not detectable on Qubit but showed a good signal and expected enrichment patterns in qPCR.

Diagnosis       H3K27ac  H3K4me3  Input
Bipolar         56       4        7
Control         158      11       24
Schizophrenia   79       11       12
Table: Numbers of individuals in each assay grouped by histone mark or input.

Long-Read Whole-Genome Sequencing (WGS)

Cohort Description: Brain specimens were obtained from the Human Brain Collection Core (HBCC), part of the NIH NeuroBioBank. Samples were collected under protocols approved by the NIH CNS Institutional Review Board (IRB) (NCT03092687), with informed consent from next-of-kin (NOK). 
Collection was coordinated through the Offices of the Chief Medical Examiners (MEOs) in Washington, D.C., Northern Virginia, and Central Virginia. Clinical metadata and documentation are publicly available via the NIMH Data Archive (NDA) (Collection #3151): https://nda.nih.gov/edit_collection.html?id=3151

Eligibility Criteria: No clinical diagnosis of major neuropsychiatric or neurodegenerative disease; no diagnosis of cognitive impairment during life. All individuals were confirmed to be neurologically normal at time of death.

Demographics: Initial cohort size: 155 individuals. Ancestry: all individuals self-identified as African or African-admixed. Mean age at death: 44.2 years (range: 18–85 years). Sex distribution: 36.4% female.

Sample Processing: Frozen frontal cortex tissue was dissected and processed according to the public protocol: https://www.protocols.io/view/processing-human-frontal-cortex-brain-tissue-for-p-kxygxzmmov8j/v2. High-molecular-weight DNA was extracted and libraries were prepared using the Oxford Nanopore Technologies (ONT) LSK-114 kit. Sequencing was performed using ONT PromethION flow cells (R10.4.1 chemistry).

Data Processing and Quality Control: Basecalling: conducted using Guppy v6.38. Read Alignment: reads were aligned to the GRCh38 reference genome using minimap2. Sample Identity Verification: sample identity was validated by comparing ONT-derived SNV calls with matched short-read WGS genotypes to ensure concordance; no sample swaps were detected.

Variant Calling and Phasing: The napu pipeline (https://github.com/nanoporegenomics/napu_wf) produced haplotype-resolved assemblies, joint small-variant (SNV/indel) calls, and multi-caller structural-variant sets, all reported on GRCh38 and phased where possible. 
Raw signal data were basecalled to obtain 5-methyl-cytosine (5mC) status; methylation tags were added to the phased BAM files. Genome-wide methylation summaries are provided in BED format.

Dataset Filtering and Exclusions: All 155 samples underwent sequencing and SNP-based ancestry inference. Eight samples were excluded due to ancestry inconsistent with African or African-admixed background. One sample was excluded due to insufficient sequencing quality.

Final Sample Set: 146 high-quality samples from individuals of African or African-admixed ancestry were retained for downstream analyses. See PMID: 39764002 for further analysis details.

Diagnosis   # Samples
Control     155
Table: Diagnostic Summary.

Note: The data derived from HBCC resources were removed from dbGaP and are now available in the NIMH Data Archive (NDA). They include genotypes, short-read whole genome sequencing (WGS), epigenetics (DNA methylation, ChIP-seq for histones), and RNA expression (qPCR, microarray, RNA-seq, single-nucleus RNA-seq) of various brain regions in cases with schizophrenia, bipolar disorder, major depression, substance use disorders and normative controls. Please access our NDA collection (https://nda.nih.gov/edit_collection.html?id=3151) for further detail.
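The sample identity check described for the long-read WGS data (comparing ONT-derived SNV calls with matched short-read WGS genotypes) can be illustrated with a minimal sketch. This is not the pipeline's actual code: the genotype encoding (unphased strings such as "A/G" keyed by chromosome and position), the function names, and the thresholds `min_concordance` and `min_sites` are all hypothetical choices made for illustration.

```python
from typing import Dict, Tuple

Site = Tuple[str, int]  # (chromosome, 1-based position); illustrative key format


def genotype_concordance(ont_calls: Dict[Site, str],
                         short_read_calls: Dict[Site, str]) -> Tuple[float, int]:
    """Return (fraction of shared sites with identical genotype, number of shared sites)."""
    shared = ont_calls.keys() & short_read_calls.keys()
    if not shared:
        return 0.0, 0
    agree = sum(1 for site in shared if ont_calls[site] == short_read_calls[site])
    return agree / len(shared), len(shared)


def looks_like_same_sample(ont_calls: Dict[Site, str],
                           short_read_calls: Dict[Site, str],
                           min_concordance: float = 0.95,  # hypothetical threshold
                           min_sites: int = 3) -> bool:    # hypothetical threshold
    """Flag a pair as concordant only if enough sites overlap and most genotypes agree."""
    concordance, n_sites = genotype_concordance(ont_calls, short_read_calls)
    return n_sites >= min_sites and concordance >= min_concordance
```

In practice each long-read sample would be compared against its expected short-read counterpart, and a low-concordance pair would be investigated as a possible sample swap.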