Single-cell whole transcriptome sequencing data for bone marrow samples from 9 cases with clonal hematopoiesis and 4 control samples. The TARGET-seq+ protocol was used to generate plate-based 3' transcriptome data. For details on cell sorting and the TARGET-seq+ protocol see the methods section of the manuscript. One FASTQ file is provided per cell. Cells are named with their plate and well IDs and the subject ID. Empty wells (no-cell controls) are named "blank". Corresponding genotyping files use the same naming without the "_transcriptome" suffix.
The objective of this project is to analyze genomic aberrations, such as mutations, deletions, and amplifications, and gene expression changes in human cervical cancer cells at the somatic cell level, and to compare these with clinicopathological information to elucidate the molecular mechanisms and characteristics of cervical cancer development and progression at the genetic level.
Background: Epigenetic heterogeneity within a tumour can play an important role in tumour evolution and the emergence of resistance to treatment. It is increasingly recognised that the study of DNA methylation (DNAm) patterns along the genome - so-called `epialleles' - offers greater insight into epigenetic dynamics than conventional analyses which examine DNAm marks individually. Results: We have developed a Bayesian model to infer which epialleles are present in multiple regions of the same tumour. We apply our method to reduced representation bisulfite sequencing (RRBS) data from multiple regions of one lung cancer tumour and a matched normal sample. The model borrows information from all tumour regions to leverage greater statistical power. The total number of epialleles, the epiallele DNAm patterns, and a noise hyperparameter are all automatically inferred from the data. Uncertainty as to which epiallele an observed sequencing read originated from is explicitly incorporated by marginalising over the appropriate posterior densities. The degree to which tumour samples are contaminated with normal tissue can be estimated and corrected for. By tracing the distribution of epialleles throughout the tumour we can infer the phylogenetic history of the tumour, identify epialleles that differ between normal and cancer tissue, and define a measure of global epigenetic disorder. Conclusions: Detection and comparison of epialleles within multiple tumour regions enables phylogenetic analyses, identification of differentially expressed epialleles, and provides a measure of epigenetic heterogeneity.
Age-related Macular Degeneration (AMD) is a leading cause of incurable blindness in people over the age of 65. AMD is a late-onset multi-factorial neurodegenerative disease and its pathogenesis involves interaction of genetic and environmental factors. Several chromosomal regions have been associated with AMD susceptibility through linkage analysis (Swaroop et al., 2009). More recent studies provide strong evidence that variants within the CFH gene cluster on chromosome 1 and at/near LOC387715/ARMS2 on chromosome 10 are strongly associated with the disease. Variants at other genes including C2/BF, C3, CFI and APOE4, also contribute to AMD susceptibility. Our primary goals are to identify genetic variants and haplotypes that are associated with AMD. The underlying hypothesis is that DNA variation(s) in multiple genetic susceptibility loci will predispose individuals to AMD pathogenesis, and comparison of DNA of cases and controls should identify these susceptibility variants. Our studies are focused on the genetic analysis of advanced AMD and should provide novel insights into disease diagnosis, progression and pathology. We have assembled a collaborative group of researchers from the University of Michigan, Mayo Clinic, University of Pennsylvania, and the AREDS group including National Eye Institute intramural investigators, who collected clinical data and DNA from a large number of patients affected with AMD and from unaffected controls. The primary source of funding was National Eye Institute. Study 1: To identify genetic variants and haplotypes that are associated with AMD, we submitted and obtained usable genotyping data on 2185 patients and 1155 controls from the Center for Inherited Disease Research (CIDR). Study 2: To identify rare coding variants associated with a large increase in risk of AMD, 10 candidate loci spanning 57 genes were sequenced in 2,335 cases and 789 controls. 
Probes were designed to capture 96.5% of the coding sequence and 35% of total locus sequence, generating an average 123Mb of on-target sequence per individual at 127x average depth. Substudies: phs000182 AMD-MMAP Cohort Study: A Joint Genome-Wide Association Study; phs000246 Fuchs' Corneal Dystrophy GWAS; phs000457 MMAP Methylation in AMD; phs000685 Age-Related Macular Degeneration Targeted Sequencing Study.
Raw FASTQ files for a glioblastoma multiforme organoid study, including exome sequencing and RNA-seq. 29 runs (paired-end), 58 FASTQ.c4gh files (R1/R2). Short-read Illumina; library strategies include exome capture and stranded total RNA-seq (see Experiment records for kit/instrument details). No processed results are included; access is controlled under the selected Policy.
This is the DAC for the study "Pyjacker identifies enhancer hijacking events in acute myeloid leukemia including MNX1 activation via deletion 7q" of Christoph Plass (c.plass@dkfz.de).
The study was conducted under the auspices of the Transdisciplinary Research In Cancer of the Lung (TRICL) Research Team, which is a part of the Genetic Associations and MEchanisms in ONcology (GAME-ON) consortium, and associated with the International Lung Cancer Consortium (ILCCO). Ethics: All participants provided written informed consent. All studies were reviewed and approved by institutional ethics review committees at the involved institutions. Sequencing data are derived from four substudies. The substudies that contributed include Harvard, Liverpool, Toronto, and IARC. The Alpha-Tocopherol, Beta-Carotene Cancer Prevention (ATBC) Study is a randomized primary prevention trial including 29,133 male smokers enrolled in Finland between 1985 and 1993. Participants ranged between ages of 50 to 69 at enrollment and were randomized in a factorial design to take either 50 milligrams of d-alpha tocopheryl acetate (Vitamin E), 20 mg of all-trans-beta-carotene, both or placebo. The study continued to monitor cancer incidence through 2012 and total mortality through December 2013. The CAncer de PUlmon en Asturias Study (CAPUA) is a hospital-based case-control study conducted in Asturias, Spain by the University of Oviedo. Lung cancer cases were recruited in three main hospitals of Asturias, following an identical protocol from 2002 to 2012. Eligible cases were incident cases of histologically confirmed lung cancer between 30 and 85 years of age and residents in the geographical area of each participating hospital. Controls were selected from patients admitted to those hospitals with diagnoses unrelated to the exposures of interest and individually matched by ethnicity, gender, age (± 5 years) and hospital. Epidemiologic data were collected personally through computer-assisted questionnaires by trained interviewers during the first hospital admission. 
Structured questionnaires collected information on sociodemographic characteristics, recent and prior tobacco use, environmental exposure (air pollution and passive smoking), diet, personal and family history of cancer, and occupational history from each participant. Peripheral blood samples (or mouthwash samples when they refused to donate blood) were collected from all participants. Coding of histology was based on 2001 WHO/IASLC. Genomic DNA was extracted based on standard protocol. The Canadian Screening Study includes the nested case-control samples from 3 screening programs: IELCAP-Toronto: Ever smokers of more than 10 pack-years age 50 and above were eligible for the I-ELCAP screening program since 2003, and a total of 4782 individuals have been enrolled in the Greater Toronto Area. Participants were administered a LDCT scan along with a standard study questionnaire at baseline. Blood samples were systematically collected at baseline since 2006. Participants who had an abnormality in a CT scan were followed up every 1 to 2 years. The screening program was organized by the Princess Margaret Hospital. PanCan: Ever smokers between the ages of 50-75 with no previous history of invasive cancer are eligible to participate in the study. The study was carried out across Canada in Vancouver, Calgary, Hamilton, Toronto, Ottawa, Quebec, Halifax, and St. John's. A total of 2537 smokers have been screened from 2008 to 2011. All study participants completed a detailed questionnaire, spirometry, collection of blood specimens for biomarker measurement and LDCT at baseline. All participants are followed for a minimum of 3 years. On yearly follow up, an updated shorter questionnaire is administered, blood is collected and CT scans are performed. Blood samples are available from all 2537 individuals. BCCA Screening Program: From 1990 to 2007, 4274 smokers above 40 years old who had smoked 20 pack-years or more were enrolled at BCCA. 
Upon enrollment, subjects completed a questionnaire on their lifestyle and medical history. Baseline spirometry was conducted using a flow-sensitive spirometer in accordance with the American Thoracic Society recommendations. Since 2000, an LDCT was obtained in 2440 individuals. The participants were followed prospectively to determine whether they developed lung cancer. A total of 9759 individuals participated in the CT screening program in Canada from these 3 programs. The samples included in this project are a subset of nested lung cancer case-control pairs selected in a 1:2 ratio. The Carotene and Retinol Efficacy Trial (CARET) was a randomized, double-blind, placebo-controlled trial of the cancer prevention efficacy and safety of a daily combination of 30 mg of beta-carotene and 25,000 IU of retinyl palmitate in 18,314 persons at high risk for lung cancer. CARET began in 1985, and the intervention was halted in January 1996, 21 months ahead of schedule, with the twin conclusions of definitive evidence of no benefit and substantial evidence of a harmful effect of the intervention on both lung cancer incidence and total mortality. CARET continued to follow and collect endpoints on its participants through 2005. Pathology reports and medical records were reviewed to confirm cancer endpoints, and death certificates were obtained to capture cause of death. During the active intervention phase of CARET, serum, plasma, whole blood, and lung tissue specimens were collected from participants. These biospecimens make up the CARET Biorepository. For the OncoArray Project, CARET provided DNA extracted from whole blood of lung cancer cases and controls matched on age at baseline (± 4 years), sex, race, baseline smoking status, history of occupational asbestos exposure (asbestos vs heavy smoker), and year of enrollment (2-year intervals).
The European Prospective Investigation into Cancer and Nutrition (EPIC) study is a multi-center cohort study involving 521,000 study participants from 10 European countries. The current study involved EPIC participants from 7 countries (Greece, Netherlands, UK, France, Germany, Spain, and Italy), including 1223 incident lung cancer cases and 1249 smoking-matched controls. The Kentucky Lung Cancer Research Initiative is a study conducted by the Markey Cancer Center and the University of Kentucky using a population-based, case-control framework to study the extraordinarily high rates of lung cancer in Southeastern, Appalachian Kentucky. Cancer cases were recruited from the Kentucky Cancer Registry at the time of diagnosis and controls were recruited by random digit dialing from the same region. Study accrual began on January 5, 2012 and was completed on September 5, 2014; 520 subjects were recruited from Appalachian Kentucky in a 4:1 ratio of controls to cases. Of the 520 subjects recruited, 231 are included in the OncoArray analysis, including all 93 cancer cases and 123 controls. Newly diagnosed lung cancer cases and controls underwent blood, toenail (for trace element analysis), urine, buffy coat, water, soil, and radon collection, residence GPS mapping, as well as an extensive epidemiologic, occupational, and health history questionnaire (ClinicalTrials.gov Identifier: NCT01648166). The Harvard Lung Cancer Study (HLCS) is a case-control study based at Mass General Hospital (MGH) in Boston, Massachusetts from 1992 to 2004. Details of the study were described previously. Briefly, eligible cases included any person over the age of 18 years with a diagnosis of primary lung cancer that was further confirmed by an MGH lung pathologist. Controls were recruited from the friends or spouses of cancer patients or the friends or spouses of other surgery patients in the same hospital.
Potential controls were excluded from participation if they had a diagnosis of any cancer (other than non-melanoma skin cancer). Interviewer-administered questionnaires, a modified version of the standardized American Thoracic Society respiratory questionnaire, collected information on demographics, medical history, family history of cancer, smoking history, and a detailed work history, including job titles and tasks. Genome-wide genotype data were first generated using Illumina Human 610-Quad BeadChips and then imputed by MACH against the 1000 Genome Project dataset (http://browser.1000genomes.org/index.html). The Institutional Review Board of MGH and the Human Subjects Committee of the Harvard School of Public Health approved the study. The Israel study (NICCC-LCA) is an ongoing case-control study of newly diagnosed lung cancer cases of any histology and population age/sex/ethnicity-matched "healthy" controls. All participants undergo face-to-face interviews, provide a venous blood sample (separated into DNA, Sera, lymphocytes) after signing an IRB-approved form. Histology reports, FFPE blocks and clinical follow-up are available for most cancer cases. The MD Anderson Cancer Center (MDACC) Study. Lung cancer cases and frequency-matched controls were ascertained from a large ongoing case-control study at the University of Texas MD Anderson Cancer Center (UTMDACC) since 1991. Detailed study description was provided previously (Spitz et al 2007). In brief, cases were newly-diagnosed and histologically confirmed lung cancer patients recruited from UTMDACC. Controls were healthy individuals without a history of cancer (except for nonmelanoma skin cancer) and recruited from the Kelsey-Seybold Clinics, the largest private multispecialty physician group in the Houston metropolitan area. Controls were frequency-matched to cases on age (±5 years), sex, and race/ethnicity. 
After providing written informed consent, each study participant completed an in-person interview by staff interviewers to collect information on demographics, smoking status, etc. Blood samples were also drawn from all the study participants. This study was approved by institutional review boards of UTMDACC and Kelsey-Seybold Clinics. The Malmö Diet and Cancer Study (MDCS) is a population-based prospective cohort study that recruited men and women aged 44 to 74 years living in Malmö, Sweden between 1991 and 1996. The main goal of the MDCS is to study the impact of diet on cancer incidence and mortality. It consists of a baseline examination including dietary assessment, a self-administered questionnaire, anthropometric measurements and collection of blood samples. A total of 165 incident lung cancer cases and 174 individually smoking-matched controls were available for this analysis. The Multiethnic Cohort (MEC) Study includes 215,251 men and women aged 45-74 years at recruitment, primarily from five ethnic/racial groups - African Americans and Latinos (mostly recruited from CA, mainly from Los Angeles County) and Japanese Americans, Native Hawaiians and whites (mostly recruited from HI). The cohort was assembled in 1993-1996 by mailing a self-administered questionnaire to persons identified primarily through driver's license files. The baseline questionnaire obtained information on demographics, anthropometry, smoking history, medical and reproductive histories, family history of cancer, diet and physical activity. Incident cancer cases are identified by regular linkage with the State of California Cancer Registry and the Hawaii Tumor Registry, both members of the SEER Program of the NCI. In 2001-2006, a prospective biorepository was assembled by collecting a pre-diagnostic blood specimen from 67,594 surviving MEC members.
At the time of blood collection a short questionnaire was administered that included information on smoking during the previous 15 days. For this study, cases were all lung cancer cases incident to blood draw and diagnosed before December 2012. For each case, a control was selected among unaffected MEC participants who were alive at time of the case's diagnosis and matched on study site, sex, race/ethnicity, age (age at diagnosis for cases; age at blood collection for controls), and date of blood collection. The Mount-Sinai Hospital-Princess Margaret Study (MSH-PMH) was conducted in the greater Toronto area from 2008 to 2013. Lung cancer cases were recruited at the hospitals in the network of the University of Toronto. Controls were selected randomly from individuals registered in the family medicine clinics databases and were frequency matched with cases on age and sex. All subjects were interviewed, and information on lifestyle risk factors, occupational history and medical and family history was collected using a standard questionnaire. Tumors were centrally reviewed by the reference pathologist, a member of the International Association for the Study of Lung Cancer (IASLC) committee, and a second pathologist in the University Health Network. If the reviews conflicted, a consensus was arrived at after discussion. Coding of histology was based on 2001 WHO/IASLC. Genomic DNA was extracted based on standard protocol. The New England Lung Cancer Study (NELCS) is a population-based case-control study of lung cancer among residents of Northern and Central New Hampshire counties and the bordering region of Vermont. Cases with histologically confirmed primary incident lung cancer were identified from 2005 to 2007 using the New Hampshire State Cancer Registry and the Dartmouth-Hitchcock Medical Center (DHMC) Tumor Registry. Control participants were identified using a commercial database and matched to lung cancer cases within 5-year age groups, sex and county. 
Genomic DNA was isolated from blood or buccal specimens provided by consenting participants. The study complied with the requirements of Dartmouth College's Committee for Protection of Human Subjects. The Nijmegen Lung Cancer Study. Patients with lung cancer in the Netherlands were identified through the population-based cancer registry of the Netherlands Comprehensive Cancer Organisation in Nijmegen, the Netherlands. Patients who were diagnosed in one of three hospitals (Radboud University Medical Center, Canisius Wilhelmina Hospital in Nijmegen, and Rijnstate Hospital in Arnhem) since 1989 and who were still alive on April 15, 2008 were recruited for a study on gene-environment interactions in lung cancer. 458 patients gave informed consent and donated a blood sample. This case series was expanded with 94 patients to a total of 552 by linking three other studies to the population-based cancer registry in order to identify new occurrences of lung cancer among the participants of these other studies. All three other studies (i.e., POLYGENE, the Nijmegen Biomedical Study, and the Radboudumc Urology Outpatient Clinic Epidemiology Study) were initiated to study genetic risk factors for disease, and participants in these studies gave general informed consent for DNA-related research and linkage with disease registries. Information on histology, stage of disease, and age at diagnosis was obtained through the cancer registry. Lifestyle information was collected through a structured questionnaire and whole blood for DNA isolation was collected by the regional thrombosis services. The cancer-free controls (46% males) were selected from participants of the "Nijmegen Biomedical Study" (NBS), an age- and sex-stratified random sample of the general population of the municipality of Nijmegen, The Netherlands. All participants provided extensive lifestyle information by structured questionnaires and blood samples for DNA isolation, serum and plasma.
All controls are of self-reported European descent. The study protocols of the NBS were approved by the Institutional Review Board of the Radboudumc and all study subjects signed a written informed consent form. The Northern Sweden Health and Disease Study (NSHDS) encompasses several prospective cohorts. The current study involves participants from the Västerbotten Intervention Project (VIP), a sub-cohort within NSHDS. VIP is an ongoing prospective cohort and intervention study intended for health promotion of the general population of Västerbotten County in northern Sweden. VIP was initiated in 1985 and all residents of Västerbotten County were invited to participate by attending a health check-up at 40, 50 and 60 years of age. Participants were asked to complete a self-administered questionnaire covering factors such as education, smoking habits, physical activity and diet. In addition, height and weight were measured and participants were asked to donate a fasting blood sample for future research. A total of 243 incident lung cancer cases and 266 individually smoking-matched controls were available for this analysis. Norway National Institute of Occupational Health Study. Early-stage NSCLC cases and healthy controls at the time of enrollment were Caucasians of Norwegian origin and were recruited from the same geographical region (Western Norway). Patients were enrolled, whenever practically feasible, from among those admitted for lung cancer at the Haukeland University Hospital in Bergen, Norway. Written informed consent covering analysis of molecular and genetic markers was signed by the patients prior to surgery. Only patients with histologically confirmed early-stage NSCLC were included in our study. The subjects included in this project are a subgroup recruited into the project "lung cancer genetics" at NIOH.
The controls were recruited from the same geographical region of Western Norway and frequency-matched with cases on cumulative smoking dose (pack-years). Pack-years smoked [(cigarettes per day / 20) x years smoked] were calculated to indicate the cumulative smoking dose. Cases and controls were interviewed using similar questionnaires and were categorized as never smokers, ex-smokers or current smokers. Never smokers were subjects who reported having smoked fewer than 100 cigarettes in their lifetime. Ex-smokers were defined as those who had quit at least 1 year before sampling, and current smokers were those who reported smoking at the time of sampling. The project has been approved by the Regional Committee for Medical and Health Research Ethics in Southern Norway in accordance with the WMA Declaration of Helsinki. The ethical approval covered access to the NSCLC databank. The Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial (PLCO) Study, a randomized trial aimed at evaluating the efficacy of screening in reducing cancer mortality, recruited approximately 155,000 men and women aged 55 to 74 years from 1992 to 2001. Screening for lung cancer among participants in the intervention arm included a chest x-ray at baseline followed by either three annual x-rays (for current or former smokers at enrollment) or two annual x-rays (for never smokers); participants in the control arm received routine health care. Screening-arm participants provided data on sociodemographic factors, smoking behavior, anthropometric characteristics, medical history, and family history of cancer, as well as blood samples annually for the first 6 years of the study (baseline T0 and T1 through T5). Lung cancers were ascertained through annual questionnaires mailed to the participants, and positive reports were followed up by abstracting medical records or death certificates. Follow-up in the trial as of July 2009 was 96.7%.
Patients were excluded because of missing baseline questionnaire, previous history of any cancer, diagnosis of multiple cancers during follow-up, missing smoking information at baseline, missing consent for utilization of biologic specimens for etiologic studies, or unavailability/insufficient quantity of serum or DNA specimens. The Resource for the Study of Lung Cancer Epidemiology in North Trent (ReSoLuCENT) is an ongoing study conducted in Sheffield since 2006 and due to complete recruitment in 2016. The study recruited pathologically confirmed lung cancer cases diagnosed at age 60 years or younger and family-matched controls. Lung cancer cases diagnosed at ages older than 60 years were recruited if they reported a family history of lung cancer. The cases and matched controls were recruited through several major cancer treatment centers; however, the majority were recruited in North Trent. All participants completed a detailed lifestyle questionnaire which included questions about occupational exposures, education, medical history and family history of cancer and lung disease. Participants also donated blood samples for DNA extraction. The ReSoLuCENT study has been funded by the Sheffield Hospitals Charity, Sheffield ECMC and Weston Park Hospital Cancer Charity. First degree relatives were removed from the sample deposited to dbGaP. The Roy Castle Lung Study of Liverpool Lung Project (LLP) is a case-control and cohort study which has recruited over 11,500 individuals since 1996 from the Liverpool region in the UK. Detailed epidemiological and clinical data are collected with associated specimens (i.e. tumor tissue, blood, plasma, sputum, bronchial lavage and oral brushings).
The participants have completed a detailed lifestyle questionnaire at recruitment, with repeat questionnaires at intervals; updated data on clinical outcome and hospital events are collected through the Health and Social Care Information Center (including Office of National Statistics mortality data, Cancer Registry and Health Episode Statistics). The project is registered on the UK National Institute for Health Research (NIHR) lung cancer portfolio and has all the required ethical approvals and sponsorship arrangements in place. The lung tumors were reviewed by the reference pathologist. The Seoul Bundang Lung Cancer Study was conducted between 2005 and 2010 to discover genetic and environmental factors related to lung cancer development. Lung cancer cases were recruited at the Seoul National University Hospital in Bundang. Controls were selected randomly from individuals who participated in a health check-up program and were frequency matched with cases on age and sex. All subjects were interviewed, and information on lifestyle risk factors, occupational history and medical and family history was collected using a standard questionnaire. Tumors were reviewed by the pathologists in the hospital. If the reviews conflicted, a consensus was arrived at after discussion. Coding of histology was based on 2001 WHO/IASLC. Genomic DNA was extracted based on standard protocol. The Shanghai Cohort Study (SCS) consisted of 18,244 men in Shanghai, China, who were 45-64 years old at the time of enrollment during 1986-1989. Approximately 80% of eligible men participated in the study. At the time of recruitment, each cohort subject was interviewed in person by a trained nurse interviewer using a structured questionnaire that included background information, history of tobacco and alcohol use, current diet, and medical history. At the completion of the interview, the nurse collected a 10 ml blood sample and a single-void urine specimen from the study participant.
Buccal cell samples were collected from all surviving cohort members (~15,000) in the 2001-2002 follow-up interviews. The cohort has been followed for the occurrence of cancer and death through routine ascertainment of new cases from the population-based Shanghai Cancer Registry and Shanghai Vital Statistics Units. To maximize the cancer findings and minimize loss to follow-up, we contacted each surviving cohort member annually. Retired nurses visit the last known address of each living cohort member and record details of the interim health history of the cohort member. As of December 31, 2014, cumulatively 612 (3.4%) original subjects were lost to follow-up, and 574 (3.1%) refused our continued follow-up interviews. A nested case-control study of incident lung cancer cases within the Shanghai Cohort Study was used to examine the association between serum levels of vitamin B6 and other compounds in the one-carbon metabolism pathway and risk of lung cancer. Briefly, 516 lung cancer cases were identified among cohort participants with available serum samples as of 12/31/2006. For each case, we randomly selected one control subject from all cohort members who were free of cancer and alive at the time of cancer diagnosis of the index case. Controls were matched to the index case by age at enrollment (±2 years), date of biospecimen collection (±1 month), neighborhood of residence at recruitment, and smoking status (current, former and never smokers) as established previously for other studies. For former smokers, cases and controls were further matched by years since quitting smoking (<10 vs ≥10 years). One serum vial per subject was retrieved from the biorepository and all serum samples were sent to the laboratory (B-vital) for measurements. DNA samples of 250 lung cancer cases and 250 matched controls were available for the present study.
The Singapore Chinese Health Study (SCHS) cohort consisted of 63,257 Chinese men and women in Singapore who were 45-74 years old at the time of enrollment between April 1993 and December 1998. At recruitment, each study subject was interviewed in person by a trained interviewer using a structured questionnaire that emphasized current diet assessed via a validated, 165-item food frequency questionnaire. The questionnaire also requested information on demographics, lifetime use of tobacco, incense use, current physical activity, usual sleep duration, reproductive history (women only), occupational exposure, medical history, and family history of cancer. Blood or buccal cell samples and spot urine samples were collected first from a random 3% sample of cohort participants in April 1994, and collection was extended to all surviving cohort participants starting in January 2000. Overall, approximately 60% of eligible cohort participants donated biospecimens. The cohort has been passively followed for death and cancer occurrence through regular record linkage with the population-based Singapore Cancer Registry and the Singapore Registry of Births and Deaths. Migration out of Singapore, especially among housing estate residents, was negligible. As of the latest update, only 55 individuals from this cohort were known to be lost to follow-up due to migration or other reasons. A nested case-control study of incident lung cancer cases within the Singapore Chinese Health Study was used to examine the association between serum levels of vitamin B6 and other compounds in the one-carbon metabolism pathway and risk of lung cancer. As of 12/31/2011, 422 lung cancer cases were identified among cohort participants with available prediagnostic plasma samples. For each case, one control subject was randomly selected from all eligible cohort members who were alive and free of cancer on the date of cancer diagnosis of the index case.
The control subject was individually matched to the index case by gender, dialect group (Hokkien, Cantonese), age at enrollment (±3 years), date of baseline interview (±2 years), date of biospecimen collection (±6 months), and smoking status (current, former, and never smokers). For current smokers, cases and controls were further matched by number of cigarettes per day (<15, ≥15 cigarettes/day). For former smokers, cases and controls were further matched by years since quitting smoking (<10, ≥10 years). One plasma aliquot per subject was retrieved from the biorepository and all plasma samples were sent to the laboratory (B-vital) for measurements. One aliquot of DNA per subject was retrieved for the present study. The International Agency for Research on Cancer (IARC) L2 Study. Lung cancer cases and controls were recruited through a multicentric case-control study coordinated by the IARC in Russia, Poland, Serbia, Czech Republic, and Romania from 2005 to 2013. Cases were incident cancer patients collected from general hospitals. Controls were recruited from individuals visiting general hospitals and out-patient clinics for disorders unrelated to lung cancer and/or its associated risk factors, or from the general population. Information on lifestyle risk factors, medical and family history was collected from subjects by interview using a standard questionnaire. All study participants provided written informed consent. The current study included 1,133 lung cancer cases and 1,117 controls genotyped on the Oncoarray. The Washington State University Lung Cancer Study is a hospital case-control study of 511 subjects with newly-diagnosed (within 1 year of diagnosis) lung cancer and 820 race-, sex- and age-matched controls. Lung cancer cases were recruited from lung cancer clinics within the H. Lee Moffitt Cancer Center while controls were recruited from the Lifetime Cancer Screening Center, an H. Lee Moffitt Cancer Center affiliate.
None of the controls were diagnosed with any form of cancer at the time of screening. Detailed questionnaire data and oral buccal cells were collected for all subjects. The Total Lung Cancer (TLC) Study is a hospital-based study that included 458 lung cancer patients recruited for Moffitt Cancer Center's Total Cancer Care™ protocol between April 2006 and August 2010. Total Cancer Care™ is a multi-institutional observational study of cancer patients that prospectively collects self-reported demographic and clinical data, medical record information and blood samples for research purposes. All patients used in this cohort were recruited from the Thoracic Oncology Clinic at the Moffitt Cancer Center. The Vanderbilt Lung Cancer Study (BioVU) is a case-control study nested within the Vanderbilt University Medical Center biobank, BioVU. BioVU is a biorepository of DNA extracted from blood drawn from patients seeking routine clinical care at Vanderbilt University Medical Center and linked to de-identified electronic health records for research purposes. Lung cancer cases and controls were identified from BioVU participants in February 2014. Lung cancer cases were identified from the Vanderbilt tumor registry. All specimens undergo pathologic review for determination of morphology. Coding of histology was based on SEER Program Coding Guidelines. Controls were randomly selected from BioVU participants, excluding cancer patients, and were matched to cases on age (± 5 years), sex, and race. Relevant covariates were identified from electronic health records using natural language processing. Genomic DNA was extracted based on a standard protocol.
The NeuroLINCS Center is part of the NIH Common Fund's Library of Integrated Network-based Cellular Signatures (LINCS) program, which aims to characterize how a variety of human cells, tissues and the entire organism respond to perturbations by drugs and other molecular factors. As part of the LINCS program, the NeuroLINCS study concentrates on human brain cells, which are far less understood than other cells in the body. Our initial focus is to produce diseased motor neurons from patients by utilizing high-quality induced pluripotent stem cell (iPSC) lines from Amyotrophic Lateral Sclerosis (ALS) and Spinal Muscular Atrophy (SMA) patients in addition to unaffected healthy controls. Using state-of-the-art OMICS methods (genomics, epigenomics, transcriptomics, and proteomics), we intend to create a wealth of cellular data that is patient-specific in the context of their baseline genetic perturbations and in the presence of other genetic and environmental perturbagens (e.g. endoplasmic reticulum stress). The primary data will be used to build cell signatures that convey the key features that distinguish the state of a cell and determine its behavior. Ultimately, the analysis of these datasets will lead to the identification of a network of unique signatures relevant to each of these motor neuron diseases. The datasets represented in this study are generated from assays interrogating RNA expression (RNA-seq), chromatin accessibility (ATAC-seq) and whole genome sequencing.
Peripheral blood mononuclear cells (PBMC) from 82 T1D cases from the T1DGC study were fractionated by positive selection on magnetic beads into CD4+ T cells, CD8+ T cells and CD19+ B cells. RNA purification, library preparation and sequencing to an average of 50 million reads per sample were performed by the HudsonAlpha Genome Services Laboratory. The resulting dataset includes 82 subjects with data from at least one cell type; sequence data from all 3 cell types is available from 79 subjects. The individuals in this study can be linked to the phs000911 T1DGC ImmunoChip study via subject ids to obtain both genotypic and phenotypic information, including genotypes used for expression quantitative trait locus analysis.
The EPIPARK and HBS2 cohorts were analyzed as part of the International Genetics of Parkinson Disease Progression (IGPP) Consortium. EPIPARK is a population-based cohort to study non-motor symptoms in parkinsonism in the city of Lübeck, Germany, led by Drs. Meike Kasten and Christine Klein. HBS2 is a recently enrolled subset of 89 participants with a diagnosis of Parkinson Disease (PD) in the Harvard Biomarkers Study, led by Dr. Clemens Scherzer (www.bwhparkinsoncenter.org/biobank). Genotyping with Illumina Infinium Multi-Ethnic Genotyping Arrays (MEGA chip) was performed by Dr. Scherzer, Harvard Medical School and Brigham and Women’s Hospital. A total of 179 participants with a diagnosis of PD from EPIPARK and 89 participants with a diagnosis of PD from HBS2 met QC criteria and were analyzed.
Genome-wide association studies (GWAS) are instrumental in identifying loci with an impact on human traits and disease. Typically, however, most GWAS information is considered redundant as it is based on neighboring single-nucleotide variants (SNVs) in strong linkage disequilibrium (LD). In this context, besides the most significant hit (lead SNV) in every trait- or disease-associated locus, the rest of GWAS hits are often marginally reported, examined, or exploited. To interrogate the value of integrating the information provided by the full set of GWAS hits and fine-mapping haplotypes containing these SNVs, here we focused on a locus repeatedly associated with traits of the cardiac electrical conduction system and arrhythmia (SCN5A-SCN10A). We found that the same SNV can exhibit differing associations depending on haplotype. Thus, these analyses provide corroborative evidence that assuming redundancy among neighboring GWAS hits has the risk of missing critical disease-risk associations.
Rheumatoid arthritis (RA) is a chronic inflammatory disorder of unknown aetiology characterised by synovial inflammation with variable disease severity and drug responsiveness. To investigate the molecular heterogeneity and pathogenesis of RA, we performed a comprehensive clinical and molecular profile of 267 RA patients for up to 18 months to establish a high quality biobank of barcoded samples that included plasma, serum, peripheral blood cells, whole blood RNA, RNA from lymphocyte and monocyte subsets, genomic DNA and urine. We have performed extensive multi-omic immune phenotyping, including genomic, metabolomic, proteomic, transcriptomic and auto-antibody profiling. We anticipate that these detailed clinical and molecular data will serve as a fundamental resource for the development and application of targeted therapies for RA and reveal further insights into disease pathogenesis and therapeutic response.
This DAC relates to the DRS samples of peripheral blood from healthy controls.
This submission contains the sequencing data used in the CRISPR iPSC methods paper. Specifically, it comprises 3 FASTQ files, each representing one replicate of an experiment to transduce the Toronto KnockOut CRISPR Library - Version 3 (TKOv3) into induced pluripotent stem cell (iPSC) derived macrophages. The sequencing is of the TKOv3 guide RNAs extracted from the transduced iPSC derived macrophages.
Molecular classification of cancer has entered clinical routine to inform diagnosis, prognosis and treatment decisions. At the same time, new tumor entities have been identified that cannot be defined histologically. For central nervous system tumors, the current World Health Organization classification explicitly demands molecular testing, e.g. for 1p/19q codeletion or IDH mutations, to make an integrated histomolecular diagnosis. However, a plethora of sophisticated technologies is currently needed to assess different genomic and epigenomic alterations, and turnaround times are in the range of weeks, which makes standardized and widespread implementation difficult and hinders timely decision making. Here, we explored the potential of a pocket-size nanopore sequencing device for multimodal and rapid molecular diagnostics of cancer. Low-pass whole genome sequencing was used to simultaneously generate copy number (CN) and methylation profiles from native tumor DNA in the same sequencing run. Single nucleotide variants in IDH1, IDH2, TP53, H3F3A and the TERT promoter region were identified using deep amplicon sequencing. Nanopore sequencing yielded ~0.1X genome coverage within six hours, and the resulting CN and epigenetic profiles correlated well with matched microarray data. Diagnostically relevant alterations, such as 1p/19q codeletion and focal amplifications, could be recapitulated. Using ad hoc random forests, we could perform supervised pan-cancer classification to distinguish gliomas, medulloblastomas and brain metastases of different primary sites. Single nucleotide variants in IDH1, IDH2 and H3F3A were identified using deep amplicon sequencing within minutes of sequencing. Detection of TP53 and TERT promoter mutations shows that sequencing of entire genes and GC-rich regions is feasible. Nanopore sequencing allows same-day detection of structural variants, point mutations and methylation profiling using a single device with negligible capital cost.
It outperforms hybridization-based and current sequencing technologies with respect to time to diagnosis and required laboratory equipment and expertise, aiming to make precision medicine possible for every cancer patient, even in resource-restricted settings.
This study provides comprehensive transcriptomic analysis of adolescent and young adult (AYA) patients treated on the Australasian Leukaemia and Lymphoma Group (ALLG) ALL09 clinical trial (ACTRN12618001734257). The ALL09 clinical trial is a Phase II single-arm study to assess whether substitution of blinatumomab for conventional multi-agent chemotherapy in phase 2 induction leads to improved minimal residual disease negativity rates when compared to the historical AYA ALL trial cohort. Patients treated on this trial were aged 15-40 years, had a morphological diagnosis of CD19-positive B-lineage acute lymphoblastic leukaemia (B-ALL), and were negative for Philadelphia chromosome-positive disease. Transcriptomic sequencing was performed as a correlative study to assess patients’ genomic subtypes and enable comparison of clinical outcome in blinatumomab-treated patients.
This study comprises three different datasets: 1) 57 samples from the 1243 canapps cell line study, 2) 91 FFPE normal samples, and 3) 87 samples from the SCORT WS2 dataset. The aim is to sequence these 235 samples in order to test the new V2 Colorectal bait design.
The Columbia GENIE study contributes and shares phenotype and genotype data for individuals who were treated at our healthcare facilities and consented to share their data with dbGaP for scientific discovery. Some of these individuals have kidney or neurological problems and some are healthy volunteers from the Washington Heights patient community. The purpose of this NOMAS study is to obtain information about physical features of the brain, carotid arteries, and heart. Some of our patients are pediatric patients with cardiac conditions. The study sample consists of four patient cohorts: Northern Manhattan Study (NOMAS), n=1072; Pediatric Cardiac Genomic Consortium (PCGC), n=374; Caribbean Hispanics with Familial and Sporadic Late Onset Alzheimer's disease (Caribbean Hispanics/AD), n=330; Alzheimer's Disease Sequencing Project (ADSP), n=44; Genetics of Chronic Kidney Disease Study, n=1256.
Molecular profiling for somatic mutations that predict response to anti-EGFR therapy in colorectal cancer (CRC) has become standard practice. However, abundant tissue from metastatic lesions is not always available from patients with metastatic CRC. Concerns involving genetic heterogeneity between primary and metastatic lesions have called into question the suitability of profiling primary tumors in patients with metastatic disease. Further, the identification of discordant mutations between matched primary and metastatic tumors would be of biological interest for the delineation of biomarkers of tumor progression and metastasis. To explore the degree of genetic heterogeneity between matched primary and metastatic tumors in CRC, we performed whole genome sequencing of four patient "trios", each comprising a primary colon tumor, a liver metastasis, and matched normal (non-cancerous) tissue. Somatic mutations and indels were called in each tumor and compared between primary and metastatic lesions.
Pseudomyxoma peritonei (PMP) is a subtype of mucinous adenocarcinoma mainly restricted to the peritoneal cavity and most commonly originating from the appendix. In this study, we sequenced the whole coding genome of nine PMP tumors and paired normal tissues in order to identify commonly mutated genes and signaling pathways affected in PMP.
Immunotherapy directed against private tumor neo-antigens derived from non-synonymous somatic mutations is a promising strategy of personalized cancer immunotherapy. However, feasibility in low mutational load tumor types remains unknown. Comprehensive and deep analysis of circulating and tumor-infiltrating lymphocytes (TILs) for neo-epitope specific CD8+ T cells allowed prompt identification of such oligoclonal and polyfunctional cells in most of the immunotherapy-naïve patients with advanced epithelial ovarian cancer studied. Neo-epitope recognition was discordant between circulating T cells and TILs, and was more likely to be found among TILs, which displayed higher functional avidity and unique TCRs with higher predicted affinity than their blood counterparts. Our results imply that identification of neo-epitope specific CD8+ T cells is achievable even in tumors with a relatively low number of somatic mutations, and neo-epitope validation in TILs extends opportunities for mutanome-based personalized immunotherapies to such tumors.
The goal of this study is to produce a comprehensive map of the chromatin landscape of human parathyroids. Data from 20 subjects with parathyroid adenoma causing primary hyperparathyroidism were used. For the sequencing platform, Illumina HiSeq 2000 with V3 high-output kits was used. For the data processing pipeline, reads were aligned to hg19 using Bowtie or BWA. For RNA-seq data, transcript levels were estimated per gene as RPKM using Cufflinks. For ChIP-seq and ATAC-seq data, enrichment peaks were estimated relative to inputs or background using MACS2 after merging reads from replicates. For DNase-seq, significant peaks compared to the background distribution were computed using Hotspot. For Hi-C data, significant interactions were detected using Juicer HiCCUPS at 5 kb resolution for the merged data from replicates. Chromatin states for parathyroids were computed using ChromHMM based on ChIP-seq profiles in parathyroids of H3K4me3, H3K4me1, H3K27ac, H3K36me3, H3K27me3 and H3K9me3 and the emission matrix from 18 states pre-trained in the Roadmap Epigenomics. Gene expression, binding positions of transcription factors, open chromatin regions, 3D chromatin interactions, and chromatin states are submitted.
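For readers unfamiliar with the RPKM unit mentioned above, the textbook definition (reads per kilobase of transcript per million mapped reads) can be written as a short function. This is a minimal sketch of the standard formula only; it is not the estimation procedure that Cufflinks itself applies, and the function and parameter names are illustrative.

```python
def rpkm(read_count: float, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Textbook RPKM: reads per kilobase of gene length per million mapped reads.

    RPKM = read_count * 1e9 / (gene_length_bp * total_mapped_reads)
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Example: 500 reads on a 2,000 bp gene in a library of 20 million mapped reads.
print(rpkm(500, 2000, 20_000_000))  # 12.5
```

The factor of 1e9 combines the per-kilobase (1e3) and per-million (1e6) scalings into one constant.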
Sacituzumab Govitecan (SG), a novel antibody drug conjugate (ADC) incorporating the anti-TROP2 antibody hRS7 conjugated to a topoisomerase-1 inhibitor (SN-38) payload, is the first ADC to be approved for advanced triple negative breast cancer (TNBC). However, mechanisms governing therapeutic resistance to SG are not known. We sought to identify mechanisms of de novo and acquired resistance to SG through unbiased whole-exome sequencing (WES) and RNA sequencing analysis of pre-treatment and multi-site post-progression (autopsy) tumor specimens. We examined three metastatic TNBC cases exhibiting (1) de novo progression, (2) stable disease, and (3) a deep response followed by progression, then mapped the temporal and spatial genomic evolution of acquired resistance in the responding patient. We then conducted additional pre-clinical experiments to validate the observed resistance mechanisms. TROP2 RNA and gene copy number were associated with de novo resistance, as case (1) was found to have absent TROP2 expression in all specimens, case (2) expressed TROP2, while case (3) exhibited both expression and focal genomic amplification of the TACSTD2/TROP2 locus. The genomic phylogeny tree inferred from case (3) post-progression specimens revealed one branch harboring an acquired canonical E418K resistance mutation in TOP1 (encoding topoisomerase 1) and a subsequent sub-clonal TOP1 inactivating frameshift mutation, while a distinct branch exhibited acquisition of a novel T256R missense mutation in TACSTD2 (encoding TROP2). Both the TOP1- and TACSTD2-mutant clones seeded multiple distinct metastatic sites. Through reconstitution experiments in TROP2 negative cells we found that TROP2 T256R is a stable protein with defective cell membrane localization and reduced cell surface binding by RS7 compared to wild-type TROP2. 
Collectively, these findings underscore TROP2 as a determinant of initial response to SG, and they reveal parallel and mutually exclusive polyclonal molecular mechanisms of acquired resistance involving the direct antibody target and drug payload target in distinct metastatic subclones of a single patient. While further research is needed to extend these novel findings, this study highlights the specificity of SG and illustrates how identifying such mechanisms will inform rational therapeutic strategies to overcome ADC resistance.
Whole exome sequencing was performed on matched tumor and normal DNA of patients with resectable esophageal adenocarcinoma who participated in the PERFECT trial. The PERFECT trial is an open-label, single-arm, multicenter, phase II feasibility study (NCT03087864) investigating the safety and efficacy of atezolizumab, combined with neoadjuvant chemoradiotherapy and subsequent esophagectomy.
Pulmonary arterial hypertension (PAH) is a rare disorder with a poor prognosis. Deleterious variation within genes encoding components of the transforming growth factor-β pathway underlies the majority of heritable forms of PAH. Identifying the missing genetic contribution is challenging, even with genes of large effect size, since it likely involves mutations in genes confined to small numbers of PAH cases. In this study, we performed whole genome sequencing, comparing 1038 PAH index cases to 6385 subjects with other rare diseases. Rare variant analysis identified mutations in novel causal genes, namely ATP13A3, AQP1 and SOX17, and provided independent validation of a critical role for GDF2 in PAH. We detected mutations predicted to be disruptive of function in most, but not all, previously reported PAH genes. Taken together, these findings provide new insights into the molecular basis of PAH, and support a central role for endothelial dysregulation in disease pathogenesis.