In solid tumor oncology, circulating tumor DNA (ctDNA) is poised to transform care through accurate assessment of minimal residual disease (MRD) and therapeutic response monitoring. To overcome the sparsity of ctDNA fragments in low tumor fraction (TF) settings and increase MRD sensitivity, we previously leveraged genome-wide mutational integration through plasma whole genome sequencing (WGS). We now introduce MRD-EDGE, a composite machine learning-guided WGS ctDNA single nucleotide variant (SNV) and copy number variant (CNV) detection platform designed to increase signal enrichment. MRD-EDGESNV uses deep learning and a ctDNA-specific feature space to increase SNV signal-to-noise enrichment in WGS by 300X compared to previous WGS error suppression. MRD-EDGECNV also reduces the degree of aneuploidy needed for ultrasensitive CNV detection through WGS from 1 Gb to 200 Mb, thereby expanding its applicability to a wider range of solid tumors. We harness the improved performance to identify MRD following surgery in multiple cancer types, track changes in tumor burden in response to neoadjuvant immunotherapy in non-small cell lung cancer (NSCLC) and demonstrate ctDNA shedding in precancerous colorectal adenomas. Finally, the radical signal-to-noise enrichment in MRD-EDGESNV enables de novo mutation calling in melanoma and small cell lung cancer (SCLC) without matched tumor, yielding clinically informative TF monitoring for patients on immune checkpoint inhibition (ICI).
Mutation of DNMT3A, encoding a de novo methyltransferase essential for cytosine methylation, is a common early event in clonal hematopoiesis (CH) and adult acute myeloid leukemia (AML). Spontaneous deamination of methylated cytosines damages DNA, which is repaired by the base excision repair (BER) enzymes MBD4 and TDG. Congenital MBD4-deficiency has been linked to early-onset CH and AML, and is marked by exceedingly high levels of DNA damage and mutation of DNMT3A. Strikingly, wildtype (WT) DNMT3A binds TDG, thereby potentiating its repair activity. Since TDG is the only remaining BER enzyme in MBD4-deficient AML patients capable of repairing methylation damage, we investigated whether mutant DNMT3A negatively affects the repair function of TDG. We found that, whereas WT DNMT3A stimulates TDG function, mutant DNMT3A impairs TDG-mediated repair of DNA damage in vitro. In light of this finding and to extrapolate our observations to the broader AML patient population, we investigate here the genetic profiles and survival outcomes of AML patients with single (SM) versus double mutant (DM) DNMT3A. DM DNMT3A AML patients show a characteristic driver mutation landscape and reduced overall survival when compared to SM DNMT3A AML patients. Importantly, whole-genome sequencing showed a trend for increased DNA damage in primary DM DNMT3A AML samples, especially when DNMT3A mutations are located at the DNMT3A-TDG interaction interface.
The Cancer Immune Monitoring and Analysis Centers (CIMACs) and the Cancer Immunologic Data Commons (CIDC) have engaged in efforts to harmonize whole-exome sequencing (WES) and RNA-sequencing (RNA-seq) data from three different experimental platforms (MD Anderson, NCI MoCha lab, and Broad Institute) and are using a series of developed pipelines to process the early trial samples. To evaluate the consistency of tumor WES and RNA-seq profiling platforms across different centers, the CIMACs-CIDC conducted a systematic harmonization study. DNA and RNA were centrally extracted from fresh frozen (FF) and formalin-fixed paraffin-embedded (FFPE) non-small cell lung carcinoma (NSCLC) tumors and distributed to three centers for WES and RNA-seq profiling.
This is a test dataset derived from public data of the 1000 Genomes Project. Its purpose is not to allow for any inference about cohort data or results, but to aid bioinformaticians in the technical development and testing of tools, as well as data consumers in learning how to access information. This dataset consists of 2508 samples from the 1000 Genomes Project (https://www.nature.com/articles/nature15393). Data for each sample (e.g. NA18534) can be accessed through the IGSR portal (e.g. https://www.internationalgenome.org/data-portal/sample/NA18534) or the corresponding folder at the 1000 Genomes FTP site (e.g. http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/data/CHB/NA18534/exome_alignment/). This dataset encompasses several types of data: Variant Call Format (VCF, or its binary counterpart, BCF) files, both joint (e.g. ALL_chr22_20130502_2504Individuals.vcf.gz) and split (e.g. HG01775.chrY.vcf.gz); exome sequencing CRAM files (e.g. NA18534.GRCh38DH.exome.cram); and whole genome sequencing CRAM/BAM files (e.g. NA19239.cram). Additionally, multiple files were sliced to create shorter files that allow for a quick download, formatted as "{FILE-INFO}__{NUMBER-OF-READS}r__{CHR}.{START-COORDINATE}-{END-COORDINATE}.{FILETYPE}" (e.g. "HG01500.GRCh38DH__90r__3.10000-10500__4.10000-10500.cram"). These files can be downloaded directly through the EGA download client PyEGA3 (https://github.com/EGA-archive/ega-download-client).
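The naming scheme for the sliced files can be unpacked programmatically. The following is a minimal Python sketch, not part of the dataset tooling: the names `Region`, `SlicedName` and `parse_sliced_name` are hypothetical, and the handling of several "__"-separated regions is an assumption inferred from the single example file name above.

```python
import re
from typing import List, NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int
    end: int


class SlicedName(NamedTuple):
    file_info: str
    n_reads: int
    regions: List[Region]
    filetype: str


# "{NUMBER-OF-READS}r", e.g. "90r"
_READS = re.compile(r"(\d+)r")
# "{CHR}.{START-COORDINATE}-{END-COORDINATE}", e.g. "3.10000-10500"
_REGION = re.compile(r"([^.]+)\.(\d+)-(\d+)")


def parse_sliced_name(name: str) -> SlicedName:
    """Split a sliced-file name into its components.

    Assumptions (hypothetical, inferred from the example name only):
    the file type is the text after the last dot, fields are separated
    by a double underscore, and one or more regions may follow the
    read count.
    """
    stem, dot, filetype = name.rpartition(".")
    if not dot:
        raise ValueError(f"no file type suffix in {name!r}")
    parts = stem.split("__")
    if len(parts) < 3:
        raise ValueError(f"expected at least three '__' fields in {name!r}")
    file_info, reads, *region_parts = parts
    reads_match = _READS.fullmatch(reads)
    if reads_match is None:
        raise ValueError(f"bad read-count field {reads!r}")
    regions = []
    for part in region_parts:
        region_match = _REGION.fullmatch(part)
        if region_match is None:
            raise ValueError(f"bad region field {part!r}")
        chrom, start, end = region_match.groups()
        regions.append(Region(chrom, int(start), int(end)))
    return SlicedName(file_info, int(reads_match.group(1)), regions, filetype)


if __name__ == "__main__":
    print(parse_sliced_name(
        "HG01500.GRCh38DH__90r__3.10000-10500__4.10000-10500.cram"
    ))
```

Compound suffixes such as ".vcf.gz" would be reported as "gz" by this sketch, so it is only suitable for single-suffix names like the CRAM example.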
The data set contains FASTQ files (filetype) of a NEXTSEQ550DX run (instrument). The FASTQ files are from DNA and RNA samples. The library prep is an NGS TSO500 library prep (Illumina) (technology). A Study to Examine the Clinical Value of Comprehensive Genomic Profiling Performed by Belgian NGS Laboratories: a Belgian Precision Study of the BSMO in Collaboration With the Cancer Centre (BALLETT): This 2-year study involves a consortium of 9 cooperating Belgian NGS laboratories and will enroll 936 metastatic or locally advanced cancer patients coming from 13 different Belgian hospitals and cancer centers. Upon inclusion, all cancer patients will be offered 'comprehensive genomic profiling' (CGP) using Illumina's TSO500 NGS panel. This targeted NGS panel of 523 genes allows for the detection of single nucleotide variants, small indels, copy number variations and fusions, as well as for the determination of the 'tumor mutational burden' (TMB) and the 'microsatellite-instability' status (MSI). Both the wet lab execution of the CGP and the biological and clinical classification of the variants will be performed in a fully standardized way among the 9 participating Belgian local NGS laboratories.
This dataset contains single-cell RNA sequencing data from patients with thyroid cancer (n=7), multinodular goiter (n=3) and healthy individuals (n=5). Mononuclear cells were taken from both the peripheral blood and the bone marrow compartments. We used a pooled single-cell design where multiple individuals were pooled in a single sample for sequencing (NextSeq 500-V2) and later demultiplexed using their genotype data. Associated metadata contains information on the phenotypes per individual, the pooling design and the linkage between the supplied files and sequenced pools. Due to limitations from EGA in uploading single-cell data, the raw fastq files were processed as follows: (i) I1/I2/R1/R2 fastq files were concatenated over the different lanes. (ii) Concatenated I1 and I2 files were interleaved, as were the concatenated R1 and R2 files, to generate two fastq files per pool containing all the information. To interleave the fastq files, the BBMap tool bbmap/reformat.sh was used, which can also be used to de-interleave the files.
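To make the interleaved layout concrete, the sketch below shows in Python what interleaving and de-interleaving mean at the record level. It is an illustration only and not BBMap's implementation; the function names are hypothetical, and it assumes unwrapped fastq with exactly four lines per record.

```python
from itertools import islice, zip_longest
from typing import Iterable, Iterator, List, Tuple


def read_records(lines: Iterable[str]) -> Iterator[Tuple[str, ...]]:
    """Yield fastq records as 4-line tuples (assumes unwrapped fastq)."""
    it = iter(lines)
    while True:
        record = tuple(islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated fastq record")
        yield record


def interleave(first: Iterable[str], second: Iterable[str]) -> Iterator[str]:
    """Alternate records: record 1 of `first`, record 1 of `second`, ..."""
    pairs = zip_longest(read_records(first), read_records(second))
    for rec1, rec2 in pairs:
        if rec1 is None or rec2 is None:
            raise ValueError("inputs contain different numbers of records")
        yield from rec1
        yield from rec2


def deinterleave(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Undo `interleave`: even-indexed records go to the first output."""
    first: List[str] = []
    second: List[str] = []
    for index, record in enumerate(read_records(lines)):
        (first if index % 2 == 0 else second).extend(record)
    if len(first) != len(second):
        raise ValueError("odd number of records in interleaved input")
    return first, second


if __name__ == "__main__":
    r1 = ["@read1/1", "ACGT", "+", "IIII"]
    r2 = ["@read1/2", "TTGA", "+", "IIII"]
    merged = list(interleave(r1, r2))
    print(merged)
    print(deinterleave(merged) == (r1, r2))
```

For the actual files, de-interleaving with the same tool that produced them is the safer route; according to the BBTools documentation this takes the form `reformat.sh in=interleaved.fq out1=first.fq out2=second.fq`, where the file names are placeholders.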
We have collected RNA samples from whole blood of Kenyan children exposed to malaria in the Kilifi region of Kenya. Collections were performed each year from 2015 until 2018. This is a follow-up study to that described in Bediako et al. (in preparation). The SIMS consortium is seeking to identify the underlying reasons why some children are more susceptible to malaria than others. In this study we hope to track changes in children’s immune systems over time which relate to the number of malaria episodes they experience. This data is part of a pre-publication release. For information on the proper use of pre-publication data shared by the Wellcome Trust Sanger Institute (including details of any publication moratoria), please see http://www.sanger.ac.uk/datasharing/
This dataset was collected from viable bone marrow cells obtained at diagnosis from nine patients with high hyperdiploid ALL and one normal bone marrow sample. All samples were subjected to low-pass single-cell whole genome sequencing with a median sequencing coverage of 0.02x. Single nuclei in G0/G1 phase were isolated using a fluorescence-activated cell sorting (FACS) cytometer. DNA libraries were constructed and the associated next-generation sequencing was carried out by the European Research Institute for the Biology of Ageing (ERIBA), University of Groningen, University Medical Center Groningen, Groningen, The Netherlands. Further details regarding DNA library construction are available in Bos et al., 2019 (https://link.springer.com/protocol/10.1007/978-1-4939-8931-7_15).
This dataset includes both the anchor scRNA-seq and snRNA-seq datasets used to build the Human Endometrial Cell Atlas (HECA). HECA provides a comprehensive definition of endometrial cell types and states throughout the menstrual cycle in donors with and without endometriosis. It identifies consensus cell types across datasets generated by teams worldwide, as well as previously unreported cell types, all of which are validated and mapped in situ using spatial transcriptomics. Processed data is available from ArrayExpress with accession number E-MTAB-14039. See Marečková, M., Garcia-Alonso, L., Moullet, M. et al. An integrated single-cell reference atlas of the human endometrium. Nat Genet 56, 1925–1937 (2024) https://doi.org/10.1038/s41588-024-01873-w for more information.
This dataset is part of a study that aims to compare in vivo human trophoblast differentiation into EVTs to different in vitro trophoblast organoids using single-cell and single-nuclei RNA sequencing. This specific dataset includes scRNA-seq and snRNA-seq data from trophoblast stem cells (TSCs). Trophoblast stem cell (TSC) lines BTS5 and BTS11 derived by Okae and colleagues were grown as described previously (Okae et al. 2018) together with EVT differentiation media. This study shows that the main regulatory programs mediating EVT invasion in vivo are preserved in in vitro models of EVT differentiation from primary trophoblast organoids and trophoblast stem cells. Data for primary trophoblast organoids is available under E-MTAB-12650.
This is a multi-centre, case-controlled study to develop a dataset containing 1000 MS cases and 1000 matched controls and to associate DNA sequence (allelic) variations with MS phenotypes. Study subjects were enrolled through a prospective effort initiated in 2003. Three MS clinical centres were involved in subject recruitment and biological specimen collection using identical inclusion/exclusion criteria, two in Europe (Vrije Universiteit Medical Center, Amsterdam; and University Hospital Basel) and one in the US (University of California San Francisco). This study recruited subjects of northern-European ancestry with a diagnosis of MS (McDonald et al., 2001), with dissemination in time and space. Patients with Clinically Isolated Syndromes (CIS) were also included if they fulfilled 3 of the 4 Barkhof criteria for dissemination in space as per application of the McDonald criteria (McDonald et al., 2001). While recruitment predominantly included subjects with a relapsing onset of MS, individuals with all clinical subtypes of the disease participated, including clinically isolated syndrome (CIS), relapsing remitting MS (RRMS), secondary progressive MS (SPMS), primary progressive MS (PPMS), and progressive relapsing MS (PRMS). The control group consisted of unrelated individuals, primarily spouses/partners, friends, and other volunteers. Control subjects were of northern-European ancestry and matched as a group, proportionally with cases according to age (±5 years) and gender. A familial history or current diagnosis of MS as well as a relation to another case or control subject were considered exclusionary for this group. Protocols were approved by the Committees on Human Research at all Institutions and informed consent was obtained from all participants prior to participation in the study. 
Primary Study Objective: To identify DNA sequence variations (genotype) and flanking sequences that are associated with clinical factors (phenotype) which differ between study subjects with and without MS. Secondary Study Objectives: To develop a clinical dataset including quantitative measures of 1000 well-characterized cases with MS, and 1000 ethnically matched controls. To identify other genotype-phenotype associations in MS study subjects such as magnetic resonance imaging (MRI) measures of disease burden and/or severity. To identify or confirm candidate surrogate markers of neurodegeneration using a variety of techniques including biochemical assays, blood transcriptome analysis, plasma proteomics and MRI*. Genotyping: Genotyping of the complete dataset was performed at the Illumina facilities using the Sentrix® HumanHap550 BeadChip. *MRI results are not available on dbGaP.
The Genetics and Epidemiology of Colorectal Cancer Consortium (GECCO) is a collaborative effort comprised of a coordinating center and scientific researchers from well-characterized cohort and case-control studies conducted in North America and Europe. This international consortium aims to accelerate the discovery of common and rare genetic risk variants for colorectal cancer by conducting large-scale meta-analyses of existing and newly generated genome-wide association study (GWAS) data, replicating and fine-mapping GWAS discoveries, and investigating how genetic risk variants are modified by environmental risk factors. To expand these efforts, we assembled case-control sets or nested case-control sets from 20 different North American or European studies. Summary descriptions and study participant inclusion/exclusion criteria for each of these studies are detailed below. The Black Women's Health Study (BWHS): The BWHS is the largest follow-up study of the health of African-American women (Cozier et al., 2004; Rosenberg et al., 1995) [PMID: 15018884; PMID: 7722208]. The purpose is to identify and evaluate causes and preventives of cancers and other serious illnesses in African-American women. Among the diseases being studied are breast cancer, colorectal cancer, type 2 diabetes, uterine fibroids, systemic lupus erythematosus, and cardiovascular disease. The study began in 1995, when 59,000 black women from all parts of the United States enrolled through postal questionnaires. The women provided demographic and health data on the 1995 baseline questionnaire, including information on weight, height, smoking, drinking, contraceptive use, use of other selected medications, illnesses, reproductive history, physical activity, diet, use of health care, and other factors. The participants are followed through biennial questionnaires to determine the occurrence of cancers and other illnesses and to update information on risk factors.
Self-reports of cancer are confirmed through medical records and state cancer registry records. Mouthwash-swish samples, as a source of DNA, were obtained from ~26,000 BWHS participants in 2002-2007. DNA was isolated from the mouthwash-swish samples at the Boston University Molecular Core Genetics Laboratory using the QIAamp DNA Mini Kit (Qiagen). All incident colorectal cancer cases with a DNA sample were included in the present analysis. Two controls per case, selected from among BWHS participants free of colorectal cancer at end of follow-up, were matched to cases on year of birth (+/- 2 years) and geographical region of residence (Northeast, South, Midwest, and West). A total of 209 colorectal cancer cases and 423 controls were sent for genotyping. Campaign Against Cancer and Heart Disease (CLUE II): The Campaign Against Cancer and Heart Disease is a prospective cohort designed to identify biomarkers and other factors associated with risk of cancer, heart disease, and other conditions (Kakourou et al., 2015) [PMID: 26220152]. A total of 32,894 participants were recruited from May through October 1989 from Washington County, Maryland and surrounding communities. Colorectal cancer cases (n = 297) and matched controls (n = 296) were identified between 1989 and 2000 among participants in the CLUE II cohort of Washington County, Maryland. Colorectal Cancer Study of Austria (CORSA): In the ongoing colorectal cancer study of Austria (CORSA), more than 13,000 Caucasian participants have been recruited within the province-wide screening project "Burgenland Prevention Trial of Colorectal Disease with Immunological Testing" (B-PREDICT) since 2003 (Hofer et al., 2011) [PMID: 21422235]. All inhabitants of the Austrian province Burgenland aged between 40 and 80 years are annually invited to participate in fecal immunochemical testing, and haemoccult-positive screening participants are invited for colonoscopy.
CORSA includes genomic DNA and plasma of colorectal cancer cases, low-risk and high-risk adenomas, and colonoscopy-negative controls. Controls received a complete colonoscopy and were free of colorectal cancer or polyps. CORSA participants have been recruited in the four KRAGES hospitals in Burgenland, Austria, and additionally, at the Medical University of Vienna (Department of Surgery), the Viennese hospitals "Rudolfstiftung" and the "Sozialmedizinisches Zentrum Sud", and at the Medical University of Graz (Department of Internal Medicine). A total of 1403 colorectal cancer and advanced colorectal adenoma cases, and 1404 matched controls were selected for the study. The distributions of sex and age (5-year strata) were evenly matched between cases and controls. Cancer Prevention Study II (CPS II): The CPS II Nutrition cohort is a prospective study of cancer incidence and mortality in the United States, established in 1992 and described in detail elsewhere (Calle et al., 2002; Campbell et al., 2014) [PMID: 12015775; PMID: 25472679]. At enrollment, participants completed a mailed self-administered questionnaire including information on demographic, medical, diet, and lifestyle factors. Follow-up questionnaires to update exposure information and to ascertain newly diagnosed cancers were sent biennially starting in 1997. Reported cancers were verified through medical records, state cancer registry linkage, or death certificates. The Emory University Institutional Review Board approves all aspects of the CPS II Nutrition Cohort. A total of 360 cases and 359 controls were selected for this study. Czech Republic Colorectal Cancer Study (Czech Republic CCS): Cases with positive colonoscopy results for malignancy, confirmed by histology as colon or rectal carcinomas, were recruited between September 2003 and May 2012 in several oncological departments in the Czech Republic (Prague, Pilsen, Benesov, Brno, Liberec, Ples, Pribram, Usti nad Labem, and Zlin).
Two control groups, sampled at the same time as case recruitment, were included in the study. The first group consisted of hospital-based individuals with a negative colonoscopy result for malignancy or idiopathic bowel diseases. The reasons for the colonoscopy were: i) positive fecal occult blood test, ii) hemorrhoids, iii) abdominal pain of unknown origin, and iv) macroscopic bleeding. The second control group consisted of healthy blood donor volunteers from a blood donor center in Prague. All individuals were subjected to standard examinations to verify the health status for blood donation and were cancer-free at the time of the sampling. Details of CRC cases and controls have been reported previously (Vymetalkova et al., 2014; Naccarati et al., 2016; Vymetalkova et al., 2016) [PMID: 24755277; PMID: 26735576; PMID: 27803053]. All subjects were informed and provided written consent to participate in the study. They approved the use of their biological samples for genetic analyses, according to the Declaration of Helsinki. The design of the study was approved by the Ethics Committee of the Institute of Experimental Medicine, Prague, Czech Republic. All subjects included in the study were Caucasians and comprised 1792 cases and 1764 matched controls. Controls were matched to CRC cases in a 1:1 ratio. Matching was done on age and sex. Age was matched within ±5 years, whereas sex was matched exactly. For the cases without matched controls, matching was done only on sex. Early Detection Research Network (EDRN): The aim of the EDRN initiative is to develop and sustain a biorepository for support of translational research (Amin et al., 2010) [PMID: 21031013]. High-quality biospecimens were accrued and annotated with pertinent clinical, epidemiologic, molecular and genomic information. A user-friendly annotation tool and query tool were developed for this purpose.
The components of this annotation tool include common data elements (CDEs) developed from the College of American Pathologists (CAP) Cancer Checklists and North American Association of Central Cancer Registries (NAACCR) standards. The CDEs provide semantic and syntactic interoperability of the data sets by describing them in the form of metadata or data descriptors. A total of 352 colorectal case samples and 399 controls were selected for this study. Controls were matched to CRC cases based on age and sex. The EPICOLON Consortium (EPICOLON): The EPICOLON Consortium comprises a prospective, multicentre and population-based epidemiology survey of the incidence and features of CRC in the Spanish population (Fernandez-Rozadilla et al., 2013) [PMID: 23350875]. Cases were selected as patients with de novo histologically confirmed diagnosis of colorectal adenocarcinoma. Patients with familial adenomatous polyposis, Lynch syndrome or inflammatory bowel disease-related CRC, and cases where patients or family refused to participate in the study were excluded. Hospital-based controls were recruited through the blood collection unit of each hospital, together with cases. All of the controls were confirmed to have no history of cancer or other neoplasm and no reported family history of CRC. Controls were randomly selected and matched with cases for hospital, sex and age (±5 years). A total of 370 cases and 370 controls were selected for genotyping. Hawaii Adenoma Study: For this adenoma study, two flexible-sigmoidoscopy screening clinics were first used to recruit participants on Oahu, Hawaii. Adenoma cases were identified either from the baseline examination at the Hawaii site of the Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial during 1996-2000 or at the Kaiser Permanente Hawaii's Gastroenterology Screening Clinic during 1995-2007.
In addition, starting in 2002 and up to 2007, we also approached for recruitment all eligible patients who underwent a colonoscopy in the Kaiser Permanente Hawaii Gastroenterology Department. Cases were patients with histologically confirmed first-time adenoma(s) of the colorectum and were of Japanese, Caucasian or Hawaiian race/ethnicity. Controls were selected among patients with a normal colorectum and were individually matched to the cases on age at exam, sex, race/ethnicity, screening date (±3 months) and clinic and type of examination (colonoscopy or flexible sigmoidoscopy). We recruited 1016 adenoma cases (67.8% of all eligible) and 1355 controls (69.2% of all eligible); 889 cases and 1169 controls agreed to give a blood sample, and 29 cases and 34 controls a mouthwash sample. A total of 989 cases and 1185 controls were genotyped for this study. Columbus-area HNPCC Study (HNPCC, OSUMC): Patients with colorectal adenocarcinoma diagnosed at six participating hospitals were eligible for this study, regardless of age at diagnosis or family history of cancer. Patients with a clinical diagnosis of familial adenomatous polyposis were not eligible for this study. These six hospitals perform the vast majority of all operations for CRC in the Columbus metropolitan area (population 1.7 million). The institutional review board at all participating hospitals approved the research protocol and consent form in accordance with assurances filed with and approved by the United States Department of Health and Human Services. Briefly, during the period of January 1999 through August 2004, 1,566 eligible patients with CRC were accrued to the study (Hampel et al., 2008) [PMID: 18809606]. A total of 1472 colorectal cancer samples had enough blood DNA remaining to be sent for genotyping. Control samples were provided by the Ohio State University Medical Center's (OSUMC) Human Genetics Sample Bank.
The Columbus Area Controls Sample Bank is a collection of control samples for use in human genetics research that includes both donors' anonymized biological specimens and linked phenotypic data. The data and samples are collected under the protocol "Collection and Storage of Controls for Genetics Research Studies", which is approved by the Biomedical Sciences Institutional Review Board at OSUMC. Recruitment takes place in OSUMC primary care and internal medicine clinics. If individuals agree to participate, they provide written informed consent, complete a questionnaire that includes demographic, medical and family history information, and donate a blood sample. Between 4 and 7 ml of blood is drawn into each of 3 ACD Solution A tubes and is used for genomic DNA extraction and the establishment of an EBV-transformed lymphoblastoid cell culture, cell pellet in Trizol, and plasma. Controls were matched to CRC cases in a 1:1 ratio. Matching was done on age at reference time (age_ref), race, and sex. Age_ref was matched within ±5 years. Sex and race were matched exactly. For the cases without matched controls, matching was done only on sex and race in a 1:1 ratio. Since controls are fewer than cases, one control is matched to at most 2 cases. Health Professionals Follow-up Study (HPFS): The HPFS is a prospective study parallel to the Nurses' Health Study (NHS). The HPFS cohort comprised 51,529 men aged 40-75 who, in 1986, responded to a mailed questionnaire (Rimm et al., 1990) [PMID: 2090285]. Participants provided information on health-related exposures, including current and past smoking history, age, weight, height, diet, physical activity, aspirin use, and family history of colorectal cancer. Colorectal cancer and other outcomes were reported by participants or next-of-kin and were followed up through review of the medical and pathology record by physicians. Overall, more than 97% of self-reported colorectal cancers were confirmed by medical record review.
Information was abstracted on histology and primary location. Incident cases were defined as those occurring after the subject provided the blood sample. Prevalent cases were defined as those occurring after enrollment in the study but before the subject provided the blood sample. Follow-up evaluation has been excellent, with 94% of the men responding to date. Colorectal cancer cases were ascertained through January 1, 2008. In 1993-1995, 18,825 men in the HPFS mailed blood samples by overnight courier, which were aliquoted into buffy coat and stored in liquid nitrogen. In 2001-2004, 13,956 men in the HPFS who had not provided a blood sample previously mailed in a swish-and-spit sample of buccal cells. Incident cases were defined as those occurring after the subject provided a blood or buccal sample. Prevalent cases were defined as those occurring after enrollment in the study in 1986, but before the subject provided either a blood or buccal sample. After excluding participants with histories of cancer (except nonmelanoma skin cancer), ulcerative colitis, or familial polyposis, case-control sets were previously constructed. In addition to colorectal cancer cases and controls, a set of adenoma cases and matched controls with available DNA from buffy coat were selected for genotyping. Over the follow-up period, data were collected on endoscopic screening practices and, if individuals had been diagnosed with a polyp, the polyps were confirmed to be adenomatous by medical record review. Adenoma cases were ascertained through January 1, 2008. A separate case-control set was constructed of participants diagnosed with advanced adenoma matched to control participants who underwent a lower endoscopy in the same time period and did not have an adenoma. Advanced adenoma was defined as an adenoma 1 cm or larger in diameter and/or with tubulovillous, villous, or high-grade dysplasia/carcinoma-in-situ histology.
Matching criteria included year of birth (within 1 year) and month/year of blood sampling (within 6 months), the reason for their lower endoscopy (screening, family history, or symptoms), and the time period of any prior endoscopy (within 2 years). Controls matched to cases with a distal adenoma either had a negative sigmoidoscopy or colonoscopy examination, and controls matched to cases with proximal adenoma all had a negative colonoscopy. In total, 159 advanced adenoma cases and 109 controls were selected for genotyping. Leeds Colorectal Cancer Study (LCCS): Following local ethical approval, colorectal cancer cases were recruited from 1997 until 2012 in Leeds, UK through surgical clinics. Initially, funding was provided by the UK Ministry of Agriculture, Farming and Fisheries (subsequently the Food Standards Agency) and Imperial Cancer Research Fund (subsequently Cancer Research UK). Recruitment also occurred similarly in Dundee, Perth and York between 1997 and 2001 using the same protocol, and the data and samples were combined. Pathologically confirmed cases were consented at outpatient clinics, providing information on known and postulated risk factors for colorectal cancer (diet, lifestyle and family history) as well as providing a blood sample for DNA. Exclusion criteria included pre-existing diverticular disease and an inability to complete the questionnaire. The General Practitioners (GPs) of cases (all UK residents have a nominated General Practitioner to whom to refer initial medical queries) were asked to send letters to other persons on their patient list of the same gender and born within 5 years of the case. Subsequently, to enhance the number of controls, we systematically invited patients from selected GP practices. Diet was assessed in cases and controls using an extensive dietary and lifestyle questionnaire modified from that produced by the European Prospective Investigation in Cancer (EPIC).
The frequency with which each specific food item was eaten was recorded, and we also obtained average fruit and vegetable consumption as a cross-check. In total, 1591 cases and 739 controls provided a DNA sample. The North Carolina Colon Cancer Studies (NCCCS I/II): The North Carolina Colon Cancer Studies (NCCCS I, colon, and NCCCS II, rectal) were population-based case-control studies conducted in 33 counties of North Carolina. Cases were identified using the rapid case ascertainment system of the North Carolina Central Cancer Registry. Patients with a first diagnosis of histologically confirmed invasive adenocarcinoma of the colon (cecum through sigmoid colon) between October 1996 and September 2000 were classified as potential cases in the NCCCS I. The NCCCS II included patients with a first diagnosis of histologically confirmed invasive adenocarcinoma of the sigmoid colon, rectosigmoid, or rectum (hereafter collectively referred to as rectal cancer) between May 2001 and September 2006. Additional eligibility requirements were: age 40-80 years, residence in one of the 33 counties, ability to give informed consent and complete an interview, possession of a driver's license or identification card issued by the North Carolina Department of Motor Vehicles (if under the age of 65), and no objection from the primary physician to contacting the individual. Controls, identified and sampled during the respective study dates, were selected from two sources. Potential controls under the age of 65 were identified using the North Carolina Department of Motor Vehicles records. For those 65 years and older, records from the Center for Medicare and Medicaid Services were used. Controls were matched to cases using randomized recruitment strategies. Recruitment probabilities were set using strata of 5-year age, sex, and race groups.
Dietary information was collected using a modified version of the semiquantitative food frequency questionnaire developed at the National Cancer Institute. In addition, participants were asked about vitamin and mineral supplementation, special diets, restaurant eating, sodium use, and fats used in cooking. In NCCCS I, 515 colorectal cases and 687 matched controls were sent for genotyping. In NCCCS II, 796 colorectal cases and 823 controls were sent for genotyping. Controls were matched to CRC cases in a 1:1 ratio. Matching was done on age, race, and sex. Age was matched within ±5 years. Race and sex were matched exactly. For the cases without matched controls, matching was done only on sex and race. Nurses' Health Study (NHS): The NHS cohort began in 1976 when 121,700 married female registered nurses aged 30-55 years returned the initial questionnaire that ascertained a variety of important health-related exposures (Belanger et al., 1978) [PMID: 248266]. Since 1976, follow-up questionnaires have been mailed every 2 years. Colorectal cancer and other outcomes were reported by participants or next-of-kin and followed up through review of the medical and pathology record by physicians. Overall, more than 97% of self-reported colorectal cancers were confirmed by medical-record review. Information was abstracted on histology and primary location. The rate of follow-up evaluation has been high: as a proportion of the total possible follow-up time, follow-up evaluation has been more than 92%. Colorectal cancer cases were ascertained through June 1, 2008. In 1989-1990, 32,826 women in NHS I mailed blood samples by overnight courier, which were aliquoted into buffy coat and stored in liquid nitrogen. In 2001-2004, 29,684 women in NHS I who did not previously provide a blood sample mailed a swish-and-spit sample of buccal cells. Incident cases were defined as those occurring after the subject provided a blood or buccal sample. 
Prevalent cases were defined as those occurring after enrollment in the study in 1976 but before the subject provided either a blood or buccal sample. After excluding participants with histories of cancer (except nonmelanoma skin cancer), ulcerative colitis, or familial polyposis, case-control sets were previously constructed from which DNA was isolated from either buffy coat or buccal cells for genotyping. In addition to colorectal cancer cases and controls, a set of advanced adenoma cases and matched controls with available DNA from buffy coat were selected for genotyping. Over the follow-up period, data were collected on endoscopic screening practices and, if individuals had been diagnosed with a polyp, the polyps were confirmed to be adenomatous by medical record review. Adenoma cases were ascertained through June 1, 2011. A separate case-control set was constructed of participants diagnosed with advanced adenoma matched to control participants who underwent a lower endoscopy in the same time period and did not have an adenoma. Advanced adenoma was defined as an adenoma more than 1 cm in diameter and/or with tubulovillous, villous, or high-grade dysplasia/carcinoma-in-situ histology. Matching criteria included year of birth (within 1 year) and month/year of blood sampling (within 6 months), the reason for their lower endoscopy (screening, family history, or symptoms), and the time period of any prior endoscopy (within 2 years). Controls matched to cases with a distal adenoma either had a negative sigmoidoscopy or colonoscopy examination, and controls matched to cases with proximal adenoma all had a negative colonoscopy. A total of 272 cases and 236 matched controls were sent to CIDR for the advanced adenoma case-control set. 
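The advanced adenoma definition given above for the NHS case-control set can be written as a simple predicate. This is only a minimal sketch: the function name, the argument names, and the lower-case histology labels are hypothetical and are not taken from the study's own data dictionary.

```python
from typing import Optional

# Histology labels treated as "advanced" in the text; the exact strings are assumptions.
ADVANCED_HISTOLOGIES = {
    "tubulovillous",
    "villous",
    "high-grade dysplasia",
    "carcinoma in situ",
}


def is_advanced_adenoma(diameter_cm: Optional[float], histology: str) -> bool:
    """Return True if an adenoma is larger than 1 cm and/or has advanced histology."""
    large = diameter_cm is not None and diameter_cm > 1.0
    advanced_histology = histology.strip().lower() in ADVANCED_HISTOLOGIES
    return large or advanced_histology
```

A record with a missing diameter is classified on histology alone, which is one possible convention and not something the description specifies.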
Northern Swedish Health and Disease Study (NSHDS): The NSHDS comprises over 110,000 participants, including approximately one third with repeated sampling occasions, from three population-based cohorts (Dahlin et al., 2010; Myte et al., 2016) [PMID: 20197478; PMID: 27367522]. The largest is the ongoing Västerbotten Intervention Programme, in which all residents of Västerbotten County are invited to a health examination upon turning 30 (some years), 40, 50 and 60 years of age. Extensive measured and self-reported health and lifestyle data, as well as blood samples for central biobanking in Umeå, Sweden, are collected at the health exam. Leucocyte DNA samples for 1:1-matched CRC case-control sets from the NSHDS, of which 878 samples are included in this study, have been selected for genotyping. This is in addition to 354 samples from the NSHDS previously analyzed as part of the multicenter EPIC cohort. Cancer-specific and overall survival data are available for all patients. For at least 425 patients, archival tumor tissue has been analyzed for the BRAF V600E mutation and by sequencing codons 12 and 13 for KRAS mutations, as well as for MSI screening status by immunohistochemistry and for an eight-gene CIMP panel using quantitative real-time PCR (MethyLight). Ohio Colorectal Cancer Prevention Initiative (OCCPI, OSUMC): OCCPI (ClinicalTrials.gov identifier: NCT01850654) is a population-based study of colorectal cancer patients diagnosed in one of 51 hospitals throughout the state of Ohio from January 1, 2013 through December 31, 2016. The OCCPI was created to decrease CRC incidence in Ohio by identifying patients with hereditary predisposition (statewide universal tumor screening for newly diagnosed CRC patients), increase colonoscopy compliance for first-degree relatives of CRC patients, and encourage future research through the creation of a biorepository. 
The 51 Ohio hospitals participating in the OCCPI were selected to represent a cross-section of clinical centers in the state based on high reported volume of CRC patients, affiliation with a high-volume hospital, or interest in participation. Institutional Review Board (IRB) approval was obtained by the individual hospitals, Community Oncology Programs, or by ceding review to the OSU IRB. Written informed consent was obtained. A total of 2139 colorectal cases were genotyped. Patients were considered eligible for this study if they were age 18 or older at the time of enrollment and had a surgical resection (or biopsy if unresectable) in the state of Ohio demonstrating an adenocarcinoma of the colorectum from January 1, 2013 through December 31, 2016. Matched control samples were selected from the Ohio State University Medical Center's (OSUMC) Human Genetics Sample Bank in an identical way to the selection for the Columbus-area HNPCC Study (please refer to the description for the Columbus-area HNPCC Study). Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial (PLCO): PLCO enrolled 154,934 participants (men and women, aged between 55 and 74 years) at ten centers into a large, randomized, two-arm trial to determine the effectiveness of screening to reduce cancer mortality. Sequential blood samples were collected from participants assigned to the screening arm. Participation was 93% at the baseline blood draw. In the observational (control) arm, buccal cells were collected via mail using the "swish-and-spit" protocol and the participation rate was 65%. Details of this study have been previously described (Huang et al., 2016) [PMID: 27673363] and are available online (http://dcp.cancer.gov/plco). For this study, 1651 advanced adenoma cases and 1392 controls were selected for genotyping. 
Selenium and Vitamin E Prevention Trial (SELECT): The Selenium and Vitamin E Cancer Prevention Trial (SELECT) was a double-blind, placebo controlled clinical trial which explored using selenium and vitamin E alone and in combination to prevent prostate cancer in healthy men (Lippman et al., 2009) [PMID: 19066370]. Secondary endpoints included the prevention of colorectal and lung cancers. SELECT was conducted at 427 sites and centers in the United States, Canada and Puerto Rico; 35,533 men 55 years and older (50 or older if African American) were randomized beginning August 22, 2001. Supplementation was discontinued on October 23, 2008 due to futility. 308 colorectal cancer cases and 308 matched controls were selected from the SELECT population and sent for genotyping. Screening Markers For Colorectal Disease Study and Colonoscopy and Health Study (SMS-REACH): Details on this study population were previously reported (Burnett-Hartman et al., 2014) [PMID: 24875374]. Participants were enrollees in an integrated health-care delivery system in western Washington State (Group Health Cooperative, Seattle, Washington) aged 24-79 years who underwent an index colonoscopy for any indication between 1998 and 2007 and donated a buccal-cell or blood sample for genotyping analysis. Study recruitment took place in 2 phases, with phase 1 occurring in 1998-2003 and phase 2 occurring in 2004-2007. Persons who had undergone a colonoscopy less than 1 year prior to the index colonoscopy, persons with inadequate bowel preparation for the index colonoscopy, and persons with a prior or new diagnosis of colorectal cancer, a familial colorectal cancer syndrome (such as familial adenomatous polyposis), or another colorectal disease were ineligible. Patients diagnosed with adenomas or serrated polyps and persons who were polyp-free at the index colonoscopy (controls) were systematically recruited during both phases of recruitment. 
Approximately 75% agreed to participate and provided written informed consent. Based on medical records, persons who agreed to participate and those who refused study participation were similar with respect to age, sex, and colorectal polyp status. Study protocols were approved by the institutional review boards of the Group Health Cooperative and the Fred Hutchinson Cancer Research Center (Seattle, Washington). A total of 575 cases and 508 matched controls were selected for the study. Controls were matched to CRC cases in a 1:1 ratio. Matching was done on age_ref, race, and sex. Age_ref was matched within ±5 years. The Women's Health Initiative (WHI): WHI is a long-term national health study that has focused on strategies for preventing heart disease, breast and colorectal cancer, and osteoporotic fractures in postmenopausal women. The original WHI study included 161,808 postmenopausal women enrolled between 1993 and 1998. The Fred Hutchinson Cancer Research Center in Seattle, WA serves as the WHI Clinical Coordinating Center for data collection, management, and analysis of the WHI. The WHI has two major parts: a partial factorial randomized Clinical Trial (CT) and an Observational Study (OS); both were conducted at 40 Clinical Centers nationwide. The CT enrolled 68,132 postmenopausal women between the ages of 50 and 79 into trials testing three prevention strategies. If eligible, women could choose to enroll in one, two, or all three of the trial components. The components are: Hormone Therapy Trials (HT): This double-blind component examined the effects of combined hormones or estrogen alone on the prevention of coronary heart disease and osteoporotic fractures, and associated risk for breast cancer. Women participating in this component with an intact uterus were randomized to estrogen plus progestin (conjugated equine estrogens [CEE], 0.625 mg/d plus medroxyprogesterone acetate [MPA], 2.5 mg/d) or a matching placebo. 
Women with prior hysterectomy were randomized to CEE or placebo. Both trials were stopped early, in July 2002 and March 2004, respectively, based on adverse effects. All HT participants continued to be followed without intervention until close-out. Dietary Modification Trial (DM): The Dietary Modification component evaluated the effect of a low-fat and high fruit, vegetable and grain diet on the prevention of breast and colorectal cancers and coronary heart disease. Study participants were randomized to either their usual eating pattern or a low-fat dietary pattern. Calcium/Vitamin D Trial (CaD): This double-blind component began 1 to 2 years after a woman joined one or both of the other clinical trial components. It evaluated the effect of calcium and vitamin D supplementation on the prevention of osteoporotic fractures and colorectal cancer. Women in this component were randomized to calcium (1000 mg/d) and vitamin D (400 IU/d) supplements or a matching placebo. The Observational Study (OS) examines the relationship between lifestyle, environmental, medical and molecular risk factors and specific measures of health or disease outcomes. This component involves tracking the medical history and health habits of 93,676 women not participating in the CT. Recruitment for the observational study was completed in 1998 and participants were followed annually for 8 to 12 years. All centrally confirmed cases of invasive colorectal cancer, or deaths from colorectal cancer, were selected as potential cases from the September 30, 2015 database. Controls were participants free of colorectal cancer (invasive or in situ) as of September 30, 2015. Potential cases and controls were excluded if they (1) were non-White; (2) had a history of colorectal cancer at baseline; (3) were lost to follow-up after enrollment; (4) were dbGaP ineligible; (5) had <1.25 µg of DNA; (6) were selected for WHI study M26 Phase I or II; or (7) were selected for WHI study AS224 and also included in the imputation project. 
A total of 578 cases and 104,429 controls met the eligibility criteria. Each case was matched with 1 control (1:1) that exactly met the following matching criteria: age (±5 years), 40 randomization centers (exact), WHI date (±3 years), CaD date (±3 years), OS flag (exact), HRT assignments (exact), DM assignments (exact), and CaD assignments (exact). Control selection was done in a time-forward manner, selecting one control for each case from the risk set at the time of the case's event. The matching algorithm was allowed to select the closest match based on a criterion to minimize an overall distance measure (Bergstralh EJ, Kosanke JL. Computerized matching of cases to controls. Technical Report #56, Department of Health Sciences Research, Mayo Clinic, Rochester MN. April 1995). Each matching factor was given the same weight. When exact matches could not be found, the matching criteria were gradually relaxed among unmatched cases and controls until all cases had found matched controls. Using the matching criteria specified above, 559 of the 578 eligible cases found exact matches. The matching criteria were then relaxed to: age ±5 years, randomization centers, WHI date ±3 years, CaD date ±3 years, OS flag, HRT flag, DM flag, CaD flag. With these criteria, 17 of the remaining 19 unmatched cases found matched controls. By matching on age ±5 years, randomization centers, WHI date ±3 years, CaD date ±3 years, OS flag, and HRT flag, the remaining 2 unmatched cases found their matches.
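The staged relaxation described for the WHI matching can be illustrated with a small greedy matcher. This is a sketch under stated assumptions, not the Mayo Clinic matching macro cited above: the `Person` fields, the `match_cases` function, and the use of age difference as the only distance term are all hypothetical simplifications, and time-forward risk-set sampling is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    # Hypothetical fields standing in for a subset of the matching factors.
    pid: str
    age: float
    center: int
    os_flag: bool
    dm_arm: str


def match_cases(cases, controls, exact_keys_by_stage, age_caliper=5.0):
    """Greedy 1:1 matching without replacement.

    Each stage lists the fields that must match exactly; later stages
    list fewer fields, i.e. the criteria are relaxed for unmatched cases.
    """
    available = {c.pid: c for c in controls}
    matches = {}
    unmatched = list(cases)
    for keys in exact_keys_by_stage:
        still_unmatched = []
        for case in unmatched:
            candidates = [
                c for c in available.values()
                if all(getattr(c, k) == getattr(case, k) for k in keys)
                and abs(c.age - case.age) <= age_caliper
            ]
            if candidates:
                # Closest control by age difference; ties broken by id.
                best = min(candidates, key=lambda c: (abs(c.age - case.age), c.pid))
                matches[case.pid] = best.pid
                del available[best.pid]
            else:
                still_unmatched.append(case)
        unmatched = still_unmatched
    return matches, [c.pid for c in unmatched]


cases = [Person("A", 60, 1, True, "diet"), Person("B", 70, 2, False, "control")]
controls = [
    Person("c1", 62, 1, True, "diet"),
    Person("c2", 61, 1, True, "control"),
    Person("c3", 72, 2, False, "diet"),
]
stages = [("center", "os_flag", "dm_arm"), ("center", "os_flag")]
matches, unmatched = match_cases(cases, controls, stages)
```

In this toy example case A finds a control in the strict first stage, while case B is matched only after the `dm_arm` requirement is dropped in the second stage, mirroring how the last unmatched cases were handled in the description.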
The major goal of this project is to apply second-generation resequencing technology to identify disease-causing variants influencing pediatric and adult lung diseases in a collection of two longitudinal population cohorts of cystic fibrosis patients that have been well characterized for a comprehensive set of clinical traits. In Phase I, exome sequencing was performed on 43 cystic fibrosis patients with early Pseudomonas aeruginosa (Pa) infection and 48 cystic fibrosis patients with late Pa infection to identify variants influencing the time to onset of Pa infection. In Phase II, additional exomes were added to the study, to reach a total of 91 individuals with early Pa infection and 96 with late Pa infection. The majority of the 340 subjects of Phase II do not have a Pa infection phenotype, but instead have a pulmonary function phenotype (121 severe vs. 124 mild impairment) as determined by the survival-corrected Kulich FEV percentile of Corey et al. A small minority have intermediate phenotypes and/or show severe decline in lung function during childhood.
Phase 2 POISED (Peanut Oral Immunotherapy: Safety, Efficacy, Discovery) Study: In this randomized, double-blind, placebo-controlled (DBPC) clinical trial, the blinded placebo group received oat flour, whereas in active participants the dosage was built up over approximately 52 weeks and subsequently maintained at 4000 mg peanut protein daily for the next 52 weeks. 80 (98.77%) per-protocol active participants passed the food challenge at week 104. Subsequently, for 12 weeks, peanut ingestion was avoided in a randomized group of 51 blinded active participants (i.e., the peanut avoidance group). 21 (41.2%) participants in the peanut avoidance group passed the 4000 mg double-blind, placebo-controlled food challenge (DBPCFC) without any allergic reaction at week 117, thus demonstrating sustained unresponsiveness. These 21 participants continued oral immunotherapy (OIT) every three months and afterwards were allowed to continue peanut OIT discontinuation if they passed. 8 participants passed the DBPCFCs 12 months after peanut discontinuation (week 156), i.e., they achieved long-term sustained unresponsiveness with no allergic reaction in response to food challenge. For a detailed description, please refer to Chinthrajah et al., PMID: 31522849.
We have created a resource of transcriptomic (RNAseq), genomic (whole-exome seq), and lipidomic (untargeted shotgun) profiling data from over 175 molecularly diverse glioma tumors and derivative models in orthotopic mouse xenografts and gliomasphere cultures from over 110 unique patient tumor lines. Provided in this dataset are matched bulk RNA and whole-exome paired-end sequencing data from resected gliomas and derived model systems performed on Illumina HiSeq and NovaSeq sequencing instruments. This comprehensive dataset powers multiple studies aiming to derive molecular signatures of intertumoral heterogeneity and perturbation responses in glioma. Through integrative multi-omic analyses in Minami, et al., 2023, PMID: 37236196, we identified that CDKN2A deletion remodels the GBM lipidome and consequently primes CDKN2A-deleted tumors for ferroptosis. Lipidomics data related to this study are publicly available here. In another study, we used a combination of molecular profiling and functional profiling of apoptotic potential to stratify patients into groups with differential vulnerability to the current standard-of-care in gliomas, and in combination with drugs targeting intrinsic apoptosis, largely informed but not limited to TP53 mutation status.
Human liver ductal epithelium is morphologically, functionally and transcriptionally heterogeneous. However, understanding the dynamics within human biliary/ductal epithelium has been hampered by the absence of an in vitro system that fully mimics its complex cellular heterogeneity. Here, we found that the human liver cholangiocyte (ductal-cell) organoids we previously published (Huch et al., 2015) are fairly homogenous and do not retain the cellular heterogeneity of the in vivo human biliary/ductal epithelium. Inspired by the knowledge of the in vivo niche, we refined our previous organoid culture medium to fully capture the cellular heterogeneity of the human ductal epithelium. We employed this refined system to analyse the interactions and relationships between the different human biliary epithelial cell states and discovered that acquiring a bipotent ductal cell state is necessary for differentiating the human ductal epithelium into functional hepatocyte-like cells. Our improved cholangiocyte organoid model represents a new platform to investigate cell plasticity and duct-to-hepatocyte differentiation in human liver.
Samples will be from the BRF113683 (BREAK-3) study, which is a Phase III Randomized, Open-label Study Comparing GSK2118436 to Dacarbazine (DTIC) in Previously Untreated Subjects With BRAF Mutation Positive Advanced (Stage III) or Metastatic (Stage IV) Melanoma (n=250 enrolled). NGS: Agilent capture (Sanger V2 panel), 360 genes and 20 gene fusions; Illumina HiSeq sequencing. CNV: via NGS or Affy SNP 6.0 or Illumina Omni (TBD). Bioinformatics: Analysis will be performed using core Sanger informatics pipelines similar to those previously described (Papaemmanuil E et al. (2013) Blood. 22:3616-3627). Briefly, copy number analysis will be performed using the ASCAT algorithm, and base substitutions and small insertions and deletions will be called using the CAVEMAN and Pindel algorithms, respectively. Statistical approaches including generalized linear models will be used to predict clinical variables such as maximum clinical response and duration of response using genetic data. Sanger and EBI will conduct the analysis; raw data and correlation with clinical endpoints will be analyzed by both EBI/Sanger and GSK (unique pipeline analyses to increase call confidence).
Alternative splicing plays critical roles in differentiation, development, and cancer (Pettigrew et al., 2008; Chen and Manley, 2009). The recent identification of specific spliceosome inhibitors has generated interest in the therapeutic potential of targeting this cellular process (van Alphen et al., 2009). Using an integrated genomic approach, we have identified PRPF6, an RNA binding component of the pre-mRNA spliceosome, as an essential driver of oncogenesis in colon cancer. Importantly, PRPF6 is both amplified and overexpressed in colon cancer, and only colon cancer cells with high PRPF6 levels are sensitive to its loss. Our data clearly point to an important role for PRPF6 in colon cancer growth and suggest that a better understanding of its role in alternative splicing in colon cancer is warranted. To determine the specific alternative splice forms that PRPF6 regulates in colon cancer, we plan three experiments: 1. The first involves knocking down expression of PRPF6 in two different cancer cell lines with 3 different siRNAs, and then completing RNA-seq to determine the gene expression changes that occur relative to a non-targeting control siRNA. Because of the role for PRPF6 in pre-mRNA splicing, we especially want to quantify the changes in splice-specific forms of all genes genome-wide to identify genes whose splicing is altered upon PRPF6 knockdown. 2. The second involves immunoprecipitating PRPF6 from two different cancer cell lines and isolating any RNA that is bound to PRPF6, since PRPF6 is an RNA-binding protein. We then want to carry out RNA-seq to identify which RNA molecules co-immunoprecipitated with PRPF6. This will help us determine possible functions for PRPF6 in regulating colon cancer growth. 3. The third involves overexpressing PRPF6 in cell lines and then carrying out RNA-seq to identify any changes in splice-specific gene expression. 
This will allow us to determine whether increased PRPF6 expression is sufficient to drive alternative splicing changes.
Early-stage Luminal B breast cancer is frequent and is a major cause of breast cancer death due to its poor prognosis. Our proposal aims to study the biology behind the sensitivity and resistance of Luminal B breast cancer to chemotherapy (CHT) or a non-CHT regimen composed of hormone therapy in combination with ribociclib, a CDK4/6 inhibitor. To accomplish this, we first completed the SOLTI-1402 CORALLEEN phase II trial, a study in which 106 patients with early-stage Luminal B breast cancer were randomized to standard neoadjuvant CHT for 6 months, or neoadjuvant letrozole and ribociclib for 6 months. After treatment, patients underwent surgery. The primary results of the study, which showed that the response rate to letrozole+ribociclib was similar to that of CHT, have been reported (Prat et al; Lancet Oncol). Tumor biopsies were available at baseline, week 3 and surgery. A total of 257 samples were analyzed using the Illumina TruSeq Stranded Total RNA w/Ribo Zero Gold with MiSeq in TGL (Sequencer NovaSeq S4/PE/100x).
Chronic inflammation, linked to the presence of bovine milk and meat factors (BMMF) and specific subsets of macrophages, results in oxygen radical synthesis and induction of mutations in DNA of actively replicating cells and replicating single-stranded DNA (zur Hausen et al., 2017). Cancers arising from this process have been characterized as indirect carcinogenesis by infectious agents (without persistence of genes of the agent in premalignant or cancer cells). Here we investigate structural properties of pleomorphic vesicles, regularly identified by staining peritumour tissues of colorectal, lung and pancreatic cancer for expression of BMMF Rep. The latter represents a subgroup of BMMF1 proteins involved in replication of small single-stranded circular plasmids of BMMF, but most likely also contributing to pleomorphic vesicular structures found in the periphery of colorectal, lung and pancreatic cancers. Structurally dense regions are demonstrated in preselected areas of colorectal cancer after staining with monoclonal antibodies against BMMF1 Rep. Similar structures were observed in human embryonic cells (HEK293TT) overexpressing Rep. These data suggest that Rep or Rep isoforms contribute to the structural formation of vesicles.
RNAseq data from Passman et al 2023. Clonal CCO-deficient hepatocyte patches and nearby CCO-proficient hepatocytes were identified in morphologically normal human livers and sampled at varying distances along the PT-CV axis. Samples are named according to their location within the liver lobule, with "PT" denoting samples abutting the portal triad, "CV" denoting samples abutting the central hepatic vein, and "Mid" denoting samples acquired midway between these structures. Normal human liver is thought to be generally quiescent; however, clonal hepatocyte expansions have been observed, but neither their cellular source nor their expansion dynamics have been determined. Knowing the hepatocyte cell of origin, and its subsequent dynamics and trajectory within the human liver, will provide an important basis to understand disease-associated dysregulation. Here we use in vivo lineage tracing and a combination of methylation sequence analysis to demonstrate normal human hepatocyte ancestry. We exploit next generation mitochondrial sequencing to determine hepatocyte clonal expansion dynamics across spatially distinct areas of laser-capture microdissected clones, in tandem with computational modelling in morphologically normal human liver.
Lung cancer is the leading cause of cancer-related death in the world. In contrast to many other cancers, a direct connection to lifestyle risk in the form of cigarette smoke has long been established. More than 50% of all smoking-related lung cancers occur in former smokers, often many years after smoking cessation. Despite extensive research, the molecular processes for persistent lung cancer risk are unclear. CT screening of current and former smokers has been shown to reduce lung cancer mortality by up to 26%. To examine whether clinical risk stratification can be improved upon by the addition of genetic data, and to explore the mechanisms of the persisting risk in former smokers, we have analyzed transcriptomic data from accessible airway tissues of 487 subjects. We developed a model to assess smoking associated gene expression changes and their reversibility after smoking is stopped, in both healthy subjects and clinic patients. We find persistent smoking associated immune alterations to be a hallmark of the clinic patients. Integrating previous GWAS data using a transcriptional network approach, we demonstrate that the same immune and interferon related pathways are strongly enriched for genes linked to known genetic risk factors, demonstrating a causal relationship between immune alteration and lung cancer risk. Finally, we used accessible airway transcriptomic data to derive a non-invasive lung cancer risk classifier. Our results provide initial evidence for germline-mediated personalised smoke injury response and risk in the general population, with potential implications for managing long-term lung cancer incidence and mortality.
scRNA: This dataset contains 50 scRNA-seq samples from bone marrow aspirates of 11 multiple myeloma patients experiencing long-term survival and 3 healthy donors. For each donor, total bone marrow and CD3+ T cells were sequenced. For multiple myeloma patients, paired samples were collected at initial diagnosis and between 7 and 17 years after first-line therapy. Bone marrow mononuclear cells were isolated by Ficoll density gradient centrifugation. For sorting of total bone marrow cells, singlet, live cells were gated and sorted; for sorting of T cells, CD45+, CD3+ cells were gated and sorted on either a FACSAria Fusion or a FACSAria II. Single-cell RNA sequencing libraries were generated using 10x Genomics single-cell RNAseq technology (Chromium Single Cell 3’ Solution v2) according to the manufacturer’s protocol and sequenced on an Illumina HiSeq4000 (paired end, 26 and 74 bp). Bulk RNA: Singlet, live CD3+CD4-CXCR3+CD8+ and CD3+CD4-CXCR3-CD8+ cells were sorted from 7 bone marrow and 3 peripheral blood samples of 7 multiple myeloma patients using a FACSAria Fusion machine. Bulk RNA sequencing libraries were generated using the SMART Seq Stranded Total RNA-Seq kit (Takara) and sequenced using the Illumina NovaSeq 6000 platform (2 x 100 bp).
Live CD4 T cells were sorted from inflamed and non-inflamed tissue samples of IBD patients or from healthy and IBD blood samples. ATAC-Seq libraries were generated from live CD4 T cells sorted from i) inflamed and non-inflamed tissue samples, ii) healthy and IBD blood samples, or iii) CD4 T cell subsets polarised from healthy blood samples. After isolating crude nuclei, live CD4+ T cells were treated with Tagment DNA buffer and Tagment DNA Enzyme (Nextera DNA Library Prep Kit, Illumina), and then the DNA was purified with the MinElute PCR Purification Kit (Qiagen). Transposed DNA fragments were amplified using specific adapters, followed by purification with the MinElute PCR Purification Kit (Qiagen). Fragments of 240-360 bp were selected in the PippinHT system (Sage Science). The quality of the library and its DNA concentration were assessed with Bioanalyzer instruments (Agilent Technologies), and the library was ultimately submitted for sequencing on an Illumina HiSeq 2500 sequencer with V4 chemistry. Separately, single cell RNA-Seq libraries were generated exclusively from inflamed and non-inflamed tissue samples of Crohn’s disease patients. Briefly, live CD4 T cells were captured and encapsulated before cDNA amplification using the 10x Genomics Chromium Platform. Samples were prepared as outlined in the 10x Genomics Single Cell 3’ Reagent Kits v2 user guide. Samples were sequenced on a HiSeq 2500 with the following run parameters: Read 1 – 26 cycles, Read 2 – 98 cycles, Index 1 – 8 cycles.