The goal of this study is to iteratively identify, validate and refine biomarkers to predict clinical failure, and identify novel targets for therapy, beyond traditional antibiotics, for patients with severe pneumonia requiring intubation, including severe community-acquired pneumonia (CAP), hospital-acquired pneumonia, and ventilator-associated pneumonia (HAP/VAP). We hypothesize that factors within the alveolar microenvironment in many patients with pneumonia prevent the active process of pneumonia resolution. The major emphasis will be on HAP. Mechanically ventilated CAP patients are a very high-risk group for subsequent VAP, and therefore CAP samples will be included to determine baseline bacteria, microbiome, and host defense patterns for those that do or do not develop HAP/VAP. Hospital-acquired pneumonia is the leading cause of death from nosocomial infections. Even in patients receiving pathogen-appropriate therapy, reported clinical failure rates are increasing, approaching 50% in recent clinical trials. Current approaches to improve outcomes in patients with pneumonia almost exclusively focus on the development of antibiotics to address the problem of resistance. However, the prevalence of antibiotic-resistant pathogens is insufficient to explain the high rates of clinical failure. Instead, we hypothesize that factors within the alveolar microenvironment in many patients with pneumonia prevent the active process of pneumonia resolution. These factors include pathogen-specific adaptations to the injured lung that modulate the innate and adaptive immune response, as well as changes to the alveolar microbiome, driven by selective pressure within our modern intensive care units. Understanding this complex adaptive system requires the integration of clinical information with unbiased measures of the host, pathogen, and microbiome changes within the alveolar microenvironment. 
These measures must be repeated over time to identify factors that distinguish successful from failed therapy for pneumonia sufficiently early during the clinical course to allow for effective interventions. To address this challenge, we have assembled a talented group of investigators in the Successful Clinical Response In Pneumonia Therapy (SCRIPT) Systems Biology Center. We will leverage our routine clinical practice of safe alveolar sampling in mechanically ventilated patients with pneumonia with repeated non-bronchoscopic bronchoalveolar lavage (NBBAL) sampling over the course of pneumonia. From this fluid, we will combine flow cytometry with multi-omic technologies (cell population-specific transcriptomics, epigenomics, shotgun microbiome sequencing, and pathogen-specific sequencing). Our systems scientists will integrate these genomic data with robust clinical -omics with the goal of identifying biomarkers that can be prospectively tested in patients and causally evaluated in humanized mouse models. The addition of a pooled sample artificially increases the number of automatically computed subjects by one.
The Genomic Sequencing for Childhood Risk and Newborn Illness (BabySeq Project) is a research study exploring the use of genomic sequencing in newborns. The National Institutes of Health is funding this study. Investigators enrolled 240 healthy infants and their parents from the Brigham and Women's Hospital (BWH) Well Newborn Nursery and 240 sick infants and their parents at Boston Children's Hospital (BCH) or the BWH Neonatal Intensive Care Unit (NICU). A small blood sample was taken from each infant and genome sequencing may have been performed. Whole-exome sequencing (WES) was performed at the Broad Institute. Six weeks later, the results were returned and explained. Over 12 months, the investigators studied the experiences of parents and pediatricians of infants who received sequencing to help understand how best to use genomics in pediatric care. Parents were surveyed at four points over the 12 months after enrollment: baseline, immediately post-disclosure (approximately 6 weeks after enrollment), 3 months post-disclosure, and 10 months post-disclosure. Primary outcome measures included: (1) downstream health care costs attributable to BabySeq Project disclosure measured as days of inpatient care; (2) parents' distress using validated scales and a novel item assessing blame; (3) parent-child relationship using validated scales; (4) parents' relationship using validated and novel measures of marital satisfaction; (5-9) downstream health care utilization attributable to BabySeq Project disclosure measured as the number of health care provider visits, per-patient counts for number of medications at 10 months, number of emergency room visits, number of outpatient lab tests, and per-patient means for healthcare costs in U.S. dollars. A secondary outcome was whether there was a change in perceived utility toward genomic sequencing. 
Findings of BabySeq include: (1) interest in newborn genomic testing is high among parents of healthy children and the majority of couples had similar levels of interest; (2) parents reported several motivations to receive and reasons to decline adult-onset only results from newborn sequencing; (3) 88% of newborns had at least one recessive carrier variant that could be relevant to their parents' future reproductive planning; (4) 5% of babies had an atypical pharmacogenomic variant related to how they might process medications used in childhood; (5) parent surveys using validated measures showed no evidence that newborn genomic sequencing caused increased psychological distress, even if the baby had a disease risk identified; and (6) 11% of newborn babies in the study had unanticipated monogenic disease risks.
Background: Understanding the cancer genome is seen as a key step in improving outcomes for cancer patients. Genomic assays are emerging as a possible avenue to personalised medicine in breast cancer. The majority of work in this area has targeted primary tumours, however, and very few studies have performed comprehensive profiling of advanced disease. Evolution of the cancer genome during the natural history of breast cancer is largely unknown, as is the profile of disease at death. We sought to study in detail these aspects of advanced breast cancers that have resulted in lethal disease. Methods and Findings: Three patients with ER-positive, HER2-negative breast cancer and one patient with triple negative breast cancer underwent rapid autopsy as part of an institutional prospective community-based rapid autopsy program. Cases represented a range of management problems in breast cancer, including late relapse after early stage disease; de novo metastatic disease; discordant disease response; and disease refractory to treatment. Between 5 and 12 metastatic sites were collected at autopsy together with available primary tumours and longitudinal metastatic biopsies taken during life. Samples underwent paired tumour-normal whole exome sequencing and single nucleotide polymorphism arrays. Subclonal architectures were inferred by jointly analysing all samples from each patient. Mutations were validated using high depth amplicon sequencing. Between cases, there were significant differences in mutational burden, driver mutations, mutational processes and copy number variation. Within each case, we found dramatic heterogeneity in subclonal structure from primary to metastatic disease and between metastatic sites, such that no single lesion captured the breadth of disease. Metastatic cross seeding was found in each case and treatment drove subclonal diversification. 
Subclones displayed parallel evolution of treatment resistance in some cases, and apparent augmentation of key oncogenic drivers as an alternative resistance mechanism. We also observed the key role of mutational processes in subclonal evolution. Limitations of this study include the potential for bias introduced by joint analysis of formalin fixed archival specimens with fresh specimens, and the difficulties in resolving subclones with whole exome sequencing. Other alterations that could define subclones such as structural variants or epigenetic modifications were not assessed. Conclusions: This study highlights the variety of mechanisms that shape the genome of metastatic breast cancer, and the value of studying advanced disease in detail. Treatment drives significant genomic heterogeneity in breast cancers, which has implications for disease monitoring and treatment selection in the personalised medicine paradigm.
CINECA, EUCANCan, and euCanSHare were part of the EUCAN Cluster, made up of six projects that received funding under the same Horizon 2020 call. All EUCAN projects (CINECA, EUCANCan, EUCAN-Connect, euCanSHare, Receptor Plus, and ReCoDID) were aimed at facilitating data reuse and knowledge discovery by enhancing data exchange and long-term collaboration in the health field. Here are some highlights about EGA’s contribution to these projects.

CINECA: advances in federated discovery and infrastructure for cohorts

CINECA (Common Infrastructure for National Cohorts in Europe, Canada, and Africa) developed a federated cloud-based infrastructure for making genomic and biomolecular data accessible. The project has assembled a virtual cohort of 1.4 million individuals from sources such as the EGA, CanDIG and H3Africa. The EGA–CRG co-led Work Package 1 on Federated Data Discovery and Querying. Beacon v2, championed by Jordi Rambla and Lauren Fromont, has been one of the central elements for the discovery of human genetic and phenotypic data. The EGA–CRG contributed to the development of a model for cohort discovery inside the Beacon v2 model. The team also delivered a Discovery Portal to explore cohorts and individuals in synthetic data. It is a UI that gathers a network of Beacons, a service for query expansion, and a visualisation tool. Have a look at the Discovery Portal UI, gathering a network of Beacons. You will find entities such as the Barcelona Supercomputing Center (BSC), BioData.pt and the European Genomic Data Infrastructure (GDI), among others.

euCanSHare: facilitation of data access management

This joint EU-Canada project aimed to build a European and Canadian FAIR platform for cardiovascular data sharing and analysis. The EGA’s tasks contributed to the development of the data management plan and data flows, the main web portal, and interoperability protocols. 
An important part of the euCanSHare platform is the data access manager, a tool for data owners to control access to their sensitive datasets. We built a user-friendly interface for data access committees (data owners) to easily manage their data requests and access credentials. The data access portal facilitates the creation and internal organization of the data access committees as well as the linkage of data usage conditions to specific datasets. The interface includes filters for browsing requests and a page for visualizing the request history, which helps with the handling of data access requests.

EUCANCan: toward a federation of clinical institutions in oncology

The EUropean-CAnadian CANcer Network worked towards building a federated network to advance personalized medicine in oncology, by promoting the standard analysis, management and sharing of harmonized genomic and phenotypic data. The EGA led tasks on defining data flows and preparing an adapted infrastructure for long-term data storage and sharing. One of the key contributions was the conception of the EGA communities, offering standard and interoperable solutions for data discovery, processing, and sharing to projects and institutions that would like to manage and share data in the context of the EGA ecosystem.

What's Next?

Participation in projects such as CINECA, EUCANCan, and euCanSHare provides us with experience for future projects. We are happy to enhance data sharing and reuse, which are vital for advancing clinical and genomic research. The Discovery Portal is a wonderful proof of concept for the Beacon network. Next, we will finalise both the network and the user interface, and make sure they can be applied to more clinically-centred settings like hospitals. Most data hosted at the EGA are about cancer research. Currently, we contribute to building tools and infrastructure to empower oncology research in several European projects such as EUCAIM, EUCANIMAGE and EOSC4Cancer. 
We also do not lose sight of initiatives in the field such as the International Cancer Genome Consortium (ICGC) with the project ARGO, whose data model was adopted in EUCANCan.
Telomeres, the chromosomes’ end tails, as depicted in the figure below, physiologically shorten with aging. This means that younger cells have longer telomeres compared to their aged counterparts; one could say that the length of cells’ telomeres is a measure of their replicative power. Cancer cells are known to be able to turn this to their favour, and extend their replicative immortality through telomere lengthening and maintenance. Despite being a well-studied pan-cancer hallmark, the relation of telomere length to specific cancer genomic features is poorly understood. Julia Livingstone and her colleagues from P.C. Boutros' team in Los Angeles focused their work on prostate cancer. To investigate the relations between telomere length and the clinico-genomics of prostate tumours, they used whole genome sequencing (WGS) of 392 published tumour–reference pairs previously deposited at the EGA under accession numbers EGAS00001000400 and EGAS00001000900. Their work was published in Nature Communications in November 2021. They determined the telomere length of the sequenced samples, from both tumour and non-tumour adjacent tissues, and subsequently ran statistical analyses to investigate possible correlations between telomere length and several cancer features. The team discovered that telomere length does not correlate with any specific driver gene mutation, but it does with the overall frequency of single nucleotide variants in a sample. This finding suggests that tumours with shorter telomeres accumulate more mutations without strong selective pressures for specific ones. While proliferation rate did not correlate directly with telomere length, methylation of almost half of all genes strongly did. The team’s results are extensive and reveal a complicated and fascinating relation of different clinical and genomic features of prostate cancer with telomere length. In the paper discussion, they carefully consider different possibilities. 
As noted above, all these amazing results have been obtained by analysing previously sequenced samples. Remarkably, the researchers who generated the samples did not focus at all on telomere length. They instead studied specific gene mutations and the features of early and late-onset prostate cancer. This proves, once again, the power of sequencing data re-use. This data is tremendously rich in information, and can hardly be exhausted with any number of analyses. Generating sequencing data, especially Whole Genome Sequencing, even better if tumour-reference paired, is like buying every last article at the grocery store. You can then cook several recipes, or prepare a seven-course dinner, and still have ingredients left for someone else to borrow and cook something you have not thought about, a recipe that is just not in your cookbook. Notably, the data retrieved from the EGA for this paper is associated with rich and complete metadata, meaning pheno-clinical information regarding the sample donors. As Julia Livingstone and her team wrote, using rich clinical annotations enabled them to assess the relationship between telomere length and patient outcome. This highlights once again the importance of reporting and annotating metadata correctly. It might feel like extra work for the researchers depositing their data, but it really has a huge value. Our team at the EGA is currently working to improve the metadata annotation process, with the aim of maximizing the benefits for the scientific community.
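To make the kind of analysis described above more concrete, here is a minimal sketch of a rank correlation between telomere length and the number of single nucleotide variants per sample. The post does not say which statistical test the authors used, so the choice of Spearman's rank correlation is an assumption, and the function names and the toy numbers are purely illustrative (they are not data from the paper).

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy values: telomere length (kb) and SNV count per tumour.
telomere_kb = [6.1, 5.4, 4.8, 4.2, 3.9]
snv_count = [1200, 1500, 1450, 2100, 2600]
rho = spearman(telomere_kb, snv_count)
print(f"Spearman rho = {rho:.2f}")
```

A negative rho in such a toy example would correspond to the pattern reported in the post, where shorter telomeres go along with more single nucleotide variants; a real analysis would also need a significance test and adjustment for covariates.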
This study includes 1,220 cases with young onset stroke (stroke before age 60 years) who are participants of the larger RACE study. Risk Assessment of Cerebrovascular Events (RACE) is an ongoing case-control study of stroke now involving over 5000 imaging-confirmed cases of stroke and 5000 controls, recruited from seven centers in Pakistan. The study aims to investigate the genetic, biomarker and lifestyle determinants of stroke and its subtypes. Cases are eligible for inclusion in the study if they: (i) are aged at least 18 years; (ii) present with a sudden onset of neurological deficit respecting a vascular territory with sustained deficit at 24 hours verified by medical attention within 72 hours after onset (onset is defined by when the patient was last seen normal and not when found with deficit); (iii) have a diagnosis supported by CT/MRI; and (iv) present with a Modified Rankin Score < 2 prior to the stroke. Findings from patient's history, 12-lead ECG and CT or MRI of the brain. The mandatory procedures for inclusion in this investigation are: (i) clinical verification of cerebrovascular event within 72 hours of onset; (ii) neuroimaging CT (non-contrast) or MRI (MRI is not a mandatory investigation but recorded whenever ordered by the attending physician); and (iii) 12-lead ECG. All other ancillary investigations ordered by the attending physician are recorded as well. The TOAST classification method is used to classify ischemic stroke based on aetiology whereas the Oxfordshire classification is used to classify stroke neuro-anatomically. Control participants for this subset of young onset stroke were individuals enrolled in the Pakistan Risk of Myocardial Infarction Study (PROMIS), a case-control study of acute MI based in Pakistan. RACE capitalizes on the genetic data (including information on GWAS) that has already been collected from the healthy participants enrolled in PROMIS. RACE and PROMIS share similar methodology of recruitment. 
Participants from both these investigations are derived from similar catchment areas, hence providing an attractive opportunity for RACE to utilize PROMIS controls as common controls for genetic investigations. Controls in PROMIS were recruited following procedures and inclusion criteria as adopted for RACE cases. In order to minimize any potential selection biases, PROMIS controls selected for this stroke substudy were frequency matched to RACE cases based on age and gender and were recruited in the following order of priority: (1) non-blood related or blood related visitors of patients of the out-patient department; (2) non-blood related visitors of stroke patients; (3) patients of the out-patient department presenting with minor complaints (e.g. back pain, minor gastric complaints). Control subjects from the PROMIS study were genotyped at the Wellcome Trust Sanger Institute on the Illumina 660W Quad array. The Center for Non-Communicable Diseases, Pakistan, serves as the coordinating center for both RACE and PROMIS. More information on these research investigations can be found at www.cncdpk.com. This young onset stroke component to the RACE study was funded through the Gene Environment Association Studies initiative (GENEVA, www.genevastudy.org) as one of three studies designed to assess the genetics of young onset stroke and modification of genetic effects by smoking. GENEVA is part of the trans-NIH Genes, Environment, and Health Initiative (GEI). Genotyping of 1,220 young onset stroke cases was performed at the Johns Hopkins University Center for Inherited Disease Research (CIDR). Data cleaning and harmonization were done at the GEI-funded GENEVA Coordinating Center at the University of Washington. This study is part of the Gene Environment Association Studies initiative (GENEVA, http://www.genevastudy.org) funded by the trans-NIH Genes, Environment, and Health Initiative (GEI). 
The overarching goal is to identify novel genetic factors that contribute to stroke through large-scale genome-wide association studies of cases and controls recruited within Pakistan.
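The frequency matching of controls to cases on age and gender described above can be illustrated with a small sketch. This is not the study's actual procedure: the 10-year age bands, the record fields (`age`, `sex`), the 1:1 ratio, and the function names are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def stratum(person, band_width=10):
    """Hypothetical stratum key: (start of age band, sex)."""
    return (person["age"] // band_width * band_width, person["sex"])


def frequency_match(cases, control_pool, ratio=1, seed=0):
    """Draw controls so that each stratum holds up to `ratio` controls per case.

    If a stratum has too few eligible controls, all of them are taken.
    """
    rng = random.Random(seed)  # fixed seed keeps the draw reproducible
    needed = Counter(stratum(c) for c in cases)
    pool = defaultdict(list)
    for control in control_pool:
        pool[stratum(control)].append(control)
    selected = []
    for key, n_cases in needed.items():
        available = pool.get(key, [])
        k = min(len(available), n_cases * ratio)
        selected.extend(rng.sample(available, k))
    return selected


# Toy records, not study data.
cases = [{"age": 34, "sex": "F"}, {"age": 38, "sex": "F"}, {"age": 52, "sex": "M"}]
control_pool = [
    {"age": 31, "sex": "F"}, {"age": 36, "sex": "F"}, {"age": 39, "sex": "F"},
    {"age": 55, "sex": "M"}, {"age": 58, "sex": "M"}, {"age": 27, "sex": "M"},
]
matched = frequency_match(cases, control_pool)
print(Counter(stratum(m) for m in matched))
```

Unlike individual matching, frequency matching only requires that the overall distribution of the matching factors be similar in cases and controls, which is why the sketch works on stratum counts rather than on case-control pairs.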
The study was conducted in Bulgaria as a collaboration between Virginia Commonwealth University (Richmond, VA, USA), the Bulgarian Addictions Institute (Sofia, Bulgaria), the Molecular Medicine Center/Medical University (Sofia, Bulgaria), and Indiana University (Bloomington, IN, USA). The overall aim of this study is to investigate the role of impulsivity as an endophenotype for drug addiction. Although impulsivity is considered one of the strongest candidate endophenotypes for addiction, progress in the field is hampered by the heterogeneity of impulsivity, characterized by multiple personality, psychiatric, and neurocognitive dimensions, rarely examined concurrently in the same population; and the heterogeneity of addiction phenotypes, due in part to the high rates of polysubstance dependence among substance users. To address these challenges, we have developed a program of addiction research in Bulgaria, a key transit country for heroin trafficking due to its strategic geographical location on the "Balkan Drug Route" and a major European center for production of synthetic amphetamine-type stimulants. This has allowed us to access rare populations of predominantly mono-substance dependent heroin and amphetamine users, many in protracted abstinence. Our preliminary results reveal a complex relationship between trait and neurocognitive (state) dimensions of impulsivity, often manifested in opposite directions in heroin and amphetamine dependent individuals. Pilot computational modeling analyses of decision-making, a central neurocognitive aspect of impulsivity, have proved particularly informative by indicating that different mechanisms may underlie the impaired decision-making of opiate and stimulant users. A different modeling approach, i.e. phenotypic modeling, holds significant promise to address the pervasive "missing heritability" problem in genetic studies. 
While genetic heterogeneity is often invoked as an explanation, the manner in which complex phenotypic traits are measured and modeled is an equally important contributor to the missing heritability problem but has received much less attention in the literature. Despite the multidimensionality of traits measured by psychometric, diagnostic, and neurocognitive instruments, most GWAS typically use aggregate sum scores that do not reflect the underlying phenotypic multidimensionality. Therefore, at least part of the missing heritability problem may originate in misspecification of the phenotypic models. Consequently, sample size requirements may increase from ~800 subjects in correctly specified models to 6,000-16,000 subjects in incorrectly specified models. The current study aims to increase our understanding of the complex relationship between multiple putative impulsivity endophenotypes to help redefine endophenotypes as multi-level combinations of measures that could inform multivariate multilevel models of complex phenotypes. The specific aims of the study are to: (1) Assess the utility of various personality, psychiatric, and neurocognitive indices of impulsivity (either individually or in combination) as candidate endophenotype(s) for drug addiction in general and for opiate and stimulant addictions in particular; (2) Evaluate the viability of computational model parameters modeling various neurocognitive dimensions of impulsivity as novel endophenotype(s) for addiction; and (3) Test the external validity of the optimal endophenotype(s) by evaluating their associations with HIV and other risk behaviors in opiate and stimulant users in protracted abstinence, a question of critical importance for prevention and intervention efforts in this much less well understood stage of the addiction cycle.
Focal segmental glomerulosclerosis (FSGS) is a frequent cause of end-stage renal disease. The pathogenesis of FSGS has not been precisely defined and there are no consistently effective treatments. Recent studies identifying causal genes in rare, inherited FSGS, including our own study, have associated mutations in at least six genes with familial FSGS, and each discovery has clarified molecular mechanisms of glomerular injury. To build on this productive line of inquiry, we have ascertained and carefully characterized 118 families with familial FSGS. We have screened the remainder of our families for mutations in genes known to cause FSGS and identified the causal mutations in an additional 6 kindreds; the genetic basis of disease in the remaining 111 families is unknown. The objective of this proposal is to use this valuable and unique family resource to systematically identify causal genes for familial FSGS. Limitations of current conventional linkage and positional cloning approaches include their requirement for large, multiplex families. In addition, narrowing candidate areas in traditional linkage analysis can be difficult due to large regions that lack recombination events, and hence these regions have required cumbersome and lengthy screening for causative mutations. Powerful new genetic tools can facilitate this screening process and improve variant discovery in smaller families. In particular, efficient whole-exome sequencing, the targeted capture of protein-coding gene sequences, should be particularly useful in our studies since most Mendelian disorders are caused by mutations affecting exons of the target gene. Thus, by combining genome-wide linkage analysis (GWLS) and whole-exome sequencing, we can maximize the impact of our family data and accelerate identification of novel mutations in FSGS. 
In preliminary studies, we have used this combination to identify a novel variant in the WT1 (Wilms' Tumor-1) gene in one FSGS family, and we have evidence suggesting it is the causal mutation. This success provides proof-of-concept and a roadmap of how genes will be identified and evaluated in the proposed studies. Our hypothesis is that causes of inherited FSGS in our cohort of families will be sequence variants in the coding region of genes not previously associated with familial FSGS. We aim to: 1) Use GWLS and whole-exome sequencing to identify genetic variants associated with familial FSGS; 2) Characterize functional consequences of candidate causative mutations; and 3) Determine the prevalence in the Duke FSGS dataset of these new causative mutations identified in Aim 2. Any genes found to have causative mutations will be sequenced in the remaining families to take full advantage of our family resource. By combining genome-wide linkage analysis (GWLS), whole-exome sequencing, and characterization of variants' functional consequences, we will significantly improve understanding of normal glomerular biology and of the pathogenesis of FSGS and related glomerular diseases. Moreover, our discoveries are likely to reveal new opportunities to improve therapy for a disease that currently has few effective treatments.
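One way to picture how linkage analysis and whole-exome sequencing can be combined is as a two-step filter: keep only exome variants that fall inside a linked region, then keep those that segregate with disease in the family. The sketch below assumes a simple, fully penetrant dominant model and uses invented placeholder coordinates, gene labels and family member identifiers; it is an illustration of the idea, not the investigators' pipeline.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """A linked interval from linkage analysis (hypothetical coordinates)."""
    chrom: str
    start: int
    end: int


class Variant(NamedTuple):
    """An exome variant with the set of family members who carry it."""
    chrom: str
    pos: int
    gene: str
    carriers: frozenset


def in_linked_region(variant, regions):
    """True if the variant position lies inside any linked interval."""
    return any(
        r.chrom == variant.chrom and r.start <= variant.pos <= r.end
        for r in regions
    )


def candidate_variants(variants, regions, affected, unaffected):
    """Keep variants in a linked region that all affected relatives carry
    and no unaffected relative carries (assumed full penetrance)."""
    affected, unaffected = frozenset(affected), frozenset(unaffected)
    return [
        v for v in variants
        if in_linked_region(v, regions)
        and affected <= v.carriers
        and not (unaffected & v.carriers)
    ]


# Placeholder data for one hypothetical family.
regions = [Region("chr1", 1_000_000, 5_000_000)]
variants = [
    Variant("chr1", 2_500_000, "GENE_A", frozenset({"I-1", "II-2", "III-1"})),
    Variant("chr1", 3_000_000, "GENE_B", frozenset({"I-1", "II-2", "II-3"})),
    Variant("chr2", 2_500_000, "GENE_C", frozenset({"I-1", "II-2", "III-1"})),
]
hits = candidate_variants(
    variants, regions,
    affected={"I-1", "II-2", "III-1"}, unaffected={"II-3"},
)
print([v.gene for v in hits])
```

In practice, reduced penetrance, phenocopies and variant frequency in reference populations would all need to be considered, so the strict segregation rule used here should be read as a simplifying assumption.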
The study was conducted under the auspices of the Transdisciplinary Research In Cancer of the Lung (TRICL) Research Team, which is a part of the Genetic Associations and MEchanisms in ONcology (GAME-ON) consortium, and associated with the International Lung Cancer Consortium (ILCCO). Ethics: All participants provided written informed consent. All studies were reviewed and approved by institutional ethics review committees at the involved institutions. Sequencing data are derived from four substudies. The substudies that contributed include Harvard, Liverpool, Toronto, and IARC. The Alpha-Tocopherol, Beta-Carotene Cancer Prevention (ATBC) Study is a randomized primary prevention trial including 29,133 male smokers enrolled in Finland between 1985 and 1993. Participants were between 50 and 69 years of age at enrollment and were randomized in a factorial design to take either 50 mg of d-alpha tocopheryl acetate (Vitamin E), 20 mg of all-trans-beta-carotene, both, or placebo. The study continued to monitor cancer incidence through 2012 and total mortality through December 2013. The CAncer de PUlmon en Asturias Study (CAPUA) is a hospital-based case-control study conducted in Asturias, Spain by the University of Oviedo. Lung cancer cases were recruited in three main hospitals of Asturias, following an identical protocol from 2002 to 2012. Eligible cases were incident cases of histologically confirmed lung cancer between 30 and 85 years of age and residents in the geographical area of each participating hospital. Controls were selected from patients admitted to those hospitals with diagnoses unrelated to the exposures of interest and individually matched by ethnicity, gender, age (± 5 years) and hospital. Epidemiologic data were collected personally through computer-assisted questionnaires by trained interviewers during the first hospital admission. 
Structured questionnaires collected information on sociodemographic characteristics, recent and prior tobacco use, environmental exposure (air pollution and passive smoking), diet, personal and family history of cancer, and occupational history from each participant. Peripheral blood samples (or mouthwash samples when participants declined to donate blood) were collected from all participants. Coding of histology was based on 2001 WHO/IASLC. Genomic DNA was extracted using a standard protocol. The Canadian Screening Study includes the nested case-control samples from 3 screening programs: IELCAP-Toronto: Ever smokers of more than 10 pack-years age 50 and above were eligible for the I-ELCAP screening program since 2003, and a total of 4782 individuals have been enrolled in the Greater Toronto Area. Participants were administered an LDCT scan along with a standard study questionnaire at baseline. Blood samples were systematically collected at baseline since 2006. Participants who had an abnormality in a CT scan were followed up every 1 to 2 years. The screening program was organized by the Princess Margaret Hospital. PanCan: Ever smokers between the ages of 50-75 with no previous history of invasive cancer are eligible to participate in the study. The study was carried out across Canada in Vancouver, Calgary, Hamilton, Toronto, Ottawa, Quebec, Halifax, and St. John's. A total of 2537 smokers have been screened from 2008 to 2011. All study participants completed a detailed questionnaire, spirometry, collection of blood specimens for biomarker measurement and LDCT at baseline. All participants are followed for a minimum of 3 years. On yearly follow up, an updated shorter questionnaire is administered, blood is collected and CT scans are performed. Blood samples are available from all 2537 individuals. BCCA Screening Program: From 1990 to 2007, 4274 smokers above 40 years old who had smoked 20 pack-years or more were enrolled at BCCA. 
Upon enrollment, subjects completed a questionnaire on their lifestyle and medical history. Baseline spirometry was conducted using a flow-sensitive spirometer in accordance with the American Thoracic Society recommendations. Since 2000, an LDCT was obtained in 2440 individuals. The participants were followed prospectively to determine whether they developed lung cancer. A total of 9759 individuals participated in the CT screening program in Canada from these 3 programs. The samples included in this project are a subset of nested lung cancer case-control pairs in a 1:2 ratio. The Carotene and Retinol Efficacy Trial (CARET) was a randomized, double-blind, placebo-controlled trial of the cancer prevention efficacy and safety of a daily combination of 30 mg of beta-carotene and 25,000 IU of retinyl palmitate in 18,314 persons at high risk for lung cancer. CARET began in 1985, and the intervention was halted in January 1996, 21 months ahead of schedule, with the twin conclusions of definitive evidence of no benefit and substantial evidence of a harmful effect of the intervention on both lung cancer incidence and total mortality. CARET continued to follow and collect endpoints on its participants through 2005. Pathology reports and medical records were reviewed to confirm cancer endpoints, and death certificates were obtained to capture cause of death. During the active intervention phase of CARET, serum, plasma, whole blood, and lung tissue specimens were collected on participants. These biospecimens make up the CARET Biorepository. For the OncoArray Project, CARET provided DNA extracted from whole blood of lung cancer cases and controls matched on age at baseline (± 4 years), sex, race, baseline smoking status, history of occupational asbestos exposure (asbestos vs heavy smoker), and year of enrollment (2-year intervals). 
The European Prospective Investigation into Cancer and Nutrition (EPIC) study is a multi-center cohort study involving 521,000 study participants from 10 European countries. The current study involved EPIC participants from 7 countries (Greece, Netherlands, UK, France, Germany, Spain, and Italy), including 1223 incident lung cancer cases and 1249 smoking-matched controls. The Kentucky Lung Cancer Research Initiative is a study conducted by the Markey Cancer Center and the University of Kentucky using a population-based, case-control framework to study the extraordinarily high rates of lung cancer in Southeastern, Appalachian Kentucky. Cancer cases were recruited from the Kentucky Cancer Registry at the time of diagnosis, and controls were recruited from the same region through random digit dialing. Study accrual began on January 5, 2012 and was completed on September 5, 2014; 520 subjects were recruited from Appalachian Kentucky in a 4:1 ratio of controls to cases. Of the 520 subjects recruited, 231 are included in the OncoArray analysis, including all 93 cancer cases and 123 controls. Newly diagnosed lung cancer cases and controls underwent collection of blood, toenail (for trace element analysis), urine, buffy coat, water, soil, and radon samples, residence GPS mapping, and an extensive epidemiologic, occupational, and health history questionnaire (ClinicalTrials.gov Identifier: NCT01648166). The Harvard Lung Cancer Study (HLCS) is a case-control study conducted at Massachusetts General Hospital (MGH) in Boston, Massachusetts from 1992 to 2004. Details of the study were described previously. Briefly, eligible cases included any person over the age of 18 years with a diagnosis of primary lung cancer that was further confirmed by an MGH lung pathologist. Controls were recruited from the friends or spouses of cancer patients or the friends or spouses of other surgery patients in the same hospital.
Potential controls were excluded from participation if they had a diagnosis of any cancer (other than non-melanoma skin cancer). Interviewer-administered questionnaires, based on a modified version of the standardized American Thoracic Society respiratory questionnaire, collected information on demographics, medical history, family history of cancer, smoking history, and a detailed work history, including job titles and tasks. Genome-wide genotype data were first generated using Illumina Human 610-Quad BeadChips and then imputed by MACH against the 1000 Genomes Project dataset (http://browser.1000genomes.org/index.html). The Institutional Review Board of MGH and the Human Subjects Committee of the Harvard School of Public Health approved the study. The Israel study (NICCC-LCA) is an ongoing case-control study of newly diagnosed lung cancer cases of any histology and "healthy" population controls matched on age, sex, and ethnicity. All participants undergo face-to-face interviews and provide a venous blood sample (separated into DNA, sera, and lymphocytes) after signing an IRB-approved consent form. Histology reports, FFPE blocks, and clinical follow-up are available for most cancer cases. The MD Anderson Cancer Center (MDACC) Study. Lung cancer cases and frequency-matched controls were ascertained from a large case-control study that has been ongoing at the University of Texas MD Anderson Cancer Center (UTMDACC) since 1991. A detailed study description was provided previously (Spitz et al 2007). In brief, cases were newly diagnosed and histologically confirmed lung cancer patients recruited from UTMDACC. Controls were healthy individuals without a history of cancer (except for nonmelanoma skin cancer) recruited from the Kelsey-Seybold Clinics, the largest private multispecialty physician group in the Houston metropolitan area. Controls were frequency-matched to cases on age (±5 years), sex, and race/ethnicity.
After providing written informed consent, each study participant completed an in-person interview conducted by staff interviewers to collect information on demographics, smoking status, and other factors. Blood samples were also drawn from all study participants. This study was approved by the institutional review boards of UTMDACC and Kelsey-Seybold Clinics. The Malmö Diet and Cancer Study (MDCS) is a population-based prospective cohort study that recruited men and women aged 44 to 74 years living in Malmö, Sweden between 1991 and 1996. The main goal of the MDCS is to study the impact of diet on cancer incidence and mortality. It consists of a baseline examination including dietary assessment, a self-administered questionnaire, anthropometric measurements, and collection of blood samples. A total of 165 incident lung cancer cases and 174 individually smoking-matched controls were available for this analysis. The Multiethnic Cohort (MEC) Study includes 215,251 men and women aged 45-74 years at recruitment, primarily from five ethnic/racial groups: African Americans and Latinos (mostly recruited from CA, mainly from Los Angeles County) and Japanese Americans, Native Hawaiians, and whites (mostly recruited from HI). The cohort was assembled in 1993-1996 by mailing a self-administered questionnaire to persons identified primarily through driver's license files. The baseline questionnaire obtained information on demographics, anthropometry, smoking history, medical and reproductive histories, family history of cancer, diet, and physical activity. Incident cancer cases are identified by regular linkage with the State of California Cancer Registry and the Hawaii Tumor Registry, both members of the SEER Program of the NCI. In 2001-2006, a prospective biorepository was assembled by collecting a pre-diagnostic blood specimen from 67,594 surviving MEC members.
At the time of blood collection a short questionnaire was administered that included information on smoking during the previous 15 days. For this study, cases were all lung cancer cases incident to blood draw and diagnosed before December 2012. For each case, a control was selected among unaffected MEC participants who were alive at time of the case's diagnosis and matched on study site, sex, race/ethnicity, age (age at diagnosis for cases; age at blood collection for controls), and date of blood collection. The Mount-Sinai Hospital-Princess Margaret Study (MSH-PMH) was conducted in the greater Toronto area from 2008 to 2013. Lung cancer cases were recruited at the hospitals in the network of the University of Toronto. Controls were selected randomly from individuals registered in the family medicine clinics databases and were frequency matched with cases on age and sex. All subjects were interviewed, and information on lifestyle risk factors, occupational history and medical and family history was collected using a standard questionnaire. Tumors were centrally reviewed by the reference pathologist, a member of the International Association for the Study of Lung Cancer (IASLC) committee, and a second pathologist in the University Health Network. If the reviews conflicted, a consensus was arrived at after discussion. Coding of histology was based on 2001 WHO/IASLC. Genomic DNA was extracted based on standard protocol. The New England Lung Cancer Study (NELCS) is a population-based case-control study of lung cancer among residents of Northern and Central New Hampshire counties and the bordering region of Vermont. Cases with histologically confirmed primary incident lung cancer were identified from 2005 to 2007 using the New Hampshire State Cancer Registry and the Dartmouth-Hitchcock Medical Center (DHMC) Tumor Registry. Control participants were identified using a commercial database and matched to lung cancer cases within 5-year age groups, sex and county. 
Genomic DNA was isolated from blood or buccal specimens provided by consenting participants. The study complied with the requirements of Dartmouth College's Committee for Protection of Human Subjects. The Nijmegen Lung Cancer Study. Dutch patients with lung cancer were identified through the population-based cancer registry of the Netherlands Comprehensive Cancer Organisation in Nijmegen, the Netherlands. Patients who had been diagnosed in one of three hospitals (Radboud University Medical Center, Canisius Wilhelmina Hospital in Nijmegen, and Rijnstate Hospital in Arnhem) since 1989 and who were still alive on April 15, 2008 were recruited for a study on gene-environment interactions in lung cancer. A total of 458 patients gave informed consent and donated a blood sample. This case series was expanded by 94 patients to a total of 552 by linking three other studies to the population-based cancer registry in order to identify new occurrences of lung cancer among the participants of these other studies. All three other studies (i.e., POLYGENE, the Nijmegen Biomedical Study, and the Radboudumc Urology Outpatient Clinic Epidemiology Study) were initiated to study genetic risk factors for disease, and participants in these studies gave general informed consent for DNA-related research and linkage with disease registries. Information on histology, stage of disease, and age at diagnosis was obtained through the cancer registry. Lifestyle information was collected through a structured questionnaire, and whole blood for DNA isolation was collected by the regional thrombosis services. The cancer-free controls (46% males) were selected from participants of the "Nijmegen Biomedical Study" (NBS), an age- and sex-stratified random sample of the general population of the municipality of Nijmegen, The Netherlands. All participants provided extensive lifestyle information by structured questionnaires and blood samples for DNA isolation, serum, and plasma.
All controls are of self-reported European descent. The study protocols of the NBS were approved by the Institutional Review Board of the Radboudumc, and all study subjects signed a written informed consent form. The Northern Sweden Health and Disease Study (NSHDS) encompasses several prospective cohorts. The current study involves participants from the Västerbotten Intervention Project (VIP), a sub-cohort within NSHDS. VIP is an ongoing prospective cohort and intervention study intended for health promotion of the general population of Västerbotten County in northern Sweden. VIP was initiated in 1985, and all residents of Västerbotten County were invited to participate by attending a health check-up at 40, 50, and 60 years of age. Participants were asked to complete a self-administered questionnaire covering factors such as education, smoking habits, physical activity, and diet. In addition, height and weight were measured, and participants were asked to donate a fasting blood sample for future research. A total of 243 incident lung cancer cases and 266 individually smoking-matched controls were available for this analysis. Norway National Institute of Occupational Health Study. Early-stage NSCLC cases and controls who were healthy at the time of enrollment were Caucasians of Norwegian origin and were recruited from the same geographical region (Western Norway). Patients were enrolled, whenever practically feasible, from among those admitted for lung cancer at the Haukeland University Hospital in Bergen, Norway. Written informed consent covering analysis of molecular and genetic markers was signed by the patients prior to surgery. Only patients with histologically confirmed early-stage NSCLC were included in our study. The subjects included in this project are a subgroup recruited into the project "lung cancer genetics" at NIOH.
The controls were recruited from the same geographical region of Western Norway and frequency-matched with cases on cumulative smoking dose (pack-years). Pack-years smoked [(cigarettes per day / 20) × years smoked] were calculated to indicate the cumulative smoking dose. Cases and controls were interviewed using similar questionnaires and were categorized as never smokers, ex-smokers, or current smokers. Never smokers were subjects who reported having smoked fewer than 100 cigarettes in their lifetime. Ex-smokers were defined as those who had quit at least 1 year before sampling, and current smokers were those who reported being smokers at the time of sampling. The project has been approved by the Regional Committee for Medical and Health Research Ethics in Southern Norway in accordance with the WMA Declaration of Helsinki. The ethical approval covered access to the NSCLC databank. The Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial (PLCO) Study, a randomized trial aimed at evaluating the efficacy of screening in reducing cancer mortality, recruited approximately 155,000 men and women aged 55 to 74 years from 1992 to 2001. Screening for lung cancer among participants in the intervention arm included a chest x-ray at baseline followed by either three annual x-rays (for current or former smokers at enrollment) or two annual x-rays (for never smokers); participants in the control arm received routine health care. Screening-arm participants provided data on sociodemographic factors, smoking behavior, anthropometric characteristics, medical history, and family history of cancer, as well as blood samples annually for the first 6 years of the study (baseline T0 and T1 through T5). Lung cancers were ascertained through annual questionnaires mailed to the participants, and positive reports were followed up by abstracting medical records or death certificates. Follow-up in the trial as of July 2009 was 96.7%.
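As an illustration of the smoking variables defined for the Norwegian study above, the following minimal sketch computes pack-years as (cigarettes per day / 20) × years smoked and assigns the three smoking categories. The function and argument names are hypothetical, and treating people who quit less than 1 year before sampling as current smokers is an assumption made here for illustration; the study description does not state how that group was handled.

```python
from typing import Optional


def pack_years(cigarettes_per_day: float, years_smoked: float) -> float:
    """Cumulative smoking dose; one pack-year is 20 cigarettes a day for one year."""
    if cigarettes_per_day < 0 or years_smoked < 0:
        raise ValueError("inputs must be non-negative")
    return (cigarettes_per_day / 20.0) * years_smoked


def smoking_status(lifetime_cigarettes: int,
                   years_since_quitting: Optional[float]) -> str:
    """Classify a subject as 'never', 'ex', or 'current'.

    years_since_quitting is None for subjects who still smoke at sampling.
    """
    # Never smokers: fewer than 100 cigarettes over the lifetime.
    if lifetime_cigarettes < 100:
        return "never"
    # Ex-smokers: quit at least 1 year before sampling.
    if years_since_quitting is not None and years_since_quitting >= 1:
        return "ex"
    # Assumption: everyone else, including recent quitters, counts as current.
    return "current"
```

For example, `pack_years(10, 30)` gives 15.0, the same cumulative dose as one pack a day for 15 years.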
Patients were excluded because of a missing baseline questionnaire, previous history of any cancer, diagnosis of multiple cancers during follow-up, missing smoking information at baseline, missing consent for utilization of biologic specimens for etiologic studies, or unavailability or insufficient quantity of serum or DNA specimens. The Resource for the Study of Lung Cancer Epidemiology in North Trent (ReSoLuCENT) is an ongoing study conducted in Sheffield since 2006 and due to complete recruitment in 2016. The study recruited pathologically confirmed lung cancer cases diagnosed at age 60 years or younger and family-matched controls. Lung cancer cases diagnosed at ages older than 60 years were recruited if they reported a family history of lung cancer. The cases and matched controls were recruited through several major cancer treatment centers; however, the majority were recruited in North Trent. All participants completed a detailed lifestyle questionnaire which included questions about occupational exposures, education, medical history, and family history of cancer and lung disease. Participants also donated blood samples for DNA extraction. The ReSoLuCENT study has been funded by the Sheffield Hospitals Charity, Sheffield ECMC, and Weston Park Hospital Cancer Charity. First-degree relatives were removed from the sample deposited to dbGaP. The Roy Castle Lung Study of Liverpool Lung Project (LLP) is a case-control and cohort study which has recruited over 11,500 individuals since 1996 from the Liverpool region in the UK. Detailed epidemiological and clinical data are collected with associated specimens (i.e., tumor tissue, blood, plasma, sputum, bronchial lavage, and oral brushings).
Participants completed a detailed lifestyle questionnaire at recruitment, with repeat questionnaires at intervals; updated data on clinical outcome and hospital events are collected through the Health and Social Care Information Center (including Office for National Statistics mortality data, Cancer Registry, and Hospital Episode Statistics). The project is registered on the UK National Institute for Health Research (NIHR) lung cancer portfolio and has all the required ethical approvals and sponsorship arrangements in place. The lung tumors were reviewed by the reference pathologist. The Seoul Bundang Lung Cancer Study was conducted between 2005 and 2010 to discover genetic and environmental factors associated with lung cancer development. Lung cancer cases were recruited at the Seoul National University Hospital in Bundang. Controls were selected randomly from individuals who participated in a health check-up program and were frequency matched with cases on age and sex. All subjects were interviewed, and information on lifestyle risk factors, occupational history, and medical and family history was collected using a standard questionnaire. Tumors were reviewed by the pathologists in the hospital. If the reviews conflicted, a consensus was reached after discussion. Coding of histology was based on 2001 WHO/IASLC. Genomic DNA was extracted using a standard protocol. The Shanghai Cohort Study (SCS) consisted of 18,244 men in Shanghai, China, who were 45-64 years old at the time of enrollment during 1986-1989. Approximately 80% of eligible men participated in the study. At the time of recruitment, each cohort subject was interviewed in person by a trained nurse interviewer using a structured questionnaire that included background information, history of tobacco and alcohol use, current diet, and medical history. At the completion of the interview, the nurse collected a 10 ml blood sample and a single-void urine specimen from the study participant.
Buccal cell samples were collected from all surviving cohort members (~15,000) during the 2001-2002 follow-up interviews. The cohort has been followed for the occurrence of cancer and death through routine ascertainment of new cases from the population-based Shanghai Cancer Registry and Shanghai Vital Statistics Units. To maximize cancer case finding and minimize loss to follow-up, we contacted each surviving cohort member annually. Retired nurses visit the last known address of each living cohort member and record details of the interim health history of the cohort member. As of December 31, 2014, cumulatively 612 (3.4%) original subjects were lost to follow-up, and 574 (3.1%) refused continued follow-up interviews. A nested case-control study of incident lung cancer cases within the Shanghai Cohort Study was used to examine the association between serum levels of vitamin B6 and other compounds in the one-carbon metabolism pathway and risk of lung cancer. Briefly, 516 lung cancer cases were identified among cohort participants with available serum samples as of 12/31/2006. For each case, we randomly selected one control subject from all cohort members who were free of cancer and alive at the time of cancer diagnosis of the index case. Controls were matched to the index case by age at enrollment (±2 years), date of biospecimen collection (±1 month), neighborhood of residence at recruitment, and smoking status (current, former, and never smokers), as established previously for other studies. For former smokers, cases and controls were further matched by years since quitting smoking (<10 vs ≥10 years). One serum vial per subject was retrieved from the biorepository, and all serum samples were sent to the laboratory (B-vital) for measurements. DNA samples of 250 lung cancer cases and 250 matched controls were available for the present study.
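The matching rules of the Shanghai Cohort Study nested case-control design described above can be sketched as an eligibility check followed by a random draw. This is a minimal illustration under stated assumptions, not the investigators' procedure: the `Subject` record and its field names are hypothetical, ±1 month is approximated as ±31 days, and the candidate list is assumed to have already been restricted to cohort members who were alive and free of cancer when the index case was diagnosed.

```python
import random
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Subject:
    subject_id: str
    age_at_enrollment: float
    sample_date: date                 # date of biospecimen collection
    neighborhood: str                 # neighborhood of residence at recruitment
    smoking_status: str               # "current", "former", or "never"
    years_since_quitting: Optional[float] = None  # former smokers only


def quit_band(years: Optional[float]) -> Optional[str]:
    """Band years since quitting as <10 vs >=10 years."""
    if years is None:
        return None
    return "<10" if years < 10 else ">=10"


def is_match(case: Subject, control: Subject) -> bool:
    """Apply the matching criteria to one candidate control."""
    if abs(case.age_at_enrollment - control.age_at_enrollment) > 2:
        return False
    # Assumption: a 1-month window is approximated as 31 days.
    if abs((case.sample_date - control.sample_date).days) > 31:
        return False
    if case.neighborhood != control.neighborhood:
        return False
    if case.smoking_status != control.smoking_status:
        return False
    if case.smoking_status == "former":
        return quit_band(case.years_since_quitting) == quit_band(
            control.years_since_quitting)
    return True


def pick_control(case: Subject, candidates: List[Subject],
                 rng: random.Random) -> Optional[Subject]:
    """Randomly select one matched control, or None if nobody qualifies."""
    eligible = [c for c in candidates if is_match(case, c)]
    return rng.choice(eligible) if eligible else None
```

Passing a seeded `random.Random` instance keeps the selection reproducible, which is useful when the matched sets need to be regenerated.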
The Singapore Chinese Health Study (SCHS) cohort consists of 63,257 Chinese men and women in Singapore who were 45-74 years old at the time of enrollment between April 1993 and December 1998. At recruitment, each study subject was interviewed in person by a trained interviewer using a structured questionnaire that emphasized current diet assessed via a validated, 165-item food frequency questionnaire. The questionnaire also requested information on demographics, lifetime use of tobacco, incense use, current physical activity, usual sleep duration, reproductive history (women only), occupational exposure, medical history, and family history of cancer. Blood or buccal cell samples and spot urine samples were collected first from a random 3% sample of cohort participants in April 1994, and collection was extended to all surviving cohort participants starting in January 2000. Overall, approximately 60% of eligible cohort participants donated biospecimens. The cohort has been passively followed for death and cancer occurrence through regular record linkage with the population-based Singapore Cancer Registry and the Singapore Registry of Births and Deaths. Migration out of Singapore, especially among housing estate residents, was negligible. As of the latest update, only 55 individuals from this cohort were known to be lost to follow-up due to migration and other reasons. A nested case-control study of incident lung cancer cases within the Singapore Chinese Health Study was used to examine the association between serum levels of vitamin B6 and other compounds in the one-carbon metabolism pathway and risk of lung cancer. As of 12/31/2011, 422 lung cancer cases were identified among cohort participants with available prediagnostic plasma samples. For each case, one control subject was randomly selected from all eligible cohort members who were alive and free of cancer on the date of cancer diagnosis of the index case.
The control subject was individually matched to the index case by gender, dialect group (Hokkien, Cantonese), age at enrollment (±3 years), date of baseline interview (±2 years), date of biospecimen collection (±6 months), and smoking status (current, former, and never smokers). For current smokers, cases and controls were further matched by number of cigarettes per day (<15, ≥15 cigarettes/day). For former smokers, cases and controls were further matched by years since quitting smoking (<10, ≥10 years). One plasma aliquot per subject was retrieved from the biorepository and all plasma samples were sent to the laboratory (B-vital) for measurements, and one aliquot of DNA per subject was used for the present study. The International Agency for Research on Cancer (IARC) L2 Study. Lung cancer cases and controls were recruited through a multicenter case-control study coordinated by the IARC in Russia, Poland, Serbia, Czech Republic, and Romania from 2005 to 2013. Cases were incident cancer patients recruited from general hospitals. Controls were recruited from individuals visiting general hospitals and out-patient clinics for disorders unrelated to lung cancer and/or its associated risk factors, or from the general population. Information on lifestyle risk factors and medical and family history was collected from subjects by interview using a standard questionnaire. All study participants provided written informed consent. The current study included 1,133 lung cancer cases and 1,117 controls genotyped on the OncoArray. The Washington State University Lung Cancer Study is a hospital case-control study of 511 subjects with newly diagnosed (within 1 year of diagnosis) lung cancer and 820 race-, sex-, and age-matched controls. Lung cancer cases were recruited from lung cancer clinics within the H. Lee Moffitt Cancer Center, while controls were recruited from the Lifetime Cancer Screening Center, an H. Lee Moffitt Cancer Center affiliate.
None of the controls were diagnosed with any form of cancer at the time of screening. Detailed questionnaire data and oral buccal cells were collected for all subjects. The Total Lung Cancer (TLC) Study is a hospital-based study that included 458 lung cancer patients recruited for Moffitt Cancer Center's Total Cancer Care™ protocol between April 2006 and August 2010. Total Cancer Care™ is a multi-institutional observational study of cancer patients that prospectively collects self-reported demographic and clinical data, medical record information and blood samples for research purposes. All patients used in this cohort were recruited from the Thoracic Oncology Clinic at the Moffitt Cancer Center. The Vanderbilt Lung Cancer Study (BioVU) is a case-control study nested within the Vanderbilt University Medical Center biobank, BioVU. BioVU is a biorepository of DNA extracted from blood drawn from patients seeking routine clinical care at Vanderbilt University Medical Center and linked to de-identified electronic health records for research purposes. Lung cancer cases and controls were identified from BioVU participants in February 2014. Lung cancer cases were identified from the Vanderbilt tumor registry. All specimens undergo pathologic review for determination of morphology. Coding of histology was based on SEER Program Coding Guidelines. Controls were randomly selected from BioVU participants, excluding cancer patients, and were matched to cases on age (± 5 years), sex, and race. Relevant covariates were identified from electronic health records using natural language processing. Genomic DNA was extracted based on a standard protocol.
Spiradenocarcinoma is a rare cutaneous sweat gland adnexal cancer with potential for aggressive behaviour. These tumours are classified histologically as low- or high-grade, with morphologically low-grade tumours thought to behave more favourably. However, limited information is available, with only 18 published cases. We have collected morphologically low-grade spiradenocarcinomas (one with a lung metastasis) and high-grade spiradenocarcinomas, as well as some spiradenomas (benign lesions), cylindromas (another type of malignant cutaneous sweat gland adnexal tumour) and hybrid spiradenoma-cylindromas. H&E-stained sections were reviewed, follow-up was obtained, and immunohistochemistry for Ki-67, p53, and MYB was performed. The tumours were solitary, measuring 0.8-7 cm (median: 2.7 cm), with a predilection for the head and neck of elderly patients (median age: 72 years; range 53-92) without gender bias. Histologically, the tumours were multinodular and located in the deep dermis and subcutis. A pre-existing spiradenoma was present in all cases. The malignant component was characterized by expansile growth with loss of the dual cell population, up to moderate cytological atypia, and increased mitotic activity (median: 10/10 HPF; range 1-28). Additional findings included squamoid differentiation (n=9), necrosis (n=7), and ulceration (n=5). P53 expression was variable, and no significant differences were noted between the benign and malignant parts of the tumours. In contrast, in the malignant components the Ki-67 proliferative index was slightly increased, and MYB expression was lost. Follow-up (median: 67 months; range: 13-132), available for 16 patients (84%), revealed a local recurrence rate of 19% but no metastases or disease-related mortality. Here we aim to exome-sequence these cases to define the first genomic landscape for this malignancy. This dataset contains all the data available for this study on 2018-10-29.