Many cancers are characterized by gene fusions encoding oncogenic chimeric transcription factors (TFs) such as EWS::FLI1 in Ewing sarcoma (EwS). Here, we find that EWS::FLI1 induces the robust expression of a specific set of novel spliced and polyadenylated transcripts within otherwise transcriptionally silent regions of the genome. These neogenes (NGs) are virtually undetectable in large collections of normal tissues or non-EwS tumors and can be silenced by CRISPR interference at regulatory EWS::FLI1-bound microsatellites. Ribosome profiling and proteomics further show that some NGs are translated into highly EwS-specific peptides. More generally, we show that hundreds of NGs can be detected in diverse cancers characterized by chimeric TFs. Altogether, this study identifies the transcription, processing, and translation of novel, specific, highly expressed multi-exonic transcripts from otherwise silent regions of the genome as a new activity of aberrant TFs in cancer.
Chronic obstructive pulmonary disease (COPD) is a major respiratory disease characterized by small airway inflammation, emphysema and severe breathing difficulties. Low-grade systemic inflammation is an established hallmark of severe disease; however, the molecular changes in peripheral immune cells remain far from understood. We combined multi-color flow cytometry with single-cell RNA sequencing and showed that blood neutrophil numbers are significantly increased in COPD and that these neutrophils form a heterogeneous population. A transcriptomic state characterized by expression of interferon response genes correlated with alveolar damage and acute exacerbations. Furthermore, bronchoalveolar neutrophils expressed gene signatures corresponding to certain blood neutrophil states. Last, our data in a murine model of cigarette smoke exposure demonstrated that bone marrow neutrophil progenitors are expanded in smoke-treated animals and display signs of immune activation. Our study provides evidence that COPD systemic inflammation may derive from an activated haematopoietic precursor compartment.
The diagnosis of sinonasal tumors is challenging due to a heterogeneous spectrum of differential diagnoses as well as poorly defined, disputed entities such as sinonasal undifferentiated carcinomas (SNUCs). In this study, we apply a machine learning algorithm based on DNA methylation patterns to classify sinonasal tumors with clinical-grade reliability. We further show that sinonasal tumors with SNUC morphology are not as undifferentiated as their current terminology suggests but can rather be reassigned to four distinct molecular classes defined by epigenetic, mutational and proteomic profiles. These include two classes with neuroendocrine differentiation, characterized by IDH2 or SMARCA4/ARID1A mutations with an overall favorable clinical course, one class composed of highly aggressive SMARCB1-deficient carcinomas and another class with tumors that represent previously misclassified adenoid cystic carcinomas. This repository includes the raw mass spectrometry-based proteomics data.
The diagnosis of sinonasal tumors is challenging due to a heterogeneous spectrum of differential diagnoses as well as poorly defined, disputed entities such as sinonasal undifferentiated carcinomas (SNUCs). In this study, we apply a machine learning algorithm based on DNA methylation patterns to classify sinonasal tumors with clinical-grade reliability. We further show that sinonasal tumors with SNUC morphology are not as undifferentiated as their current terminology suggests but can rather be reassigned to four distinct molecular classes defined by epigenetic, mutational and proteomic profiles. These include two classes with neuroendocrine differentiation, characterized by IDH2 or SMARCA4/ARID1A mutations with an overall favorable clinical course, one class composed of highly aggressive SMARCB1-deficient carcinomas and another class with tumors that represent previously misclassified adenoid cystic carcinomas. This repository includes the results from DNA sequencing and mass spectrometry-based proteomics.
Research on T-cell lymphoblastic lymphoma (T-LBL) has been hampered by its lower incidence and by difficulties in obtaining diagnostic T-LBL material, whereas genetic aberrations in T-cell acute lymphoblastic leukemia (T-ALL) are thoroughly characterized. Given the similarities and differences between T-LBL and T-ALL, the question has been raised whether they represent two different diseases or different manifestations of the same disease. This study aims to identify the genomic and transcriptomic landscape of T-LBL and compare the findings to what is found in T-ALL. Comparison of the molecular aberrations between T-LBL and T-ALL can provide insights into the overlap and differences in malignant development between the two entities, which could lead to improved risk stratification in T-LBL and eventually to T-LBL treatment protocols adapted to molecular-genetic prognostic factors.
The migration of Austronesian-speaking populations through Oceania has intrigued researchers for decades. The Kiribati islands, situated along the boundaries of Micronesia and Polynesia, provide a crucial link in this migration. We analyzed the genome-wide data of the Kiritimati population of Kiribati to uncover their genetic origins and connections with other Oceanian groups. Our study reveals that the Kiritimati population primarily exhibits Remote Oceanian-related ancestry associated with ancient Lapita and present-day Polynesian populations. In addition, our identity-by-descent analysis identifies populations from the Philippines as their closest relatives in Island Southeast Asia. The genetic links between Kiritimati, ancient Lapita, and modern Polynesians underscore the shared ancestry and continuous gene flow across these regions. This genetic continuity and ongoing links are supported by linguistic and cultural evidence, illustrating a complex history of migration and admixture in Oceania.
Localised prostate cancers (PCa) are heterogeneous and multifocal, with diverse outcomes. Current prognostic methods are epithelium-centric, overlooking the complex cellular landscape within the tumour microenvironment (TME), which remains incompletely characterised. We performed a comprehensive analysis of cancerous and adjacent-benign cores from 24 patients with hormone therapy-naïve localised PCa using single-cell RNA-sequencing. By integrating copy number variation and transcriptional signatures, we classified epithelial cells across a malignant spectrum, revealing widespread molecular perturbation. We found an expansion of Club cell phenotypes, suggestive of Luminal dedifferentiation. We also performed a detailed annotation of stromal phenotypes, focusing on fibroblasts, and identified a novel peri-neural fibroblast population. Spatial transcriptomics elucidated the precise anatomical distribution of cancer-associated fibroblasts (CAFs) within the PCa TME. This study provides a valuable foundation for advancing our understanding of PCa pathobiology and developing a comprehensive cellular model of the disease.
eMERGE-PGx is a multi-site test of the concept that sequence information can be coupled to electronic medical records (EMRs) for use in healthcare. The promise of personalized medicine - health care guided by each individual's biological characteristics - is being fostered by increasingly powerful and economical methods to acquire clinically relevant biomarkers from large numbers of people. One therapeutic area that seems especially ripe for an early test of the personalized medicine concept is pharmacogenomics (PGx) - the idea that individual variation in drug response includes a genomic component. Drug response variation is an accepted feature of virtually all drug treatments, and contemporary molecular biologic tools continue to identify key genes mediating drug metabolism, transport, and targets. Importantly, common variation in these genes is an increasingly well-recognized contributor, sometimes with large effects, to variation in drug responses. As a result, recommendations for genotype-guided therapy are increasing. These evidence-based recommendations, if implemented in health care practice, could reduce adverse drug events and improve time to therapeutic response. Through eMERGE-PGx, we are developing strategies for the optimal implementation of genetic sequence data into the clinical environment with the ultimate goal of improving patient care. Sites and participants include: Children's Hospital of Philadelphia (CHOP): The Center for Applied Genomics (CAG) at CHOP is a high-throughput, highly automated genotyping and sequencing facility equipped with state-of-the-art genotyping and sequencing platforms. Children who are treated at the Children's Hospital Healthcare Network and their parents may be eligible to take part in a major initiative to collect more than 100,000 blood samples, covering a wide range of pediatric diseases.
The PGx population selected for sequencing with the PGRNseq panel at CHOP is 1,650 children from CAG's biorepository with well-documented drug-related severe adverse events (SAEs) or EHR-based drug response profiles. SAEs were extracted from EPIC records and from CHOP's Adverse Event (AE) database, which documents every AE at CHOP. A medical review panel classifies these AEs by their causal relationship with the suspected drug as 'doubtful', 'possible', or 'probable'. Individuals with events classified as probable, severe, and objective were selected for sequencing. The drugs most frequently associated with adverse events are antibiotics, antineoplastics, immunosuppressants and psychotropic drugs. This cohort constitutes 50% of the target population. The remaining subjects were selected using EHR-based algorithms that we have developed and validated at CAG for distinguishing patients not responding to ADHD medication (primarily atomoxetine) and patients refractory to antiepileptic treatment from responders. Cincinnati Children's Hospital Medical Center/Boston Children's Hospital (CCHMC/BCH): 811 CCHMC samples were obtained from children, adolescents or young adults exposed to medication or at risk for needing medication of study interest. Of these participants, 55% were exposed to one or more opioids and their DNA source was a CCHMC study-specific biobank, while 27% were at risk for needing an opioid for surgical pain management and were newly recruited. The remainder of the cohort was exposed to methylphenidate and their DNA samples were obtained from a CCHMC study-specific biobank. The focus of the Boston Children's Hospital eMERGE PGx project is on individuals with epilepsy. Samples were taken from an ongoing pharmacogenomics study, in which DMET analysis was run and used to confirm PGRN-Seq results. A total of 109 samples were sent for PGRN-Seq analysis at University of Washington.
The remaining 141 epilepsy samples were from Children's Hospital of Philadelphia and underwent testing with PGRN-Seq at CHOP. Geisinger Health System: A research cohort of adult Geisinger Clinic patients was enrolled from community-based primary care clinics of the Geisinger Health System. Patients were eligible for enrollment if they were a primary care patient of a Geisinger Clinic physician and were scheduled for a non-emergent clinic visit. All data are from Geisinger patients who consent to participate in the MyCode project. MyCode participants agree to provide biological samples for broad research use, including genomic analysis, and for linking of sample data to information in the participant's Geisinger health record. The consent also permits sharing of de-identified data for research purposes. Group Health (GH)/University of Washington (UW): Potential GH participants for the PGx project were enrolled in the eMERGE Network through the Northwest Institute of Genetic Medicine (NWIGM) biorepository, and provided the appropriate consent to receive clinically relevant genetic results (N~6300). Participants were eligible if aged 50-65 years at the time of their enrollment into the NWIGM repository, living, enrolled in GH's integrated group practice, and had completed an online Health Risk Appraisal. The selection algorithm was based on several data sources from the EHR at Group Health: 1. Demographics - participants with self-reported race as Asian or African ancestry were prioritized and selected to enrich for non-European ancestry; 2. Diagnosis and procedure codes - participants were selected if found to have a history of hypertension, atrial fibrillation (AF), or congestive heart failure (CHF). Participants with a history of arrhythmia were added if the entire selection algorithm did not generate 900 individuals. We also enriched for participants with EHR evidence of actionable indications related to PGRNSeq genes.
Participants were selected if found to have an ICD9 code for malignant hyperthermia, hypertension, atrial fibrillation, congestive heart failure or long QT syndrome (LQTS); 3. Laboratory values - if a participant had any laboratory event of creatine kinase (CK) > 1000, and was dispensed statins within 6 months of the event, then they were selected; and 4. Medications - participants were excluded if ever on carbamazepine or had a current regimen of warfarin. Essentia Institute of Rural Health, Marshfield Clinic, Pennsylvania State University (Marshfield): For this study, 750 subjects were selected and enrolled into PGx based on Vanderbilt's algorithm designed to enrich for patients who are most likely to receive one of three common drugs (Clopidogrel, Warfarin or Simvastatin) in the next 2-3 years. These patients were sent a letter of invitation and description of the PGx project. Follow-up phone calls were made, and interested subjects came in for a one-time meeting to discuss the project and go through the informed consent with the research coordinator. If they were interested, they signed the consent and HIPAA forms and gave blood. Subjects were chosen and enrolled into PGx independently of previous biobank participation. Mayo Clinic: The Right Drug, Right Dose, Right Time - Using Genomic Data to Individualize Treatment (The RIGHT Protocol) enrolled 1013 patients to test the hypothesis that prescribers could deliver genome-guided drug therapy at the point-of-care by using pharmacogenomic data preemptively integrated in the electronic medical record. Complete details regarding the study population have been previously described (Bielinski et al., 2014). Icahn School of Medicine at Mount Sinai (Mt Sinai): Our study site is the Primary Care Associates (PCA) practice group of the Mount Sinai Faculty Practice Associates (FPA) of the Mount Sinai Medical Center in New York City. This practice has 12 physician providers.
All patient encounters are documented and managed with the EpicCare ambulatory electronic medical record. Active PCA patients eligible for enrollment fulfilled the following criteria: a) age 50 or older receiving clinical care at Mount Sinai FPA PCA practice with at least one practice encounter within 18 months prior to commencement of enrollment; b) no history or current use of clopidogrel, warfarin, or simvastatin. Eligible patients were invited to participate through de novo recruitment by a letter sent by their provider. Interested patients were screened for eligibility and enrolled to participate in the eMERGE PGX study on site by a dedicated research coordinator. In addition to de novo enrollment from clinical practice, patients of FPA PCA who had previously enrolled in Mount Sinai's BioMe Biobank program and fulfilled eligibility criteria as stated under a) and b) were identified by chart review, and their samples were sequenced at CIDR using the PGRNseq platform (N=300). PGRNseq data from 291 samples passed stringent quality control and are included in the current data set. Furthermore, 56 of these patients carrying known and validated 'actionable' variants affecting prescribing of clopidogrel, warfarin, and/or simvastatin were enrolled in the eMERGE PGX study following invitation through recontacting by the Principal Investigator of the BioMe Program. Northwestern University: Participants for this study were recruited from the General Internal Medicine (GIM) clinic at Northwestern Medical Group (NMG). Patients were selected for invitation to participate if they had been seen a minimum of two times over the last four years, had a high likelihood of receiving a prescription for warfarin, Plavix, or a statin, and were seeing a physician who had agreed to allow their patients to be contacted for the study.
We used an algorithm developed at Vanderbilt and tailored to our population, which uses our EHR to estimate the probability that individuals will receive a prescription for warfarin, Plavix, or a statin in the next three years. Participants were sent a letter explaining the study prior to their GIM appointment and offered participation at the time of their visit. Participants were consented on-site, and blood was drawn after consent was obtained. The GIM clinic consists of 39 primary care physicians who provide approximately 80,000 patient encounters per year. As with any large primary care clinic, a significant proportion of patients in GIM clinic suffer from a variety of chronic health conditions, such as diabetes, hypertension, and coronary artery disease. Over 50,000 individuals have been seen by GIM doctors in the past 5 years; 11,562 of these patients have evidence of a statin prescription in the EHR, 3,436 have evidence of a warfarin prescription, and 1,872 have evidence of a Plavix prescription. Vanderbilt University: The more than 1000 participants enrolled into Vanderbilt's eMERGE PGx study were newly recruited from the Cardiology and Internal Medicine Clinics and the Hillsboro Medical Group within Vanderbilt University Medical Center (VUMC). Patients were selected based on a predictive algorithm estimating the patient's likelihood of receiving Clopidogrel, Warfarin, and/or Simvastatin. The algorithm identifies primarily older middle-aged patients, and the mean age of the study group is 74. The cohort is approximately 45% female with 75% of subjects self-identified as EA and 24% as AA. Subjects were consented in person by study personnel following a routine clinic visit and an introduction to the study staff by their doctor. VUMC is a comprehensive health care facility dedicated to patient care, research, and the education of health care professionals.
Translational research into the causes and treatment of disease, as well as the study of fundamental biological properties, is the primary focus of discovery at Vanderbilt. Clinical research is conducted in Vanderbilt University Hospital, the Nashville Veterans Administration Hospital, Meharry General Hospital and in their associated outpatient clinics. These hospitals and clinics, all associated with the Vanderbilt system, each have full-time Vanderbilt faculty and medical housestaff, provide clinical care, and participate in research programs. The Vanderbilt Clinic comprises more than 95 adult outpatient specialty practices and received over 1.5 million ambulatory visits in 2012-13. The Vanderbilt Heart and Vascular Institute offers a comprehensive heart program that includes diagnosis, medical treatment, minimally invasive therapies, surgical intervention and disease management, tailored to each individual's unique needs. All programs within the Vanderbilt Clinic have survival figures that surpass the national average.
Genome-wide genotyping was performed on a population-based cohort from the capital region of Finland using the Illumina 610-Quad SNP microarray
SNP-chip genotyping data for one proband in the DDD study (Ref: Carvalho AJHG 2015)