Study Overview

The Environmental Determinants of Diabetes in the Young (TEDDY) Study is a longitudinal study that investigates genetic factors and gene-environment interactions, including gestational events, childhood infections, dietary exposures, and other environmental factors after birth, in relation to the development of islet autoimmunity and type 1 diabetes (T1D). A consortium of six clinical centers was assembled to develop and implement the study, with the goal of identifying environmental triggers for the development of islet autoimmunity and T1D in genetically susceptible individuals. Beginning in 2004, the TEDDY study screened over 400,000 newborns, from both the general population and families already affected by T1D, for high-risk HLA-DR, DQ genotypes. The study enrolled 8,676 participants across six clinical centers worldwide (Finland, Germany, Sweden, and three in the United States) in the 15-year prospective follow-up. Participants are followed every three months, with blood sampling for islet autoantibody (IA) measurements, until four years of age and then at least every six months until the age of 15. After the age of four, autoantibody-positive participants continue to be followed at three-month intervals and autoantibody-negative participants are followed at six-month intervals. In addition to the analysis of autoantibodies, further data and sample collection are performed at each visit. Parents collect monthly stool samples in early childhood. The parents also fill out questionnaires at regular intervals in connection with study visits and record information about diet and health status in the child's TEDDY Book between visits. Continued long-term follow-up of the currently active TEDDY participants will provide important scientific information on early childhood diet, reported and measured infections, vaccinations, and psychosocial stressors that may contribute to the development of type 1 diabetes and islet autoimmunity.
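The follow-up schedule described above can be illustrated with a short sketch. It is not taken from the TEDDY protocol: the function name, the assumption that the first visit is at 3 months of age, and the simplification that a participant's autoantibody status is fixed after age four are all hypothetical choices made for illustration. Ages are in months.

```python
def visit_ages_months(autoantibody_positive, first_visit=3, switch_age=48, end_age=180):
    """Scheduled visit ages in months under the simplified rules above.

    Assumptions (not from the source): the first visit is at `first_visit`
    months, and autoantibody status does not change after `switch_age`.
    """
    # Every three months up to and including four years of age (48 months).
    ages = list(range(first_visit, switch_age + 1, 3))
    # After age four: three-month intervals if autoantibody positive,
    # six-month intervals otherwise, until 15 years of age (180 months).
    step = 3 if autoantibody_positive else 6
    ages += list(range(switch_age + step, end_age + 1, step))
    return ages


negative_schedule = visit_ages_months(autoantibody_positive=False)
positive_schedule = visit_ages_months(autoantibody_positive=True)
```

In practice a participant can become autoantibody positive at any visit, so a real schedule would be recomputed as status changes.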
Additional information on the TEDDY study is available in the following articles: Rewers et al., 2008, PMID: 19120261 and Hagopian et al., 2006, PMID: 17130573. Details of the TEDDY protocol can be found in Hagopian et al., 2011, PMID: 21564455. TEDDY data currently available in dbGaP include: gene expression, SNPs, exome, microbiome (gut, nasal, and plasma), RNA sequencing, and whole genome sequencing. For more information on the TEDDY Study version history, please refer to the TEDDY Study dbGaP README File.

ImmunoChip SNP

DNA was obtained from whole blood samples from study participants and their family members (mothers, fathers, and siblings) and used for SNP genotyping. Genotyping was performed by the Center for Public Health Genomics at the University of Virginia using the Illumina ImmunoChip SNP array, which contains around 196,000 SNPs from 186 regions associated with 12 autoimmune diseases (Hadley et al., 2015, PMID: 26010309). Data cleaning and validation included the removal of subjects with a low call rate (more than 5% of SNPs missing) and of subjects whose genotypes were discordant with their reported sex or with prior genotyping at the TEDDY HLA laboratory. Additionally, SNPs with a low call rate or a Hardy-Weinberg equilibrium P value < 10^-6 were removed from the final dataset, except for SNPs on chromosome 6, which were exempted from the Hardy-Weinberg filter because of the HLA eligibility requirements (Törn et al., 2015, PMID: 25422107).

TEDDY-T1DExome Array

DNA was obtained from whole blood samples from study participants and their family members (mothers, fathers, and siblings) and used for genotyping. Genotyping was performed by the University of Virginia using the Illumina TEDDY-T1DExome array.
The TEDDY-T1DExome array is a custom chip that contains 550,601 markers from the Infinium CoreExome-24 v1.1 BeadChip and an additional 90,214 tagSNPs specifically selected by the TEDDY investigators based on their associations with nutrients, vitamins, type 2 diabetes, autoimmune diseases, body-mass index, or other exposures and phenotypes measured by the TEDDY study. The Illumina GenTrain2 algorithm was used for genotype calling. Sample quality control metrics included sample call rate, heterozygosity rate, and concordance between reported and genotyped gender.

Gene Expression

The TEDDY study collected peripheral blood for the extraction of total RNA from enrolled children starting at 3 months of age, then at 3-month intervals up to 48 months, and biannually thereafter. Total RNA was extracted with a high-throughput (96-well format) protocol using magnetic bead (MagMax) technology at the TEDDY RNA Laboratory, Jinfiniti Biosciences in Augusta, GA. Purified RNA (200 ng) was then used for cRNA amplification and biotin labeling with the Target Amp cDNA synthesis kit (Epicenter catalog no. TAB1R6924). Labeled cRNA was hybridized to Illumina HumanHT-12 Expression BeadChips according to the manufacturer's instructions. The HumanHT-12 Expression BeadChip provides coverage for more than 47,000 transcripts and known splice variants across the human transcriptome.

Microbiome

The TEDDY microbiome study aimed to characterize the longitudinal development of the microbiome, including bacteria, viruses, and other microorganisms in the gut, plasma, and nasal cavity of prediabetic and diabetic subjects compared to autoantibody-negative, non-diabetic subjects. Stool samples were collected monthly from 3 to 48 months of age and every 3 months thereafter. Nasal swab samples were collected every 3 months from 9 months of age until 48 months and every 6 months thereafter.
Plasma samples were collected every 3 months from 3 months of age until 48 months and every 6 months thereafter. Subjects who were autoantibody positive at 48 months remained on the 3-month collection interval for nasal swab and plasma samples. Samples underwent 16S rRNA gene sequencing, DNA and viral RNA metagenomic shotgun sequencing, and sequencing of the internal transcribed spacer (ITS) regions. Additional information on the TEDDY microbiome data is available in the following articles: Vatanen et al., 2018, PMID: 30356183, Stewart et al., 2018, PMID: 30356187, and Vehik et al., 2020, PMID: 31792456.

RNA Sequencing

The TEDDY study aimed to characterize the transcriptome in subjects with islet autoimmunity and type 1 diabetes compared to matched control subjects. Peripheral blood was collected to extract total RNA from enrolled children starting at 3 months of age, then at 3-month intervals up to 48 months, and biannually thereafter. Total RNA was extracted with a high-throughput (96-well format) protocol using magnetic bead (MagMax) technology at the TEDDY RNA Laboratory, Jinfiniti Biosciences in Augusta, GA. Purified RNA was then sent to the Broad Institute for the generation of the TEDDY RNA sequencing (RNA-Seq) data. The RNA samples were prepared using Superscript III reverse transcriptase and Illumina's TruSeq Stranded mRNA Sample Prep Kit. The TruSeq libraries were run on the Illumina HiSeq2500 platform.

Whole Genome Sequencing

The TEDDY study aimed to conduct deep whole genome sequencing and examine genomic variation in subjects with islet autoimmunity and type 1 diabetes compared to matched autoantibody-negative, non-diabetic children. DNA from whole blood was obtained from TEDDY children for whole genome sequencing. The WGS data were generated on the Illumina HiSeq X Ten system.
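The ImmunoChip data cleaning described earlier in this record (sample call rate, SNP call rate, and a Hardy-Weinberg filter with a chromosome 6 exemption) can be sketched as follows. This is an illustration under stated assumptions, not the TEDDY pipeline: genotypes are assumed to be coded 0/1/2 with `None` for a missing call, the 5% figure is read as the maximum tolerated missingness and is reused for SNPs (the text gives no SNP-level cutoff), and a one-degree-of-freedom chi-square test stands in for whichever Hardy-Weinberg test was actually used. All function names are hypothetical.

```python
import math


def call_rate(calls):
    """Fraction of non-missing genotype calls (None marks a missing call)."""
    return sum(g is not None for g in calls) / len(calls)


def hwe_p_value(calls):
    """Chi-square (1 df) Hardy-Weinberg test for one biallelic SNP.

    Genotypes are coded as 0, 1 or 2 copies of the alternate allele.
    """
    obs = [sum(g == k for g in calls) for k in (0, 1, 2)]
    n = sum(obs)
    if n == 0:
        return 1.0
    p = (2 * obs[0] + obs[1]) / (2 * n)  # reference allele frequency
    q = 1 - p
    if p == 0 or q == 0:
        return 1.0  # monomorphic SNP: nothing to test
    expected = [p * p * n, 2 * p * q * n, q * q * n]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(obs, expected))
    # Survival function of the chi-square distribution with 1 df.
    return math.erfc(math.sqrt(chi2 / 2))


def passing_samples(genotypes, max_missing=0.05):
    """Indices of samples whose missing-call fraction is at most max_missing."""
    per_snp = list(genotypes.values())
    n_samples = len(per_snp[0])
    return [i for i in range(n_samples)
            if 1 - call_rate([calls[i] for calls in per_snp]) <= max_missing]


def passing_snps(genotypes, max_missing=0.05, hwe_threshold=1e-6, hwe_exempt=()):
    """SNP ids that pass the call-rate and Hardy-Weinberg filters.

    hwe_exempt holds SNP ids that skip the Hardy-Weinberg filter
    (for example, SNPs on chromosome 6).
    """
    kept = []
    for snp_id, calls in genotypes.items():
        if 1 - call_rate(calls) > max_missing:
            continue
        if snp_id not in hwe_exempt and hwe_p_value(calls) < hwe_threshold:
            continue
        kept.append(snp_id)
    return kept
```

Here `genotypes` maps each SNP id to a list of calls, one per sample in a fixed order. Dedicated tools such as PLINK provide equivalent filters and are the usual choice at array scale.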
Acral melanoma, which is not ultraviolet (UV)-associated, is the most common type of melanoma in several low- and middle-income countries including Mexico. Latin American samples are significantly underrepresented in global cancer genomics studies, which directly affects patients in these regions as it is known that cancer risk and incidence may be influenced by ancestry and environmental exposures. To address this, we characterise the genome and transcriptome of 123 acral melanoma tumours from 92 Mexican patients, a population notable because of its genetic admixture. Compared with other studies of melanoma, we found that mutations in classical driver genes such as BRAF, NRAS or NF1 were less frequent. While most patients had predominantly Amerindian genetic ancestry, those with higher European ancestry had an increased frequency of BRAF mutations and a lower median number of structural variants. The tumours with activating BRAF mutations have a transcriptional profile more similar to cutaneous non-volar melanocytes, suggesting that acral melanomas in these patients may arise from a distinct cell of origin compared to other tumours arising in these locations. KIT mutations were found in a subset of these tumours, and quadruple wild-type samples (non-BRAF/NRAS/NF1/KIT) differed from mutated samples in their structural genomic profile and in their overall and recurrence-free survival patterns. Transcriptional profiling defined three expression clusters; these characteristics were associated with recurrence-free and overall survival. We highlight potential novel low-frequency drivers, such as PTPRJ, NF2 and RDH5. Our study enhances knowledge of this understudied disease and underscores the importance of including samples from diverse ancestries in cancer genomics studies.
A non-invasive diagnostic biomarker test for early-stage prostate cancer, with high sensitivity and specificity, would improve diagnostic decision making. Extracellular RNAs present in seminal plasma might have biomarker potential for the accurate detection of clinically significant prostate cancer. So far, the extracellular messenger RNA (mRNA) profile of seminal plasma has not been interrogated for its biomarker potential in the context of prostate cancer. Here, we investigate the mRNA transcriptome in seminal plasma samples obtained from prostate cancer patients (n=25), patients with benign prostatic hyperplasia (n=26) and individuals without prostatic disease (n=6). Seminal plasma harbors a complex mRNA repertoire that reflects the prostate as its tissue of origin. The endogenous RNA content is higher in the prostate cancer samples than in the control samples. Prostate cancer antigen 3 (PCA3), a long non-coding RNA with prostate cancer-specific overexpression, and ATP-binding cassette transporter 1 (ABCA1), known to be involved in prostate cancer pathogenesis, were more abundant in the prostate cancer group. In addition, twelve high-confidence fusion transcripts could be detected in prostate cancer samples, including the bona fide prostate cancer fusion transcript TMPRSS2-ERG. Our findings provide proof of principle that the extracellular transcriptome of seminal plasma can reveal information about an underlying prostate cancer.
RNA-seq data from ILC samples. This dataset includes counts normalized for sequencing depth and gene length.
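The record does not say which normalization was applied. As an illustration only, transcripts per million (TPM) is one common scheme that corrects for both gene length and sequencing depth; the sketch below assumes TPM and uses hypothetical gene names and lengths.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million for one sample.

    counts: dict gene -> raw read count
    lengths_bp: dict gene -> gene (or transcript) length in base pairs
    """
    # Step 1: correct for gene length (reads per kilobase).
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    # Step 2: correct for sequencing depth so that values sum to one million.
    total = sum(rpk.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: value / total * 1e6 for gene, value in rpk.items()}


# Hypothetical two-gene sample: gene_b has twice the reads but is twice as long.
example = tpm({"gene_a": 100, "gene_b": 200}, {"gene_a": 1000, "gene_b": 2000})
```

In this example both genes receive the same TPM value, because the extra reads on `gene_b` are fully explained by its greater length.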
Single cell RNA-seq for primary samples
Fastq files for PACA-CA RNA Seq analysis, for DCC release 27
Sample metadata for the RNA-seq dataset. This dataset includes subject-level data and longitudinal visit day information for the corresponding samples.
The Mutographs project aims to advance our understanding of the causes of cancer through studies of mutational signatures. Led by Mike Stratton, together with Paul Brennan, Ludmil Alexandrov, Allan Balmain, David Phillips and Peter Campbell, this large-scale international research endeavour was awarded a Cancer Research UK Grand Challenge. Within Mutographs, work led by the Sanger Institute will investigate whether detection of somatic mutations and mutational signatures in circulating white blood cells can be developed into a practical, generic system for surveying and monitoring multiple different endogenous and exogenous exposures, providing an ‘observatory’ on somatic mutational processes in humans. Whole genome sequences are generated at the Wellcome Sanger Institute (Illumina HiSeqX). Somatic mutational signatures are subsequently extracted by non-negative matrix factorisation methods. Through an enhanced understanding of cancer aetiology, Mutographs' unprecedented effort is anticipated to outline modifiable risk factors, lead to new approaches to prevent cancer, and provide opportunities to empower early detection, refine high-risk groups and contribute to further therapeutic development.
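The record names non-negative matrix factorisation (NMF) without specifying the method or software. The sketch below is a generic illustration using Lee-Seung multiplicative updates for the squared-error objective, not the Mutographs pipeline. It assumes a mutation-count matrix with one row per mutation category and one column per sample, factored into a signatures matrix `w` and an exposures matrix `h`; all function names are hypothetical.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def nmf(v, rank, n_iter=500, seed=0, eps=1e-9):
    """Approximate a non-negative matrix v as w @ h with non-negative factors.

    v: categories x samples; w: categories x rank (signatures);
    h: rank x samples (exposures).
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n_rows)]
    h = [[rng.random() + eps for _ in range(n_cols)] for _ in range(rank)]
    for _ in range(n_iter):
        # Update exposures: h <- h * (w^T v) / (w^T w h)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_cols)]
             for i in range(rank)]
        # Update signatures: w <- w * (v h^T) / (w h h^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n_rows)]
    return w, h


def reconstruction_error(v, w, h):
    """Frobenius norm of v - w @ h."""
    approx = matmul(w, h)
    return sum((a - b) ** 2
               for row_v, row_a in zip(v, approx)
               for a, b in zip(row_v, row_a)) ** 0.5
```

Because the updates only multiply by non-negative ratios, `w` and `h` stay non-negative throughout. Published signature analyses typically add steps this sketch omits, such as repeated runs from different starting points and a procedure for choosing the number of signatures.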
We aim to identify molecular subtypes of prostate cancer using consensus non-negative matrix factorization and correlate these with existing biomarkers to inform future immunotherapeutic strategies.