Colorectal cancers (CRCs) have been categorized based on histopathological characteristics or genomic abnormalities, such as gene expression. These classifications have gained widespread acceptance and application in clinical practice. A connection between gut bacteria and CRC has recently become evident. However, the extent to which the diversity of CRC and the gut microbiota are linked has not been extensively investigated. We conducted a comprehensive analysis that integrated whole-genome and whole-transcriptome data from CRC tissues with whole-genome metagenomic analysis of stool. To stratify CRC accurately, we used explainable AI, which allowed us to classify the disease into four distinct subtypes based on the composition of the gut microbiota.
Bulk RNA-sequencing of peripheral blood mononuclear cells collected from mild COVID-19 (n = 4), critical COVID-19 (n = 12) and control (n = 8) patients. Sequencing data is provided as FASTQ files.
Five edited and two unedited organoid clones, along with one clone sampled prior to editing, underwent paired-end whole-genome sequencing on the Illumina NovaSeq 6000 system. The reads were mapped to the hg38 genome assembly, and the data are provided as BAM files.
Excess sugar consumption is common among youth and can have adverse health effects. However, the relationship between saliva microbiota and sugar consumption remains sparsely studied. We aimed to explore diversity, composition and functional capacities of saliva microbiota in 11–13-year-old Finnish children with low and high sweet treat consumption.
RNA sequencing was performed on samples from 15 T-LGLL patients and on five control samples. The raw data are provided as FASTQ files.
Three SpCas9-ABE (R785X/R785X) and three xCas9-ABE-repaired organoid clones (F508del/R553X) and their respective unrepaired control organoids underwent paired-end whole-genome sequencing on the Illumina NovaSeq 6000 system. The reads were mapped to the hg19 genome assembly, and the data are provided as BAM files.
This dataset represents two combined study populations. Serrated Colorectal Cancer: An Emerging Disease Subtype (called the Advanced Colorectal Cancer of Serrated Subtype Study or ACCESS Study) was a grant awarded to investigate a newly recognized, biologically distinct subtype of colorectal cancer (CRC) called “serrated CRC.” The objective of this project was to characterize factors related to the genetic predisposition, clinical presentation, and prognosis of serrated CRC. The study recruited incident invasive CRC cases diagnosed between April 2016 and December 2018, aged 20-74 years at diagnosis. Cases were identified through the Surveillance, Epidemiology and End Results (SEER) cancer registry serving 13 counties in western Washington State. Eligibility for all individuals was limited to those who were English-speaking and could consent. Participation included completing a baseline epidemiologic questionnaire shortly after diagnosis, optional donation of a saliva sample for genetic analysis, and optional consent to release of medical records and tissue specimens related to their diagnosis. Tumor specimens were tested for serrated CRC-defining molecular characteristics. Further, we have vital status on all participants and cause of death for those who have died since enrollment. Hormones and Colon Cancer: Epigenetic Subtypes, Risks, and Survival (called the Post-Menopausal Hormones Study or PMH Study) was a grant awarded to investigate the impact of post-menopausal hormone use on colon cancer risk, tumor molecular characteristics, and outcomes. Eligible cases were females, newly diagnosed with invasive colorectal adenocarcinoma between October 1998 and February 2002, aged 50 to 74 years. Cases were residents of 10 of the 13 counties in western Washington State served by the Surveillance, Epidemiology and End Results (SEER) cancer registry. 
Eligibility for all individuals was limited to those who were English-speaking and had an available telephone number at which they could be contacted. Unrelated population-based controls were randomly selected according to age distribution (in 5-year age intervals) of the eligible cases by using lists of licensed drivers from the Washington State Department of Licensing (for individuals aged 50 to 64 years) and rosters from the Health Care Financing Administration (now the Centers for Medicare & Medicaid Services, for individuals older than 64 years). Participation included completing a baseline epidemiologic questionnaire, optional donation of a saliva sample for genetic analysis, and (for cases only) optional consent to release of medical records and tissue specimens related to their diagnosis. Tumor specimens were tested for epigenetic and other molecular characteristics. The ACCESS study was supported by funding from the National Cancer Institute of the National Institutes of Health (NCI/NIH) (R01CA196337, PI: Newcomb, PA), as was the PMH Study (R01CA076366, PI: Newcomb, PA). Additional support for the PMH Study came from the Seattle site of the Colon Cancer Family Registry (SCCFR) (U01CA167551, PI: Jenkins, M, and U01/U24CA074794, PI: Newcomb, PA). Additional support for case ascertainment was provided by the Cancer Surveillance System of the Fred Hutchinson Cancer Center, which is funded by Contract Number HHSN261201300012I; NCI Control Number: N01 PC-2013-00012; Contract Number HHSN261201800004I; and NCI Control Number: N01 PC-2018-00004 from the Surveillance, Epidemiology and End Results (SEER) Program of the National Cancer Institute with additional support from the Fred Hutchinson Cancer Center and the State of Washington. 
This research was also supported by the Genomics and Bioinformatics, Comparative Medicine, Specialized Pathology, Collaborative Data Services, and Experimental Histopathology Shared Resources of the Fred Hutch/University of Washington Cancer Consortium (P30 CA015704). Tumor marker testing was performed using formalin-fixed paraffin-embedded diagnostic tumor tissue specimens, and DNA extracted from those specimens. Testing for microsatellite instability (MSI) was based on either a 10-gene panel (BAT25, BAT26, BAT40, MYCL, D5S346, D17S250, ACTC, D18S55, D10S197, BAT34C4) or a 4-marker immunohistochemistry panel of DNA mismatch repair proteins (MLH1, MSH2, MSH6, PMS2). CpG island methylator phenotype (CIMP) testing was based on a validated quantitative DNA methylation assay using a five-gene panel (CACNA1G, IGF2, NEUROG1, RUNX3, SOCS1) or eight-gene panel (CACNA1G, IGF2, NEUROG1, RUNX3, SOCS1, MLH1, CRABP1, CDKN2A). Somatic p.V600E BRAF mutation status was tested for using a fluorescent allele-specific PCR assay. KRAS mutations in codons 12 and 13 were also assessed through forward and reverse sequencing of amplified tumor DNA. DNA was extracted from blood/saliva samples using conventional methods. The genotyping panel completed was the Build37 OncoArray500K-C, including 1%-6% blinded duplicates to monitor the quality of the genotyping. Quality control procedures were performed to 1) make sure that there were no patterns of missing data by batch, study, or plate, 2) check for gender discrepancies and kinship, 3) complete Principal Component Analysis, and 4) test for Hardy-Weinberg equilibrium (HWE). Samples were excluded based on call rate, heterozygosity, unexpected duplicates, gender discrepancy, and unexpectedly high identity-by-descent or unexpected genotypic concordance (>65%) with another individual. In addition, variants were excluded based on call rate (98%), lack of HWE in controls (P
Adult human somatic tissues are landscapes of mutant clones, but little is known about how these landscapes are shaped by known cancer risk factors. The goal of this study was to determine how the main bladder cancer risk factors (sex and smoking) change the mutational landscape of the normal urothelium. We collected normal urothelium from 45 donors at autopsy by brushing ~2cm² of the surface epithelium in the upper part (dome) and lower part (trigone) of the bladder (79 samples total). To identify mutations with high sensitivity we employed ultradeep (~5000x) DNA error-correcting duplex-sequencing on a panel of 15 protein coding genes and the TERT promoter, previously described to be under positive selection in normal urothelium or relevant to bladder cancer. We identified thousands of clonal driver mutations, which demonstrate pervasive positive selection in the normal urothelium. Males showed a significantly higher number of driver truncating mutations in RBM10, CDKN1A and ARID1A, despite similar rates of non-coding mutations, indicating stronger positive selection and clonal expansion in the male urothelium. We also discovered clonal expansions driven by activating TERT promoter mutations in the normal bladder, which were strongly associated with age and smoking. These results demonstrate that sex and smoking shape the clonal landscape of the normal bladder and indicate that the high mutational content of normal tissues might be leveraged to study the effect of mutations in vivo (i.e. natural saturation mutagenesis). We will deposit in dbGaP de-identified relevant clinical information from the 45 donors and the fastq files derived from duplex sequencing of the 79 samples in the study. We will also deposit fastq files derived from duplex sequencing of 3 control cord blood samples.
The iPSC Collection for Omic Research (iPSCORE) Resource was created as part of the Next-Gen Consortium funded by NHLBI. The overarching purpose of the resource is to provide a large collection of human induced pluripotent stem cells (iPSCs) for use in studying the impact of genetic variation on molecular and physiological phenotypes. The resource has been used in a number of studies in Dr. Kelly A. Frazer's lab examining both the characteristics of human iPSCs and a variety of iPSC-derived cell types including cardiovascular progenitor cells (iPSC-CVPCs), pancreatic precursor cells (iPSC-PPCs), and retinal pigment epithelium cells (iPSC-RPEs). We have shown that the iPSCs, CVPCs, and PPCs are suitable surrogate models to identify genetic factors active in early developmental processes because they exhibit fetal-like molecular properties. A total of 273 individuals have participated in the study, of which 222 have had iPSCs generated from fibroblasts. Of the 273 individuals, 181 are part of 55 families that include 24 monozygotic twin pairs and 5 dizygotic twin pairs, allowing for the incorporation of familial relationships into genetic analyses. Germline DNA has been sequenced from blood or fibroblast samples for all 273 individuals (available through dbGaP phs001325) and other genomic data (RNA-seq, DNA methylation, genotype arrays, ATAC-seq, H3K27ac ChIP-seq, HiC-seq) have been generated from the 222 iPSCs as well as derived cell types (available through dbGaP phs000924). QTL analyses were conducted for multiple omics data types and summary statistics are available through dbGaP phs0001325. Important note: Of the 273 individuals, 268 are consented for general research use and 5 are consented for cardiac research only. 
For detailed information about the iPSCORE Collection, including samples, methods used to generate the data, and how to access the datasets, visit the Frazer lab website. The 222 well-characterized iPSC lines that constitute the iPSCORE resource are available; please contact Dr. Kelly A. Frazer (kafrazer@health.ucsd.edu) if you are interested in obtaining the collection.
Clinically pathogenic chromosomal microdeletions causing genetic disorders such as DiGeorge syndrome are rare genetic aberrations that can cause clinically relevant fetal and childhood developmental deficiencies. The clinical severity of such deficiencies depends on the exact genomic location and the genes affected by the fetal chromosomal aberration. Here we present BinDel, a novel region-aware microdeletion detection software package developed to infer clinically relevant microdeletion risk in low-coverage whole-genome sequencing NIPT data. To test BinDel, we quantified the impact of sequencing coverage, fetal DNA fraction, and region length on microdeletion risk detection accuracy. We also estimated BinDel accuracy on known microdeletion samples and clinically validated aneuploidy samples. BinDel identified each positive control sample as high risk. We also determined that it is critical to confirm that a sample with a detected high microdeletion risk does not have a full-chromosome aneuploidy, as the latter can cause erroneous high microdeletion risk findings. We observed that lower sequencing coverage resulted in reduced microdeletion detection accuracy, and higher fetal fractions considerably increased the microdeletion detection accuracy, with coverage becoming increasingly relevant as fetal DNA fraction decreased. In conclusion, we developed BinDel, an R package-based software tool for inferring fetal microdeletion risks, which accurately identified all positive control samples with microdeletion or -duplication aberrations as high-risk samples.
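The dependence on coverage and fetal fraction described above can be illustrated with a back-of-the-envelope calculation. The sketch below is not BinDel's algorithm; it is a simplified model that assumes a heterozygous deletion present only in the fetus and Poisson-distributed read counts, and it ignores GC bias and overdispersion. The function names `expected_depth_ratio` and `expected_z_score` are hypothetical. Under these assumptions, a fetal fraction f lowers the expected read count in the deleted region by a factor of 1 - f/2, and the detectable signal grows with the square root of the number of reads in the region.

```python
import math


def expected_depth_ratio(fetal_fraction: float) -> float:
    """Expected coverage in the region relative to an unaffected sample.

    Assumes a heterozygous deletion carried by the fetus only: the fetus
    contributes half its usual share of reads, the mother her full share.
    """
    if not 0.0 <= fetal_fraction <= 1.0:
        raise ValueError("fetal_fraction must be between 0 and 1")
    return 1.0 - fetal_fraction / 2.0


def expected_z_score(fetal_fraction: float, reads_in_region: float) -> float:
    """Approximate z-score of the read deficit under a Poisson model.

    reads_in_region is the read count expected in the region for an
    unaffected sample; it scales with coverage and with region length.
    """
    if reads_in_region <= 0:
        raise ValueError("reads_in_region must be positive")
    observed = reads_in_region * expected_depth_ratio(fetal_fraction)
    # Poisson standard deviation is the square root of the expected count.
    return (observed - reads_in_region) / math.sqrt(reads_in_region)


if __name__ == "__main__":
    # Compare signal strength across illustrative fetal fractions and depths.
    for ff in (0.05, 0.10, 0.20):
        for reads in (2_500, 10_000):
            z = expected_z_score(ff, reads)
            print(f"fetal fraction={ff:.2f} reads={reads:>6} z={z:.2f}")
```

In this toy model, halving the fetal fraction requires roughly four times as many reads in the region to keep the same expected z-score, which is consistent with the observation that coverage matters more as fetal fraction decreases.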