Thyroid cancer is the most common endocrine malignancy. This dataset encompasses two types of thyroid cancer: anaplastic, which is the most de-differentiated and aggressive, and papillary, which is the most common. We profiled 14 patients, comprising 10 papillary and 4 anaplastic thyroid carcinomas, using both single-nucleus RNA sequencing and spatial transcriptomics to link single-cell-resolution RNA sequencing with tissue morphology and to better understand inter- and intratumoral heterogeneity in thyroid cancer.
The Electronic Medical Record Phenotypes and Community-engaged Genomic Association Study aims to identify genetic variants mediating susceptibility to peripheral arterial disease (PAD). The study leverages a biorepository of blood samples from 1688 PAD cases and 1649 controls, and uses the electronic medical record (EMR) to annotate the biorepository. PAD cases were identified from the vascular laboratory database as having an ankle brachial index (ABI) <0.9 at rest or after exercise, or as having non-compressible vessels. Controls had no prior history of atherosclerotic vascular disease and, when tested, showed no evidence of ischemia on a stress test. Phenotypes and environmental exposures, including age, ethnicity, demographic and anthropometric data, are derived from the Mayo EMR. Comorbidities were determined using algorithms for diabetes and hypertension based on ICD-9 codes and medication use. Relevant laboratory data, including lipid levels, fasting blood sugar and serum creatinine, measured at the index date or within a 1-year window of the index date, were extracted. Medication classes at the index date were identified using Mayo's Natural Language Processing-based system with RxNorm codification and mapping to NDF-RT terminologies. Smoking status was confirmed by natural language processing of clinical notes. Genotyping of ~600,000 SNPs across the genome is being conducted at the Broad Institute using the Illumina 660W platform. Statistical analyses will be conducted to identify genetic variants associated with susceptibility to PAD.
The data associated with this study are derived from a single female patient with pulmonary sclerosing pneumocytoma (PSP). The aim of this study is to provide the most comprehensive multi-modality sequencing study of a single case of PSP to date. To this end, the following sequencing experiments were performed: i) RNA-Seq of the primary tumor and adjacent normal tissues (6 replicates for each tissue type), intended for analysis of gene fusions, gene expression, and expressed mutations; ii) low-pass DNA whole genome sequencing (WGS) of primary tumor tissue and germline, intended for analysis of copy number aberrations; and iii) DNA targeted panel sequencing of lung cancer associated genes, using primary tumor tissue and germline from white blood cells, intended for analysis of somatic mutations. Principal findings: i) the PSP hallmark mutation AKT1 (p.E17K) was detected in both the DNA and the RNA, and ii) the TP53 signaling pathway was found to be statistically significant by three different pathway analysis tools applied to the gene expression data. Among proteins within the TP53 signaling pathway, the p53 inhibitor encoded by MDM2 was found to be overexpressed (by differential gene expression analysis). The original sequencing data (i.e., FASTQ files) from each of the aforementioned samples will be accessible through dbGaP.
Raw FASTQ files from targeted sequencing of 112 genes in 1,298 endometrial glands and matched blood samples. The paired-end sequencing data sets (R1 and R2) are deposited. The targeted genes are: ABCC1, ACRC, ANK3, ARHGAP35, ARID1A, ARID5B, ATCAY, ATM, ATR, BARD1, BCOR, BRCA1, BRCA2, BRD4, BRIP1, CAMTA1, CDC23, CDYL, CFAP54, CHD4, CHEK1, CHEK2, CTCF, CTNNB1, CUX1, DGKA, DISP2, DYNC2H1, EMSY, FAAP24, FAM135B, FAM175A, FAM65C, FANCA, FANCB, FANCC, FANCD2, FANCE, FANCF, FANCG, FANCI, FANCL, FANCM, FAT1, FAT3, FBN2, FBXW7, FGFR2, FRG1, GPR50, HEATR1, HIST1H4B, HNRNPCL1, HOOK3, KIAA1109, KIF26A, KMT2B, KMT2C, KRAS, LAMA2, LRP1B, MLH1, MON2, MRE11A, MSH2, MSH6, MTOR, NBN, PALB2, PHEX, PIK3CA, PIK3R1, PLXNB2, PLXND1, PMS2, POLE, POLR3B, PPP2R1A, PTEN, PTPN13, RAD50, RAD51, RAD51B, RAD51C, RAD51D, RAD52, RAD54B, RAD54L, RICTOR, SACS, SIGLEC9, SLC19A1, SLX4, SPEG, STT3A, TAF1, TAF2, TAS2R31, TFAP2C, TNC, TONSL, TP53, TTC6, UBA7, VNN1, WT1, XIRP2, ZBED6, ZC3H13, ZFHX3, ZFHX4, ZMYM4.
Although the composition and functional potential of the human gut microbiota evolve over the lifespan, kinship has been identified as a key covariate of microbial community diversification. To date, however, sharing of microbiota features within families has mostly been assessed between parents and their direct offspring. Here, we investigate potential transmission and persistence of familial microbiome patterns and microbial genotypes in a family cohort (N=102) spanning three to five generations over the same female bloodline. We observe microbiome community composition to be associated with kinship, with seven (low-abundance) genera displaying familial distribution patterns. While kinship and current cohabitation emerged as closely entangled variables, our exploratory analyses of microbial genotype distribution and transmission estimates point to the latter as a key covariate of strain dissemination. The highest potential transmission rates are estimated between sisters and mother-daughter pairs; they decrease as the daughter’s age increases and are higher among cohabiting pairs than among those living apart. Although rare, we do detect potential transmission events spanning three and four generations, primarily involving species of the genera Alistipes and Bacteroides. Overall, while our analyses confirm the existence of family-bound microbiome community profiles, transmission or co-acquisition of bacterial strains appears to be strongly linked to cohabitation.
Targeted sequencing was applied to an unselected population-based diffuse large B-cell lymphoma cohort (n=928) diagnosed in the UK's Haematological Malignancy Research Network catchment population of ~4 million (14 centres). DNA extracted from tumour samples was sequenced with a 293-gene panel using the Illumina HiSeq 2500. All data are provided in the CRAM format.
This dataset contains whole blood transcriptome data generated from 93 patients with COVID-19 across a range of severities and 23 healthy controls. All patients were PCR positive for SARS-CoV-2, and disease severity ranged from asymptomatic to severe disease requiring ventilation. Individuals without symptoms, or with mild symptoms, were recruited from routine screening of healthcare workers, while COVID-19 patients were recruited at or soon after admission to Addenbrooke’s or Royal Papworth hospitals. Blood samples were taken at recruitment and then again four weeks later. Further details of the cohort and the generation of the RNA-Sequencing data can be obtained from Bergamaschi, L. et al. Longitudinal analysis reveals that delayed bystander CD8+ T cell activation and early immune pathology distinguish severe COVID-19 from mild disease. Immunity 54, 1257-1275.e8 (2021).
Sequencing of tissue samples and their derived organoids from oesophageal, pancreatic and colorectal cancer patients. This dataset contains all the data available for this study on 2023-06-22.