Males and females show dramatic differences in their vulnerability to the same diseases. For example, in women compared with men, lupus is six times more prevalent, thyroid cancer is three times more prevalent, and unipolar depression is twice as prevalent. Diseases with a strong male bias include autism (5:1), dilated cardiomyopathy (3:1), and ankylosing spondylitis (5:1). Historically, such differences have been attributed solely to extrinsic factors such as circulating sex hormones or environmental influences. We hypothesized that intrinsic factors - genetic differences between XX and XY cells - have unappreciated biological consequences throughout the body and contribute to sex differences in disease incidence and severity. This hypothesis stems from our long-term effort to sequence the sex chromosomes of diverse mammalian species, which has identified a set of homologous genes on the X and Y chromosomes that are dosage-sensitive, expressed throughout the body, and encode regulators of chromatin modification, transcription, translation, and protein stability. These X- and Y-encoded genes differ in sequence and expression pattern, which likely manifests in genome-wide differences in gene regulation between XX and XY cells and influences all aspects of human biology, including sex differences in disease susceptibility. These hard-wired molecular sex differences have been largely overlooked and understudied, representing a significant gap in our knowledge of human biology. This gene expression study takes advantage of natural human variation in sex chromosome number, i.e. sex chromosome aneuploidy, to investigate alterations in genome-wide gene expression that correlate with changes in X- and Y-chromosome dosage. We analyzed samples from 114 individuals with a variety of sex chromosome aneuploidies, including 45,X; 47,XXY; 47,XYY; 47,XXX; 48,XXYY; and 49,XXXXY.
We generated lymphoblastoid cell lines (LCLs) from blood samples and, in some cases, fibroblast cultures from skin biopsies. We supplemented our collection with previously derived cell lines. To evaluate gene expression, we performed deep profiling of the transcriptome (RNA-seq) from these LCLs and fibroblasts. We performed parallel analyses on samples collected from 62 control 46,XX and 46,XY individuals, 6 individuals with trisomy 21, and 14 individuals with structural variations of the X and Y chromosomes. In addition, we performed CRISPRi knockdowns of the homologous transcription factors ZFX and ZFY, encoded on the X and Y chromosomes, respectively, in 3 of the 46,XX and 3 of the 46,XY fibroblast samples. In the April 2024 update, we added RNA-seq datasets derived from isolated CD4+ T cells and monocytes from 76 and 72 adults, respectively, with the following sex chromosome constitutions: 45,X; 46,XX; 46,XY; 47,XXX; 47,XXY; 47,XYY. These individuals are largely a subset of the cohort described above. In addition, we performed RNA-seq on in vitro stimulated CD4+ T cells with the following sex chromosome constitutions: 45,X; 46,XX; 46,XY; 47,XXY. In the August 2024 update, we added RNA-seq datasets generated from the following: 1) LCLs derived from individuals with AZFa deletions of the Y chromosome, 2) DDX3X and DDX3Y knockdown (via CRISPRi) in XY fibroblasts, and 3) 5-ethynyl uridine (5-EU) treatment in XY and XYYYY LCLs.
Calcific aortic valve stenosis (CAVS) is a common and life-threatening heart disease for which no drug can stop or delay progression. A genome-wide association study (GWAS) of 1,009 cases and 1,017 ethnically matched controls was performed to identify susceptibility genes for CAVS.
Further investigation and characterisation of 12q-amplified low- and high-grade osteosarcomas with MDM2 and/or CDK4 amplification, focusing on structural variant (SV), copy number and gene fusion analyses. In total, 25 cases (33 samples, owing to multi-sampling) were included, with some form of sequencing data available for 27 samples. Mate-pair whole genome sequencing (Illumina) is available for 19 samples, long-read whole genome sequencing (PacBio HiFi) for 10 samples and RNA sequencing (Illumina TruSeq) for 21 samples. Data are available as BAM files.
The dataset, in .bam file format, consists of whole exome sequencing (WES) data of tumors from patients (n=24) with Renal Medullary Carcinoma (RMC), generated using Illumina NovaSeq 6000 sequencing technology. DNA was extracted from FFPE solid tumor samples using the AllPrep DNA FFPE Kit (Qiagen, CA). Libraries from FFPE tissue were prepared with the SureSelect XT HS2 DNA Kit (Agilent, CA) for exome capture. The All Exon V7 exome probe set (Agilent, CA) was used for hybridization and capture of DNA.
The dataset contains RNA-sequencing data (BAM files) for 82 pleural mesothelioma samples. All samples were sequenced on an Illumina NovaSeq 6000 sequencer in paired-end mode, generating 100-nt reads, to obtain an average of 60 million clusters per sample. Demultiplexing was performed using Illumina bcl2fastq2. FASTQ quality was assessed using FastQC (v. 0.11.8) (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and low-quality reads were discarded. Sequence reads were aligned to the human reference genome (UCSC genome assembly GRCh38/hg38) using STAR (v. 2.7.0b).
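The quality-control and alignment steps described for this dataset could be reproduced roughly as in the following Python sketch, which only assembles the FastQC and STAR command lines and does not run them. It is an illustration under assumptions, not the submitters' pipeline: the file names, index directory, thread count, and the choice of coordinate-sorted BAM output are hypothetical and are not stated in the dataset description.

```python
import shlex
from typing import List


def fastqc_command(fastqs: List[str], out_dir: str) -> List[str]:
    """Build a FastQC call that writes its reports to out_dir."""
    return ["fastqc", "--outdir", out_dir, *fastqs]


def star_command(read1: str, read2: str, genome_dir: str,
                 out_prefix: str, threads: int = 8) -> List[str]:
    """Build a paired-end STAR alignment call.

    Coordinate-sorted BAM output is an assumption; the description
    only says the data are distributed as BAM files.
    """
    cmd = [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,          # pre-built GRCh38/hg38 index (hypothetical path)
        "--readFilesIn", read1, read2,      # paired-end mates
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", out_prefix,
    ]
    # STAR needs a decompression command for gzipped FASTQ input.
    if read1.endswith(".gz"):
        cmd += ["--readFilesCommand", "zcat"]
    return cmd


if __name__ == "__main__":
    # Hypothetical sample and paths, for illustration only.
    r1, r2 = "sample01_R1.fastq.gz", "sample01_R2.fastq.gz"
    print(shlex.join(fastqc_command([r1, r2], "qc/")))
    print(shlex.join(star_command(r1, r2, "star_index_hg38", "aligned/sample01_")))
```

Building the commands as argument lists keeps them safe to pass to `subprocess.run` without shell quoting issues, should one choose to execute them.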
BRAF V600E colorectal cancers (CRC) do not respond to the only targeted therapy currently FDA-approved for CRC. A trial is underway in the UK recruiting V600E CRC patients for treatment with a triple therapy combination of cetuximab, trametinib and dabrafenib. We have mutagenized a pool of V600E CRC cell lines and treated them with this triple therapy to select drug-resistant clones. We will now sequence these drug-resistant clones with the aim of identifying common point mutations that confer resistance to this new therapy.
Twelve tissues from the warm autopsy were selected for this project. Using 10X Chromium technology, we will generate ~1000 single-cell/nuclei genomic libraries per tissue. Each tissue will be whole genome sequenced (~2 lanes per 1000 cells) on the HiSeq X Ten. For each single cell we will generate a CNV profile, and we will investigate the level of genomic heterogeneity within each tissue and across different tissues. This dataset contains all the data available for this study on 2019-10-02.
Single-cell whole transcriptome and antibody expression data for bone marrow samples from Cohorts A and B. The CITE-seq protocol was followed. 37 and 77 surface markers were measured in Cohorts A and B, respectively (see Supplementary Table 1). For details on cell sorting prior to scRNA-seq, see the methods section of the manuscript.
WGS data for 409 cases (832 samples) from the ICGC ESAD-UK project. Tumours: 50x; normals: 30x. Platform: HiSeq X. Format: BAM files. These samples are all available in ICGC release 28.
Paired-end WGS data of 10 neuroblastoma patient samples (5 obtained at diagnosis and 5 matched blood samples as controls) used for analysis of telomeric content and sequence composition. Mean coverage is 11-65x per sample. The remaining patient samples of the dataset can be found under accession numbers EGAS00001001308 and EGAS00001005424, and the mapping of patient IDs can be found in the supplementary material of the publication.