DNA WGS Long Read Sequence (PromethION) for manuscript titled: "Performance of Somatic Structural Variant Calling in Lung Cancer using Oxford Nanopore Sequencing Technology"
Transcriptomic data for manuscript titled: Human proximal tubular epithelial cell interleukin-1 receptor signalling triggers cell cycle arrest during hypoxic kidney injury.
Data Quality Control

High-throughput sequencing techniques have become the leading method to study, decode and discover the genomic origins of biological phenomena. EGA provides secure archival of such identifiable genomics data with the purpose of data upcycling, i.e. re-using these data for research. High-quality data standards are essential to ensure the quality and credibility of the research. Moreover, a quality check report can inform researchers beforehand about the data they intend to request access to, saving time and effort. The EGA has developed a File Quality Control Report (QC Report) to provide generic quality control reports for FASTQ, SAM/BAM/CRAM, and VCF files deposited at EGA. This QC Report allows users to get information about the files submitted within a specific dataset. Data requesters can obtain information such as the quality of reads, mapped reads, number of variants, and other features before starting the request process, which saves time and effort.

Accessing file quality control reports

In each dataset page, the user can explore the files that it contains by clicking the "files" tab. The Quality Control report of a file has two sections. The first contains general information about the file, such as the inferred assembly, total reads, the dataset or study it comes from, etc. The second contains plots that summarise key information about the file, for example the base coverage distribution, base quality or mapped reads. The description of each plot is accessible by clicking the "i" button at the top-right corner of each plot box.

Technical Description

For analysing FASTQ, SAM/BAM/CRAM and VCF files, the EGA applies a set of tools widely used in the bioinformatics community.
FASTQ: FastQC, recognised as the gold standard tool by the community. Reported metrics: per base sequence quality, per sequence quality scores, per base sequence content, per sequence GC content, sequence duplication levels, etc.
SAM/BAM/CRAM: samtools, also the gold standard, which generates result plots that give an overall idea of the quality of the file. Reported metrics: base coverage distribution, base quality, % of mapped reads, % of both mates mapped, singletons, duplicates, etc.
VCF: vcftools and bcftools, combined with a custom script to infer the genome assembly. Reported metrics: site frequency distribution, Ts/Tv, base changes, indel distribution, etc.
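To make one of the VCF metrics listed above concrete, the following is a minimal sketch of how a transition/transversion (Ts/Tv) ratio could be computed from the REF and ALT columns of VCF records. It is an illustration only and not the EGA pipeline, which relies on vcftools and bcftools; the function names `is_transition` and `ts_tv_ratio` are hypothetical, and the sketch assumes tab-separated, uncompressed VCF text lines.

```python
# Illustrative sketch only: NOT the EGA implementation.
# Assumes plain-text, tab-separated VCF lines (REF in column 4, ALT in column 5).
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def is_transition(ref: str, alt: str) -> bool:
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    pair = {ref, alt}
    return ref != alt and (pair <= PURINES or pair <= PYRIMIDINES)


def ts_tv_ratio(vcf_lines):
    """Return Ts/Tv over single-base substitutions, or None if no transversions."""
    ts = tv = 0
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a complete variant record
        ref = fields[3].upper()
        for alt in fields[4].upper().split(","):
            # Count only single-base substitutions; indels and symbolic
            # alleles such as <DEL> or * are ignored.
            if len(ref) != 1 or len(alt) != 1:
                continue
            if ref not in "ACGT" or alt not in "ACGT" or ref == alt:
                continue
            if is_transition(ref, alt):
                ts += 1
            else:
                tv += 1
    return ts / tv if tv else None
```

The function accepts any iterable of lines, so an open file handle can be passed directly. Multi-allelic records are handled by counting each alternate allele separately, which is one possible convention among several.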
There is a need for quantitative measurements of evolutionary metrics in controlled clinical trials with long-term follow-up information. This is particularly true in advanced localised prostate cancer, which can recur more than a decade after diagnosis. Here we mapped genomic intra-tumour heterogeneity in 642 tumour samples from 114 patients who took part in the IMRT and DELINEATE clinical trials, for which full clinical information and a 12-year median follow-up were available. We concomitantly assessed phenotypic (morphological) heterogeneity using Deep Learning in 1,923 histological sections from 250 IMRT patients (fully overlapping with the genetic set). This study shows that combining genomics with AI-aided histopathology in clinical trials leads to novel clinical biomarkers. This EGA repository contains data produced from tumour samples using low-coverage whole genome sequencing, together with prostate cancer-specific gene panel data following compression of unique molecular identifiers.
The offspring of first cousin marriages have ~6% of their genome autozygous, i.e. homozygous identical by descent, or even more if there was further consanguinity in their ancestry. In the UK there are large populations with very high first cousin marriage rates of 20-50%. Sequencing the exomes of a sample of these individuals has the potential both to support genetic health programmes in these populations, and to provide genetic research information about rare loss-of-function mutations. This pilot study based on existing British-Pakistani cohort samples from Birmingham will identify homozygous individuals for almost all variants down to an allele frequency around 1%, plus individuals carrying hundreds of new homozygous rare loss-of-function variants, and will support development of community relations and ethics for a wider study currently being designed. The data deposited in the EGA consist of low coverage whole exome sequencing on these samples.
The offspring of first cousin marriages have ~6% of their genome autozygous, i.e. homozygous identical by descent, or even more if there was further consanguinity in their ancestry. In the UK there are large populations with very high first cousin marriage rates of 50-80%. Sequencing the exomes of a sample of these individuals has the potential both to support genetic health programmes in these populations, and to provide genetic research information about rare loss-of-function mutations. This pilot study based on existing British-Pakistani cohort samples from Birmingham will identify homozygous individuals for almost all variants down to an allele frequency around 1%, plus individuals carrying hundreds of new homozygous rare loss-of-function variants, and will support development of community relations and ethics for a wider study currently being designed. The data deposited in the EGA consist of low coverage whole exome sequencing on these samples.
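The ~6% autozygosity figure quoted above can be reproduced with the standard path-counting formula for the inbreeding coefficient, F = sum over loops of (1/2)^n * (1 + F_A), where n is the number of individuals in the loop connecting the two parents through a common ancestor and F_A is that ancestor's own inbreeding coefficient. The formula is a textbook population genetics result, not something stated in the description itself, and the function name `inbreeding_coefficient` below is hypothetical. A minimal sketch:

```python
from fractions import Fraction


def inbreeding_coefficient(paths):
    """Expected autozygous fraction via the path-counting formula.

    `paths` holds one (n, f_ancestor) pair per loop through a common ancestor:
    n is the number of individuals on the path from one parent to the other
    through that ancestor, and f_ancestor is the ancestor's own inbreeding
    coefficient (0 if the ancestor is not inbred).
    """
    return sum(Fraction(1, 2) ** n * (1 + Fraction(f_a)) for n, f_a in paths)


# First cousins share two grandparents. Each loop runs
# parent -> their parent -> shared grandparent -> other parent's parent -> other parent,
# i.e. n = 5 individuals per loop.
first_cousins = [(5, 0), (5, 0)]
f = inbreeding_coefficient(first_cousins)
print(f, float(f))  # 1/16 0.0625

# "Further consanguinity": if both shared grandparents were themselves the
# offspring of first cousins (F_A = 1/16), the expected value rises.
f_more = inbreeding_coefficient([(5, Fraction(1, 16)), (5, Fraction(1, 16))])
print(f_more, float(f_more))  # 17/256 0.06640625
```

These are expected values over the whole genome; the realised autozygous fraction in any one individual varies around them.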
This study aims to ascertain whether it is feasible to extract a single cell from a cancer cell line, amplify it and sequence it.
Genome-wide genotyping was performed on a population-based cohort from the capital region of Finland using the Illumina 610-Quad SNP microarray.