EGAD00000000028 |
Aggregate results from a GWAS study on 3352 cases and 3145 controls |
unknown |
6497 |
EGAD00000000029 |
Aggregate results from a case-control study on stroke and ischemic stroke. |
|
19602 |
EGAD00000000058 |
Aggregate results from 22 Carbamazepine-induced hypersensitivity syndrome patients and 2691 UK National Blood Service (NBS) control samples |
unknown |
2713 |
EGAD00000000059 |
Aggregate results from 43 Carbamazepine-induced hypersensitivity syndrome patients and 1296 1958 British Birth Cohort control samples |
unknown |
1 |
EGAD00000000115 |
Summary data from GWAS analysis on 856 cases and 2836 controls |
unknown |
3719 |
EGAD00001001626 |
RNA-Seq Illumina GAII dataset for the TraIT cell-line use case (added reverse and forward reads). |
Illumina Genome Analyzer II |
6 |
EGAD00001002069 |
Complete genomics data for VCaP and PC346c. |
|
2 |
EGAD00001002071 |
qDNAseq shallow sequencing dataset of the cell line use case. |
|
5 |
EGAD00001002109 |
TSACP TruSeq Amplicon Panel dataset for the TraIT cell line use case |
|
5 |
EGAD00001002250 |
mRNA-Seq, HiSeq 2000 dataset of the Cell-line use case |
Illumina HiSeq 2000 |
1 |
EGAD00001003338 |
This is a test dataset derived from public data of the 1000 Genomes Project. Its purpose is not to allow for any inference about cohort data or results, but to aid bioinformaticians in the technical development and testing of tools, as well as data consumers in learning how to access information.
This dataset consists of 2508 samples from the 1000 Genomes Project (https://www.nature.com/articles/nature15393). Data for each sample (e.g. NA18534) can be accessed through the IGSR portal (e.g. https://www.internationalgenome.org/data-portal/sample/NA18534) or the sample's corresponding folder at the 1000 Genomes FTP site (e.g. http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/data/CHB/NA18534/exome_alignment/).
This dataset encompasses several types of data: Variant Call Format (VCF, or its binary counterpart BCF) files, both joint (e.g. ALL_chr22_20130502_2504Individuals.vcf.gz) and split (e.g. HG01775.chrY.vcf.gz); exome sequencing CRAM files (e.g. NA18534.GRCh38DH.exome.cram); and whole genome sequencing CRAM/BAM files (e.g. NA19239.cram). These files can be downloaded directly through the EGA download client, PyEGA3 (https://github.com/EGA-archive/ega-download-client).
For any further questions, please contact the DAC (Helpdesk - email: helpdesk [at] ega-archive [dot] org). |
AB SOLiD 4 System |
2508 |
EGAD00001003971 |
ICGC-TCGA DREAM Somatic Mutation Calling - Tumour Heterogeneity Challenge - WGS mapped reads |
|
59 |
EGAD00001005747 |
RNAseq sample used in study titled "Immune-awakening revealed by peripheral T cell dynamics after one cycle of immunotherapy". |
Illumina HiSeq 2500 |
1 |
EGAD00001006673 |
Please note: This synthetic data set (with cohort “participants” / “subjects” marked with FAKE) has no identifiable data and cannot be used to make any inference about cohort data or results. The purpose of this dataset is to aid development of technical implementations for cohort data discovery, harmonization, access, and federated analysis. In support of FAIRness in data sharing, this dataset is made freely available under the Creative Commons Licence (CC-BY). Please ensure this preamble is included with this dataset and that the CINECA project (funding: EC H2020 grant 825775) is acknowledged. For any questions please contact isuru@ebi.ac.uk or cthomas@ebi.ac.uk.
This dataset (CINECA_synthetic_cohort_EUROPE_UK1) consists of 2521 samples which have genetic data based on 1000 Genomes data (https://www.nature.com/articles/nature15393), and synthetic subject attributes and phenotypic data derived from UK Biobank (https://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.1001779). These data were initially derived using the TOFU tool (https://github.com/spiros/tofu), which generates random values based on the UK Biobank data dictionary. Categorical values were randomly generated based on the data dictionary, continuous variables were generated based on the distribution of values reported by the UK Biobank showcase, and date / time values were random. Additionally, we split the phenotypes and attributes into 4 main classes: general, cancer, diabetes mellitus, and cardiac. We assigned the general attributes to all the samples, and the cardiac / diabetes mellitus / cancer attributes to a proportion of the total samples. Once the initial set of phenotypes and attributes was generated, the data were checked for consistency and, where possible, dependent attributes were calculated from the independent variables generated by TOFU. For example, BMI was calculated from height and weight data, and age at death was derived from date of death and date of birth. These data were then loaded to the development instance of Biosamples (https://www.ebi.ac.uk/biosamples/) which accessioned each of the samples.
The genetic data are derived from the 1000 Genomes Phase 3 release (https://www.internationalgenome.org/category/phase-3/). The genotype data consist of a single joint-call VCF file with called genotypes for all 2504 samples, plus bed, bim, fam, and nosex files generated via plink for these samples and genotypes. A variety of errors have been introduced into the genotype data to mimic real data and to serve as a test for quality control pipelines. These include gender mismatches, ethnic background mislabelling and low call rates for a randomly chosen subset of sample data, as well as deviations from Hardy-Weinberg equilibrium and low call rates for a random selection of variants. Additionally, 40 samples have raw genetic data available in the form of both BAM and CRAM files, including unmapped data. The gender of the samples in the 1000 Genomes data has been matched to the synthetic phenotypic data generated for these samples. The genetic data were then linked to the synthetic data in BioSamples, and submitted to EGA. |
Illumina HiSeq 2000 |
2504 |
EGAD00001008095 |
This dataset contains whole genome sequencing data, in the form of BAM files, for the three members of a trio. These BAM files cover chromosomes 21, X and Y and the mitochondrial genome. |
|
3 |
EGAD00001008096 |
This dataset contains whole genome sequencing data, in the form of paired-end FASTQ files, for the three members of a trio. |
Illumina HiSeq 2500 |
3 |
EGAD00001008097 |
This dataset contains whole genome sequencing data, in VCF format, for the three members of a trio. |
|
3 |
EGAD00001008392 |
The purpose of this project is to provide public human datasets for the study of rare diseases. Combining a public human genomic background with the in-silico insertion of real disease-causing variants yields a representative dataset for testing purposes, without the ethical and legal issues associated with the use of sensitive human data. This project aims to help the development of technical implementations for rare disease data integration, analysis, discovery, and federated access. |
Illumina HiSeq 2000 |
18 |
EGAD00010000300 |
Summary statistics from Haemgen RBC GWAS |
Illumina,Affymetrix,Perlegen |
1 |
EGAD00010000434 |
Normalised mRNA expression |
Illumina HT 12 |
1302 |
EGAD00010000438 |
Normalized miRNA expression data |
Agilent ncRNA 60k |
1480 |
EGAD00010000440 |
Segmented copy number data |
Affymetrix_SNP6_raw |
1302 |
EGAD00010000444 |
Agilent ncRNA 60k txt files |
Agilent ncRNA 60k |
1480 |
EGAD00010000528 |
Illumina HumanHT-12 v4 array |
|
0 |
EGAD00010000934 |
Agilent miRNA dataset |
Agilent SurePrint Human miRNA Microarray |
2 |
EGAD00010000935 |
ACGH 244K dataset |
Agilent 244K |
10 |
EGAD00010000936 |
Affymetrix Exon Array dataset |
Affymetrix GeneChip Human Exon 1.0 ST |
2 |
EGAD00010000937 |
ACGH 180K dataset |
Agilent 180K |
5 |
EGAD00010000938 |
mRNA Array Agilent 44K dataset |
Agilent 44K |
16 |
EGAD00010000939 |
Illumina 1M SNP Array dataset |
Illumina 1M SNP Array |
2 |
EGAD00010001006 |
Proteomics LC-MS/MS dataset |
Liquid chromatography–mass spectrometry |
8 |
EGAD00010001029 |
Summary statistics for a multi-cohort epigenome-wide association study. This includes summary statistics (effect-size, standard error, p-value) for 470,000 methylation markers. |
|
0 |