Dataset

CINECA synthetic cohort EUROPE UK1 referencing fake samples

Dataset ID: EGAD00001006673
Technology: Illumina HiSeq 2000
Samples: 2504

Dataset Description

Please note: This synthetic dataset (with cohort “participants” / “subjects” marked with FAKE) contains no identifiable data and cannot be used to make any inference about cohort data or results. The purpose of this dataset is to aid development of technical implementations for cohort data discovery, harmonization, access, and federated analysis. In support of FAIRness in data sharing, this dataset is made freely available under the Creative Commons Licence (CC-BY). Please ensure this preamble is included with this dataset and that the CINECA project (funding: EC H2020 grant 825775) is acknowledged. For any questions, please contact isuru@ebi.ac.uk or cthomas@ebi.ac.uk.

This dataset (CINECA_synthetic_cohort_EUROPE_UK1) consists of 2521 samples which have genetic data based on 1000 Genomes data (https://www.nature.com/articles/nature15393), and synthetic subject attributes and phenotypic data derived from UK Biobank (https://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.1001779). These data were initially derived using the TOFU tool (https://github.com/spiros/tofu), which generates random values based on the UK Biobank data dictionary. Categorical values were randomly generated based on the data dictionary, continuous variables were generated based on the distribution of values reported by the UK Biobank showcase, and date / time values were generated at random. Additionally, we split the phenotypes and attributes into four main classes: general, cancer, diabetes mellitus, and cardiac. We assigned the general attributes to all the samples, and the cardiac / diabetes mellitus / cancer attributes to a proportion of the total samples. Once the initial ...
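The generation scheme described above can be illustrated with a minimal sketch. This is not the TOFU tool's code or API. The field names, the normal distribution and its parameters, the date range, the class proportions, and the FAKE_ sample identifiers are all hypothetical values chosen for illustration.

```python
import random
from datetime import date, timedelta

# Hypothetical miniature "data dictionary"; fields and parameters are
# illustrative assumptions, not taken from the UK Biobank data dictionary.
DATA_DICTIONARY = {
    "sex": {"type": "categorical", "values": ["Female", "Male"]},
    "bmi": {"type": "continuous", "mean": 27.0, "sd": 4.5},
    "assessment_date": {
        "type": "date",
        "start": date(2006, 1, 1),
        "end": date(2010, 12, 31),
    },
}

# Assumed fraction of samples that receive each disease-specific class.
CLASS_PROPORTIONS = {"cancer": 0.2, "diabetes mellitus": 0.2, "cardiac": 0.2}


def synth_value(spec, rng):
    """Draw one synthetic value according to the field's declared type."""
    if spec["type"] == "categorical":
        # Pick uniformly among the categories listed in the dictionary.
        return rng.choice(spec["values"])
    if spec["type"] == "continuous":
        # Assumption: a normal distribution stands in for the reported one.
        return rng.gauss(spec["mean"], spec["sd"])
    if spec["type"] == "date":
        # Pick a uniformly random day within the allowed range.
        span_days = (spec["end"] - spec["start"]).days
        return spec["start"] + timedelta(days=rng.randint(0, span_days))
    raise ValueError(f"unknown field type: {spec['type']}")


def synth_cohort(n_samples, seed=0):
    """Build n_samples synthetic records and assign attribute classes."""
    rng = random.Random(seed)
    cohort = []
    for i in range(n_samples):
        record = {"sample_id": f"FAKE_{i + 1:04d}", "classes": ["general"]}
        for name, spec in DATA_DICTIONARY.items():
            record[name] = synth_value(spec, rng)
        cohort.append(record)
    # Every sample is "general"; each other class goes to a fixed proportion.
    for cls, proportion in CLASS_PROPORTIONS.items():
        chosen = rng.sample(range(n_samples), round(n_samples * proportion))
        for idx in chosen:
            cohort[idx]["classes"].append(cls)
    return cohort
```

Calling `synth_cohort(100, seed=1)` returns 100 records, each carrying the general class, with each disease-specific class attached to 20 of them under the assumed proportions. Fixing the seed makes the output reproducible, which is convenient when the records are used as test fixtures.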

Who controls access to this dataset

For each dataset that requires controlled access, there is a corresponding Data Access Committee (DAC) that determines access permissions. Access to the actual data files is not managed by the EGA. If you need to request access to this dataset, please contact:

EGA Data Access Committee for synthetic and open-access test data held at EGA.
Contact person: EGA Helpdesk
Email: helpdesk [at] ega-archive [dot] org
More details: EGAC00001000908
