Synthetic Data

Synthetic data, as the name suggests, is data that is artificially created rather than generated by actual events. It is often created with the help of algorithms and has multiple uses in biomedical research, such as test data for new products and tools, model validation, and AI model training. One limitation in genomics research is that human genomic data is not readily available because of concerns about individual privacy, so controlled access to the distribution of such data is crucial. In this scenario, synthetic or fake data facilitates research by providing a rich resource for data mining and integration towards the advancement of genomics and biomedical studies.

Here, we provide easy access to synthetic datasets augmented with rich synthetic metadata that overcome real-data usage restrictions, such as constraints due to privacy rules or other regulations. We encourage other data providers and research groups to upload their synthetic datasets to EGA or contact the EGA Helpdesk for further assistance. Access to synthetic data studies is managed by the EGA Helpdesk data access committee.


Study ID         Title
EGAS00001002472  CINECA synthetic cohort EUROPE UK1 referencing fake samples
EGAS00001005591  Synthetic data - Genome in a Bottle
EGAS00001005702  Human genomic and phenotypic synthetic data for the study of rare diseases