Human genomic and phenotypic synthetic data for the study of rare diseases

The purpose of this dataset is to facilitate the development of technical implementations for rare disease data integration, analysis, discovery, and federated access. This synthetic dataset includes clinical and genomic data from 6 rare disease cases. It consists of 18 whole genomes (6 index cases with their parents) whose genetic background is based on public human data sequenced in the context of the Illumina Platinum initiative (Eberle, MA et al. (2017)) and made available by the HapMap project. In each case, real causative variants correlating with the phenotypic data provided were spiked in.

The cases included in this synthetic dataset correspond to the following types of disorders:

CASE 1: Congenital myasthenic syndrome (Autosomal Dominant - de novo variant)
CASE 2: Macular dystrophy (Autosomal Dominant)
CASE 3: Muscular dystrophy (Autosomal Recessive - compound heterozygous variants)
CASE 4: Mitochondrial disorder (Autosomal Recessive - consanguineous case - homozygous variant)
CASE 5: Breast cancer (Autosomal Dominant)
CASE 6: Similar to case 1, for patient matchmaking tests: Congenital myasthenic syndrome (Autosomal Dominant - de novo variant)

For each case you will be able to download the following data: clinical information (phenopackets per individual and pedigree per family), raw genomic data (FASTQ and BAM files) and processed genomic data (VCFs).

When using the data, the following should be acknowledged: the RD-Connect GPAP, EC H2020 project EJP-RD (grant # 825575), EC H2020 project B1MG (grant # 951724) and Generalitat de Catalunya VEIS project (grant # 001-P-001647).
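For anyone building tooling against this dataset, the case list above can be sketched as a small in-memory manifest. This is only an illustrative sketch: the `Case` class, the `matchmaking_groups` helper, and the sample labels such as `case1_index` are hypothetical and are not the identifiers used in the actual files. It assumes each case is a trio (index case plus two parents), as the description states.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    case_id: int
    disorder: str
    inheritance: str

    def members(self):
        # Each case is a trio: the index case plus both parents.
        # These labels are hypothetical, not the dataset's real sample IDs.
        return [f"case{self.case_id}_{role}" for role in ("index", "father", "mother")]


CASES = [
    Case(1, "Congenital myasthenic syndrome", "autosomal dominant, de novo variant"),
    Case(2, "Macular dystrophy", "autosomal dominant"),
    Case(3, "Muscular dystrophy", "autosomal recessive, compound heterozygous variants"),
    Case(4, "Mitochondrial disorder", "autosomal recessive, consanguineous, homozygous variant"),
    Case(5, "Breast cancer", "autosomal dominant"),
    Case(6, "Congenital myasthenic syndrome", "autosomal dominant, de novo variant"),
]


def matchmaking_groups(cases):
    """Group case IDs by disorder and keep only disorders shared by several cases."""
    by_disorder = defaultdict(list)
    for case in cases:
        by_disorder[case.disorder].append(case.case_id)
    return {d: ids for d, ids in by_disorder.items() if len(ids) > 1}


all_samples = [sample for case in CASES for sample in case.members()]
print(len(all_samples))           # number of genomes across all trios
print(matchmaking_groups(CASES))  # disorders shared by more than one case
```

Grouping by disorder label is a deliberately naive stand-in for patient matchmaking; a real implementation would more likely compare the phenotype terms recorded in the phenopackets.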

Click on a Dataset ID in the table below to learn more, and to find out who to contact about access to these data

Dataset ID       Description  Technology           Samples
EGAD00001008392  -            Illumina HiSeq 2000  18