Test dataset: Sequence and variant data from public 1000 Genomes Project

This is a test dataset derived from public data of the 1000 Genomes Project. Its purpose is not to support any inference about cohort data or results, but to help bioinformaticians develop and test tools and to help data consumers learn how to access information.

The dataset consists of 2508 samples from the 1000 Genomes Project (https://www.nature.com/articles/nature15393). The data for each sample (e.g. NA18534) can be accessed through the IGSR portal (e.g. https://www.internationalgenome.org/data-portal/sample/NA18534) or in the corresponding folder on the 1000 Genomes FTP site (e.g. http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/data/CHB/NA18534/exome_alignment/).

The dataset encompasses several types of data: Variant Call Format (VCF) files and their binary counterpart (BCF), both joint (e.g. ALL_chr22_20130502_2504Individuals.vcf.gz) and split (e.g. HG01775.chrY.vcf.gz); exome sequencing CRAM files (e.g. NA18534.GRCh38DH.exome.cram); and whole genome sequencing CRAM/BAM files (e.g. NA19239.cram).

Additionally, several files were sliced into shorter files that download quickly. Their names are formatted as "{FILE-INFO}__{NUMBER-OF-READS}r__{CHR}.{START-COORDINATE}-{END-COORDINATE}.{FILETYPE}" (e.g. "HG01500.GRCh38DH__90r__3.10000-10500__4.10000-10500.cram", which covers two regions). These files can be downloaded directly with PyEGA3, the EGA download client (https://github.com/EGA-archive/ega-download-client).
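The naming format of the sliced files can be unpacked with a few lines of Python. The following is a minimal sketch, not part of PyEGA3 or any EGA tooling: the names `parse_sliced_name`, `SlicedName` and `Region` are hypothetical. It assumes that multiple regions are joined by "__" as in the example file name, that chromosome names contain no dots, and that the file-info part contains no "__".

```python
import re
from typing import List, NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int
    end: int


class SlicedName(NamedTuple):
    file_info: str
    n_reads: int
    regions: List[Region]
    filetype: str


# "{NUMBER-OF-READS}r", e.g. "90r"
_READS = re.compile(r"^(?P<n>\d+)r$")
# "{CHR}.{START-COORDINATE}-{END-COORDINATE}", e.g. "3.10000-10500"
_REGION = re.compile(r"^(?P<chrom>[^.]+)\.(?P<start>\d+)-(?P<end>\d+)$")
# The last token also carries the file type, e.g. "4.10000-10500.cram"
_LAST = re.compile(r"^(?P<region>[^.]+\.\d+-\d+)\.(?P<filetype>.+)$")


def parse_sliced_name(name: str) -> SlicedName:
    """Split a sliced-file name into its components (assumed layout)."""
    parts = name.split("__")
    if len(parts) < 3:
        raise ValueError(f"not a sliced-file name: {name!r}")

    file_info, reads_token, *region_tokens = parts
    reads = _READS.match(reads_token)
    last = _LAST.match(region_tokens[-1])
    if reads is None or last is None:
        raise ValueError(f"unexpected name layout: {name!r}")

    # Strip the file type off the final region token before parsing regions.
    region_tokens[-1] = last.group("region")

    regions = []
    for token in region_tokens:
        match = _REGION.match(token)
        if match is None:
            raise ValueError(f"unexpected region token: {token!r}")
        regions.append(
            Region(match.group("chrom"),
                   int(match.group("start")),
                   int(match.group("end")))
        )

    return SlicedName(file_info, int(reads.group("n")), regions,
                      last.group("filetype"))


if __name__ == "__main__":
    example = "HG01500.GRCh38DH__90r__3.10000-10500__4.10000-10500.cram"
    print(parse_sliced_name(example))
```

For the example name, the sketch separates the file info (HG01500.GRCh38DH), the read count (90), the two regions on chromosomes 3 and 4, and the file type (cram). Names that do not follow the sliced layout, such as NA19239.cram, raise a ValueError.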

EGA Test Policy

...

Studies are experimental investigations of a particular phenomenon, e.g., case-control studies on a particular trait or cancer research projects reporting matched cancer and normal genomes from patients.

Study ID         Study Title  Study Type
EGAS00001002472               Other
EGAS00001005042               Whole Genome Sequencing
EGAS00001006718               Other

This table displays only public information pertaining to the files in the dataset. If you wish to access this dataset, please submit a request. If you already have access to these data files, please consult the download documentation.

ID  File Type  Size  Located in
EGAF00001753734 cram 45.1 GB
EGAF00001753735 crai 1.6 MB
EGAF00001753736 cram 38.2 GB
EGAF00001753737 crai 1.3 MB
EGAF00001753738 cram 38.4 GB
EGAF00001753739 crai 1.3 MB
EGAF00001753740 cram 34.8 GB
EGAF00001753741 crai 1.2 MB
EGAF00001753742 cram 44.1 GB
EGAF00001753743 crai 1.5 MB
EGAF00001753744 cram 48.3 GB
EGAF00001753745 crai 1.6 MB
EGAF00001753746 bam 143.5 GB
EGAF00001753747 bai 9.0 MB
EGAF00001753748 bam 4.3 GB
EGAF00001753749 bai 9.2 MB
EGAF00001753750 bam 66.2 GB
EGAF00001753751 bai 9.2 MB
EGAF00001753752 bam 229.9 GB
EGAF00001753753 bai 9.4 MB
EGAF00001753754 bam 136.1 GB
EGAF00001753755 bai 9.0 MB
EGAF00001753756 bam 140.5 GB
EGAF00001753757 bai 9.0 MB
EGAF00001770106 bam 462.3 MB
EGAF00001770107 bam 3.6 GB
EGAF00001775034 bai 6.0 MB
EGAF00001775036 bai 4.8 MB
EGAF00005000662 vcf.gz 25.5 MB
EGAF00005000663 tbi 18.6 kB
EGAF00005000664 bcf 27.0 MB
EGAF00005000665 csi 14.5 kB
EGAF00005001623 vcf.gz 214.5 MB
EGAF00005001624 tbi 36.1 kB
EGAF00005001625 bcf 186.5 MB
EGAF00005001626 csi 27.6 kB
EGAF00005007180 cram 1.8 GB
EGAF00005007181 cram 2.9 GB
EGAF00005007323 vcf.gz 5.7 MB
EGAF00005007324 tbi 8.1 kB
EGAF00005007325 bcf 5.5 MB
EGAF00005007326 csi 6.3 kB
EGAF00005007327 vcf.gz 851.1 kB
EGAF00005007328 tbi 5.0 kB
EGAF00005007329 bcf 876.7 kB
EGAF00005007330 csi 4.7 kB
EGAF00005007331 crai 137.5 kB
EGAF00005007332 crai 229.4 kB
EGAF00007243773 bam 194.9 kB
EGAF00007243774 cram 135.2 kB
EGAF00007243775 vcf.gz 23.0 kB
EGAF00007243776 tbi 2.0 kB
EGAF00007243777 bam 122.5 kB
EGAF00007243778 cram 112.3 kB
EGAF00007243779 vcf.gz 15.4 kB
EGAF00007243780 tbi 2.0 kB
EGAF00007243781 bai 27.1 kB
EGAF00007243782 bai 29.6 kB
EGAF00007243783 crai 95 Bytes
EGAF00007243784 crai 92 Bytes
EGAF00007462299 bam 16.0 GB
EGAF00007462300 cram 7.4 GB
EGAF00007462301 bam 69.7 GB
EGAF00007462302 cram 34.9 GB
EGAF00007462303 bam 8.2 GB
EGAF00007462304 cram 4.2 GB
EGAF00007462305 bam 68.9 GB
EGAF00007462306 cram 37.0 GB
EGAF00007553556 bcf 14.6 GB
EGAF00007553557 vcf.gz 16.8 GB
EGAF00007553558 bcf 2.1 GB
EGAF00007553559 vcf.gz 2.4 GB
EGAF00007553560 tbi 2.7 MB
EGAF00007553561 tbi 434.2 kB
EGAF00008047669 bcf 14.6 GB
EGAF00008047670 csi 2.3 MB
EGAF00008047671 bcf 2.1 GB
EGAF00008047672 csi 353.2 kB
78 Files (1.3 TB)