Today (19th July) sees the public release of the eagerly awaited UK Biobank datasets.

The UK Biobank datasets encompass samples from over 500,000 individuals aged between 40 and 69 who were recruited to UK Biobank between 2006 and 2010, during which time they underwent extensive measurements and genotyping. In addition, participants consented to UK Biobank linking information from their electronic health records to the study data. The UK Biobank data is therefore one of the richest datasets currently available: an unprecedented collection that offers enormous potential and substantial efficiency savings for biomedical research into the causes of disease.

The UK Biobank data is currently distributed with two layers of encryption: one applied by UK Biobank when the data was submitted, and a second applied as part of standard EGA procedure. Both layers must therefore be removed before the data can be used.

The EGA holds two UK Biobank datasets: the genotype dataset, split by chromosome (157 files, 3.75 TB), and the imputed genotype dataset, which excludes the sex chromosomes (67 files, 1.96 TB).

In the first few weeks of activity, the EGA has distributed 785 TB of data to around 300 users from three locations across Europe (Hinxton, Hemel Hempstead and Barcelona), with more than 12,000 successful file downloads (268 TB) made through the EGA download client itself.

We wish all of our UK Biobank users the very best of luck in their research.

Related press releases:
(1) UK Biobank partners with the EGA – EMBL-EBI
(2) UK Biobank partners with the EGA – CRG