EGA Statistics

Welcome to the EGA statistics page.

The aim of this page is to present a set of regularly updated statistics about the European Genome-phenome Archive (EGA).

The statistics cover five main topics:

  • Bibliography: the list of publications in which EGA studies are cited.
  • Growth: the growth in the number of objects registered in the EGA, such as studies, datasets and Data Access Committees (DACs).
  • Community: the number and global distribution of submitters and requesters.
  • Archive: the overall volume of data available for download and the different archived file types.
  • Distribution: the amount of data distributed by the EGA.

Latest update: 2.10.2018

Publications citing data stored at the EGA

Studies that use EGA datasets are asked to cite the EGA in their bibliography. This helps spread awareness of the role and purpose of the EGA, namely making study data available for testability and reusability purposes.

Below, you can find a chart showing the publishers of the studies being cited:

Publications citing EGA Studies

Number of published studies by year

Here, you can find a chart showing the number of publications of cited studies per year. The values are non-cumulative.

Number of Published Studies by Year

Cumulative number of published studies by year

Here, you can find a chart showing the number of publications of cited studies per year. The values are cumulative.
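The only difference between the non-cumulative and the cumulative chart is a running total: each cumulative value is the sum of all yearly counts up to and including that year. The following is a minimal Python sketch, assuming the yearly counts are held in a dict keyed by year. The function name and the counts in the usage line are hypothetical and are not EGA figures.

```python
from itertools import accumulate

def cumulative_by_year(counts: dict[int, int]) -> dict[int, int]:
    """Turn per-year (non-cumulative) counts into running totals."""
    years = sorted(counts)  # process years in chronological order
    totals = accumulate(counts[y] for y in years)  # running sum
    return dict(zip(years, totals))

# Hypothetical per-year counts, for illustration only.
print(cumulative_by_year({2016: 3, 2014: 1, 2015: 2}))  # {2014: 1, 2015: 3, 2016: 6}
```

Sorting the years first matters. Without it, the running sum would follow whatever order the data arrived in.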

Cumulative Number of Published Studies by Year

Journal Impact Factor of published studies

Here, you can find a chart showing the Impact Factor of the journals where the EGA studies have been published.

  • Low impact: impact factor from 0.0 up to, but not including, 5.0;
  • Medium impact: impact factor from 5.0 up to, but not including, 10.0;
  • High impact: impact factor equal to or higher than 10.0.
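The three ranges form half-open intervals, so every impact factor falls into exactly one category. The following is a minimal Python sketch of that binning. The function name and the category labels are illustrative assumptions and do not come from the EGA's own tooling.

```python
def impact_category(impact_factor: float) -> str:
    """Classify a journal impact factor into the low/medium/high ranges."""
    if impact_factor < 0:
        raise ValueError("impact factor cannot be negative")
    if impact_factor < 5.0:
        return "low"     # 0.0 <= IF < 5.0
    if impact_factor < 10.0:
        return "medium"  # 5.0 <= IF < 10.0
    return "high"        # IF >= 10.0

print(impact_category(5.0))  # medium (5.0 is excluded from the low range)
```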

Journal Impact Factor of published studies

Publications citing data stored at the EGA

Publications that state an EGA study accession in their bibliography can in turn be cited for testability or reusability purposes. Below, you can find a chart showing the publishers of the publications being cited:

Publications citing data stored at the EGA

Number of published citations by year

Here, you can find a chart showing the number of studies citing published EGA studies, by year. The values are non-cumulative.

Number of published citations by year


Released Studies and Datasets by year

The figure below represents the distribution of released Studies and Datasets per year.

Released EGA Studies and Datasets

Cumulative released Studies and Datasets by year

The figure below represents the cumulative sum of released Studies and Datasets by year.

Cumulative released EGA Studies and Datasets

Created Studies, Datasets and DACs by year

The figure below represents the distribution of created Studies, Datasets and DACs per year.

Created EGA Studies, Datasets and DACs

Cumulative created Studies, Datasets and DACs by year

The figure below represents the cumulative sum of created Studies, Datasets and DACs by year.

Cumulative created EGA Studies, Datasets and DACs


Requester accounts created by country

The figure below represents the number of requester accounts by country. The country was inferred from the requester's email domain.
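This page does not describe how the inference is done. One simple approach is to read the top-level domain of the email address and look it up in a table of country-code top-level domains (ccTLDs). The following Python sketch illustrates that approach. The lookup table is a deliberately tiny hypothetical sample, and generic domains such as .com or .org are reported as unknown because they do not identify a country.

```python
from typing import Optional

# Hypothetical, deliberately tiny lookup table; a real mapping would cover
# every country-code top-level domain (ccTLD).
CCTLD_TO_COUNTRY = {
    "uk": "United Kingdom",
    "es": "Spain",
    "de": "Germany",
    "fr": "France",
}

def country_from_email(email: str) -> Optional[str]:
    """Guess a country from the top-level domain of an email address.

    Returns None for malformed addresses, generic domains (e.g. .com, .org)
    and any TLD missing from the lookup table.
    """
    _, sep, domain = email.rpartition("@")
    if not sep or not domain:
        return None  # no "@" or nothing after it
    tld = domain.rstrip(".").rsplit(".", 1)[-1].lower()
    return CCTLD_TO_COUNTRY.get(tld)

print(country_from_email("someone@example.ac.uk"))  # United Kingdom
```

A consequence of this design is that accounts registered with generic or webmail domains cannot be assigned to a country. A chart built this way therefore undercounts some countries.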

Requester Accounts Created by Country

Requester accounts created by year

The figure below represents the number of requester accounts by year.

Requester Accounts Created by Year

Industry requester accounts

The figure below represents the top 15 requesting companies from industry.

Industry Requester Accounts

Submitter accounts created by country

The figure below represents the number of submitter accounts by country. The country was inferred from the submitter's email domain.

Submitter Accounts Created by Country

Submitter accounts created by year

The figure below represents the number of submitter accounts by year.

Submitter Accounts Created by Year


EGA Archive growth in size and number of files

The figure below represents the growth of the EGA archive in size (GB) and number of files.

EGA Archive Growth

Archive files

At the first level, the figure below represents the percentage of archived files by data technology. The second level, accessible by clicking on the required extension, displays the number of files archived in the EGA by extension. Please click on an extension to learn more about those files.
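The two levels of this chart amount to two aggregations over the archived file list. The first is each technology's share of all files, in percent. The second is a count of files per extension within one technology. The following Python sketch assumes the input is a list of hypothetical (technology, file name) pairs, which is not the EGA's actual data model. It also treats compound suffixes such as .fastq.gz as a single extension.

```python
from collections import Counter
from pathlib import PurePosixPath

def technology_percentages(files: list[tuple[str, str]]) -> dict[str, float]:
    """First level: share of archived files per data technology, in percent."""
    counts = Counter(tech for tech, _ in files)
    total = sum(counts.values())
    if total == 0:
        return {}  # avoid dividing by zero on an empty archive
    return {tech: 100.0 * n / total for tech, n in counts.items()}

def extension_counts(files: list[tuple[str, str]], technology: str) -> Counter:
    """Second level: number of files per extension within one technology.

    All suffixes are joined, so "reads.fastq.gz" counts as ".fastq.gz";
    note this naive rule would also treat "a.v2.bam" as ".v2.bam".
    """
    return Counter(
        "".join(PurePosixPath(name).suffixes) or "(none)"
        for tech, name in files
        if tech == technology
    )

# Hypothetical example input.
files = [("sequencing", "a.bam"), ("sequencing", "b.fastq.gz"),
         ("sequencing", "c.bam"), ("array", "d.idat")]
print(technology_percentages(files))  # {'sequencing': 75.0, 'array': 25.0}
print(extension_counts(files, "sequencing"))
```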

EGA Archived Files


Cumulative Data Distributed

The figure below represents the cumulative volume of data distributed by the EGA via the download client and Aspera download boxes.

EGA Cumulative Data Distribution

Data Distributed

The figure below represents the volume of data distributed by the EGA via the download client and Aspera download boxes.

EGA Data Distribution

Cumulative EGA distribution of the UK Biobank dataset

Genomic data from the 500,000 people participating in the UK Biobank initiative is being distributed via the European Genome–phenome Archive (EGA). UK Biobank provides extremely detailed, high-quality datasets on individuals. It is an unprecedented collection that offers endless possibilities and substantial efficiency savings for biomedical research and for understanding the causes of disease.

Around 500,000 people from across the UK, between the ages of 40 and 69, participated in UK Biobank between 2006 and 2010, undergoing extensive measurements and genotyping. They provided blood, urine and saliva samples for future analysis, including genetic analysis, and gave detailed information about themselves. They also agreed to allow UK Biobank to integrate information from their electronic health records. To learn more, please visit the UK Biobank study page.

EGA - UK Biobank Cumulative Distribution

Daily EGA distribution of the UK Biobank dataset

The figure below represents the daily distribution of UK Biobank data from the EGA.

EGA - UK Biobank Daily Distribution