EGA Statistics

Welcome to the EGA statistics page.

The aim of this page is to present a number of regularly updated statistics about the European Genome-phenome Archive (EGA).

The statistics cover five main topics:

  • Bibliography: Yearly and cumulative views of the publications, citations, publishers, journals and impact factors of EGA-related studies.
  • Growth: The growth of registered objects in the EGA, such as studies, datasets and DACs.
  • Community: The number and global distribution of submitters and requesters.
  • Archive: The overall volume of data available for download and the different archived file types.
  • Distribution: The amount of data distributed by the EGA.
Last Updated : October 20 2020

Publications citing data stored at the EGA

Studies that use EGA datasets are asked to cite the EGA in their bibliography. This helps spread awareness of the role and purpose of the EGA, which is to make study data available for testability and reusability.

To track these studies we use the unique study accession, which the EGA assigns when a new study is submitted and which consists of the keyword EGAS followed by 11 digits, such as EGAS00000000001. We use the Europe PMC RESTful Web Service to query and index studies in which an EGAS accession was used.
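As a minimal sketch of the tracking step described above, the following Python snippet recognises EGAS accessions in free text and builds a search URL for the Europe PMC RESTful Web Service. The helper names (`find_study_accessions`, `build_search_url`) are hypothetical, and searching for the accession as a quoted phrase is an assumption; the exact query used to produce the charts on this page is not stated here.

```python
import re
from typing import List
from urllib.parse import urlencode

# An EGA study accession: the keyword EGAS followed by exactly 11 digits.
EGAS_PATTERN = re.compile(r"\bEGAS\d{11}\b")

# Search endpoint of the Europe PMC RESTful Web Service.
EUROPE_PMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def find_study_accessions(text: str) -> List[str]:
    """Return the distinct EGAS accessions in `text`, in order of appearance."""
    # dict.fromkeys removes duplicates while preserving first-seen order.
    return list(dict.fromkeys(EGAS_PATTERN.findall(text)))


def build_search_url(accession: str, page_size: int = 100) -> str:
    """Build a Europe PMC search URL for one accession (hypothetical helper).

    Assumption: the accession is searched as a quoted phrase.
    """
    if not EGAS_PATTERN.fullmatch(accession):
        raise ValueError(f"not a valid EGAS accession: {accession!r}")
    params = {"query": f'"{accession}"', "format": "json", "pageSize": page_size}
    return f"{EUROPE_PMC_SEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    print(find_study_accessions("Data are available under EGAS00000000001."))
    print(build_search_url("EGAS00000000001"))
```

The snippet only builds the URL and does not contact the service, so it can be run offline; fetching and paging through the results is left out.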

Below, you can find a chart showing the journal publishers of the studies that cite data stored in the EGA:

Publications Citing Data Stored in the EGA
Last Updated : October 20 2020

Number of publications citing data stored in the EGA by year

Here, you can find a chart showing the number of publications citing data stored in the EGA by year. These publications may have deposited or reused the data. The study accession can be found in the publication. The values are non-cumulative.

Number of Publications Citing Data Stored in the EGA by Year
Last Updated : October 20 2020

Cumulative number of publications citing data stored in the EGA by year

Here, you can find a chart showing the number of publications citing data stored in the EGA by year. These publications may have deposited or reused the data. The study accession can be found in the publication. The values are cumulative.
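The relationship between the non-cumulative and cumulative views can be sketched as a running total over the yearly counts. The function name `cumulative_by_year` is hypothetical, and the numbers in the usage example are made-up placeholders, not EGA figures.

```python
from itertools import accumulate
from typing import Dict


def cumulative_by_year(yearly_counts: Dict[int, int]) -> Dict[int, int]:
    """Turn per-year counts into a running total, ordered by year."""
    years = sorted(yearly_counts)
    running_totals = accumulate(yearly_counts[year] for year in years)
    return dict(zip(years, running_totals))


if __name__ == "__main__":
    # Placeholder values for illustration only.
    example = {2018: 2, 2019: 3, 2020: 5}
    print(cumulative_by_year(example))
```

Each cumulative value is the sum of that year's count and all earlier years, so the last value equals the total number of publications.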

Cumulative Number of Publications Citing Data Stored in the EGA by Year
Last Updated : October 20 2020

Journal Impact Factor of Publications Citing Data Stored in the EGA

Here, you can find a chart showing the journal impact factor of publications citing data stored in the EGA. We used the SCImago Journal Rank for this purpose.

  • Low impact: values from 0.0 up to, but not including, 5.0;
  • Medium impact: values from 5.0 up to, but not including, 10.0;
  • High impact: values equal to or higher than 10.0.
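The three ranges above can be expressed as a small classification function. This is a sketch only: the name `impact_category` and the lowercase labels are assumptions, and the upper bounds of the low and medium ranges are read as exclusive.

```python
def impact_category(score: float) -> str:
    """Map a journal rank score to the low / medium / high ranges listed above."""
    if score < 0.0:
        raise ValueError("score must be non-negative")
    if score < 5.0:      # 0.0 <= score < 5.0
        return "low"
    if score < 10.0:     # 5.0 <= score < 10.0
        return "medium"
    return "high"        # score >= 10.0


if __name__ == "__main__":
    for value in (0.0, 4.9, 5.0, 9.9, 10.0):
        print(value, impact_category(value))
```

The boundary values show the intent: 5.0 falls in the medium range and 10.0 in the high range.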

Journal Impact Factor of Publications Citing Data Stored in the EGA
Last Updated : October 20 2020

Publications citing a publication containing an EGA study accession

Publications that state an EGA study accession in their bibliography can themselves be cited for testability or reusability purposes. Below, you can find a chart showing the publishers of the publications being cited:

Publications Citing a Publication Containing an EGA Study Accession
Last Updated : October 20 2020

Number of publications citing a publication containing an EGA study accession by year

Here, you can find a chart showing the number of publications citing a publication containing an EGA study accession by year. The values are non-cumulative:

Number of published citations by year
Last Updated : October 20 2020

Released studies, datasets and DACs by year

The figure below shows the distribution of released studies, datasets and DACs per year.

Released EGA Studies, Datasets and DACs
Last Updated : October 20 2020

Cumulative released studies, datasets and DACs by year

The figure below shows the cumulative sum of released studies, datasets and DACs by year.

Cumulative Released EGA Studies, Datasets and DACs
Last Updated : October 20 2020

Created studies, datasets and DACs by year

This figure is available for authenticated users.


Cumulative created studies, datasets and DACs by year

This figure is available for authenticated users.



Requester accounts created by country

The figure below shows the number of requester accounts by country. The country was inferred from the requester's email domain. "Unassigned" covers requesters whose email domain could not be assigned to a country, such as @gmail.com.
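One way such an inference could work is to look at the country-code top-level domain of the email address and fall back to "Unassigned" for generic domains. The sketch below is an assumption about the approach, not a description of the actual method used for this chart; the function name, the tiny lookup table and the example addresses are all hypothetical.

```python
# Illustrative subset of country-code top-level domains; a real table
# would need to cover many more codes.
CCTLD_TO_COUNTRY = {
    "uk": "United Kingdom",
    "es": "Spain",
    "de": "Germany",
    "fr": "France",
}

UNASSIGNED = "Unassigned"


def country_from_email(email: str) -> str:
    """Guess a country from the last label of the email domain."""
    domain = email.rsplit("@", 1)[-1].strip().lower()
    top_level_domain = domain.rsplit(".", 1)[-1]
    # Generic domains such as .com or .org carry no country information.
    return CCTLD_TO_COUNTRY.get(top_level_domain, UNASSIGNED)


if __name__ == "__main__":
    print(country_from_email("user@example.ac.uk"))
    print(country_from_email("user@gmail.com"))
```

A lookup like this explains why addresses at generic providers end up in the unassigned group: their domain says nothing about where the account holder is based.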

Requester Accounts Created by Country
Last Updated : October 20 2020

Requester accounts created by year

The figure below shows the number of requester accounts created by year.

Requester Accounts Created by Year
Last Updated : October 20 2020

Industry requester accounts

The figure below shows the top 15 requesting companies from industry.

Industry Requester Accounts
Last Updated : October 20 2020

Submitter accounts created by country

The figure below shows the number of submitter accounts by country. The country was inferred from the submitter's email domain.

Submitter Accounts Created by Country
Last Updated : October 20 2020

Submitter accounts created by year

The figure below shows the number of submitter accounts created by year.

Submitter Accounts Created by Year
Last Updated : October 20 2020

EGA Archive growth in size and number of files

The figure below shows the growth of the EGA archive in size (GB) and number of files.

EGA Archive Growth
Last Updated : October 20 2020

Archive files

The figure below shows, at the first level, the percentage of archived files by data technology. The second level, accessible by clicking on the relevant extension, shows the number of files per extension stored in the EGA archive. Please click on an extension to learn more about the files.

EGA Archived Files
Last Updated : October 20 2020

Cumulative Data Distributed

The figure below shows the cumulative EGA data distribution via the download clients (v2 and v3.x API versions) using HTTPS, FTP, and download boxes using Aspera.

EGA Cumulative Data Distribution
Last Updated : October 20 2020

Data Distributed

The figure below shows the EGA data distribution via the download clients (v2 and v3.x API versions) using HTTPS, FTP, and download boxes using Aspera.

EGA Data Distribution
Last Updated : October 20 2020

Cumulative EGA distribution of the UK Biobank dataset

Genomic data from the 500,000 people participating in the UK Biobank initiative are being distributed via the European Genome-phenome Archive (EGA). UK Biobank provides extremely detailed, high-quality datasets on individuals. It is an unprecedented collection that offers endless possibilities and substantial efficiency savings for biomedical research and for understanding the causes of disease.

Around 500,000 people from across the UK, between the ages of 40 and 69, participated in UK Biobank between 2006 and 2010, undergoing extensive measurements and genotyping. They provided blood, urine and saliva samples for future analysis, including genetic analysis, and gave detailed information about themselves. They also agreed to allow UK Biobank to integrate information from their electronic health records. To learn more, please visit the UK Biobank study page.

EGA - UK Biobank Cumulative Distribution
Last Updated : September 29 2020

Daily EGA distribution of the UK Biobank dataset

The figure below shows the daily distribution of UK Biobank data from the EGA.

EGA - UK Biobank Daily Distribution
Last Updated : September 29 2020