The European Genome-phenome Archive (EGA) offers services for the archiving, processing and distribution of all types of potentially identifiable human genetic and phenotypic data. It is hosted at the European Bioinformatics Institute (EBI).
1. Data sharing policies
Journals and funders increasingly require researchers to have a data sharing plan:
Wellcome Trust's "Policy on data management and sharing"
Nature "Availability of data and materials"
Public Library of Science (PLOS) "Sharing of Materials, Methods, and Data"
For many years, the EBI has run public databases that disseminate data to the wider scientific community.
The EGA is designed to provide an appropriate archive for data on subjects who have consented to the use of their individual genetic data for biomedical research, but not to unlimited public release.
Data can be submitted to the EGA before publication, at other significant milestones, or at study close, in accordance with the Toronto Statement.
The most suitable EBI archive for your data depends on the type of data you wish to submit and on whether the data requires public or controlled access. Public access means complete and open access to all submitted files. Controlled access, in the context of the EGA, means that formal applications must be made to access the submitted data files.
Whether data requires controlled access is determined by the original informed consent agreements signed by the participants in your study; these consents prevent the derived data files from being distributed through open, public access. Controlled access data often consists of human data derived from medical research and consortium projects. All data submitted to the EGA MUST be subject to controlled access as defined by the original informed consents. If in doubt, consult the informed consent agreements that apply to your study.
Controlled access is not the same as holding data back from release until publication: all EBI archive resources allow you to hold a submission before publication.
As part of the submission process, submitted data files are packaged into datasets. Access to each dataset is controlled by a Data Access Committee (DAC), which must be registered as part of the submission process. A DAC may consist of one or more committee members who are responsible for making data access decisions in response to applications from individuals wishing to access the data. A single DAC may be responsible for approving access to one or several datasets.
An overview of the EGA data distribution model
A named individual within the DAC, referenced in the DAC access policy document, is given access to the EGA DAC admin tools. These tools allow EGA accounts to be created and managed, with access permissions for the dataset/s that fall under the DAC's responsibility.
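The relationship between a DAC, the datasets it governs and the accounts it approves can be pictured as a small data model. The Python sketch below is illustrative only: the class and method names (`DataAccessCommittee`, `add_dataset`, `approve`, `can_access`) and all identifiers are hypothetical and do not correspond to any EGA tool or interface. It simply encodes the rules described above: a DAC has at least one member, may govern several datasets, and grants access dataset by dataset.

```python
from dataclasses import dataclass, field


@dataclass
class DataAccessCommittee:
    """Hypothetical model of a DAC: one or more members governing one or more datasets."""

    name: str
    members: list[str]
    datasets: set[str] = field(default_factory=set)
    # Maps a dataset ID to the set of user accounts approved for that dataset.
    approvals: dict[str, set[str]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if not self.members:
            raise ValueError("a DAC needs at least one member")

    def add_dataset(self, dataset_id: str) -> None:
        """Place a dataset under this DAC's responsibility."""
        self.datasets.add(dataset_id)
        self.approvals.setdefault(dataset_id, set())

    def approve(self, dataset_id: str, user: str) -> None:
        """Record an access decision; only datasets this DAC controls can be granted."""
        if dataset_id not in self.datasets:
            raise KeyError(f"{dataset_id} is not controlled by {self.name}")
        self.approvals[dataset_id].add(user)

    def can_access(self, dataset_id: str, user: str) -> bool:
        """Access is per dataset: approval for one dataset does not extend to others."""
        return user in self.approvals.get(dataset_id, set())


# Usage with made-up identifiers.
dac = DataAccessCommittee(name="Example DAC", members=["chair@example.org"])
dac.add_dataset("DATASET_A")
dac.add_dataset("DATASET_B")
dac.approve("DATASET_A", "researcher@example.org")
print(dac.can_access("DATASET_A", "researcher@example.org"))  # True
print(dac.can_access("DATASET_B", "researcher@example.org"))  # False
```

Keeping approvals keyed by dataset mirrors the point that one DAC can be responsible for several datasets while still deciding on each one separately.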
Data types accepted by the EGA can be split into three categories: Sequence, Array-based and Phenotypes.
All manufacturer-specific raw data formats for the major next-generation sequencing platforms are accepted, as are aligned BAM files and variant files in VCF format.
Data from all array-based technologies are accepted, including raw data, intensity and analysis files; there are no restrictions on the data formats accepted.
All samples submitted to the EGA must include the attributes of gender, donor ID (anonymised individual identifier) and phenotype information critical for facilitating analysis (for example, defining tumour and non-tumour samples and/or defining disease state) using controlled ontology terms.
The EGA recommends using the Experimental Factor Ontology (EFO) to describe your sample phenotypes.
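Before registering samples, it can help to check locally that every sample carries the mandatory attributes. The sketch below assumes samples are held as Python dictionaries with hypothetical keys `gender`, `donor_id` and `phenotype`, and that phenotype terms are written as EFO identifiers of the form `EFO_` followed by seven digits. That pattern is a deliberately simple assumption (EFO also reuses terms from other ontologies), and `EFO_0000001` is used only as a format example. This is a pre-submission sanity check, not the EGA's own validation.

```python
import re

# Hypothetical key names for the three mandatory sample attributes.
REQUIRED_ATTRIBUTES = ("gender", "donor_id", "phenotype")
# Assumed shape of an EFO term identifier (illustrative format only).
EFO_ID = re.compile(r"^EFO_\d{7}$")


def sample_problems(sample: dict[str, str]) -> list[str]:
    """Return a list of problems with one sample record (empty if it looks complete)."""
    problems = [
        f"missing {attr}"
        for attr in REQUIRED_ATTRIBUTES
        if not sample.get(attr, "").strip()
    ]
    phenotype = sample.get("phenotype", "").strip()
    if phenotype and not EFO_ID.match(phenotype):
        problems.append(f"phenotype {phenotype!r} is not an EFO identifier")
    return problems


samples = [
    {"gender": "female", "donor_id": "DONOR_001", "phenotype": "EFO_0000001"},
    {"gender": "male", "donor_id": "", "phenotype": "tumour"},
]
for s in samples:
    # The second sample is flagged twice: empty donor ID and a free-text phenotype.
    print(s.get("donor_id") or "<no donor ID>", sample_problems(s))
```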
Complete a submission request form to provide details of the data type (sequence/array-based/phenotype) and estimated size of your submission.
Please inform us if your submission is associated with an existing project or consortium, such as the International Cancer Genome Consortium (ICGC).
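To supply the estimated size requested on the form, a short script can total the files you plan to upload. This sketch assumes the files sit under a single local directory and groups them by their last file extension (so `x.vcf.gz` counts under `.gz`). The directory layout and the use of decimal gigabytes are assumptions, not EGA requirements.

```python
from collections import defaultdict
from pathlib import Path


def size_by_extension(root: str) -> dict[str, int]:
    """Sum file sizes in bytes under `root` (recursively), grouped by lowercased extension."""
    totals: dict[str, int] = defaultdict(int)
    for path in Path(root).rglob("*"):
        if path.is_file():
            ext = path.suffix.lower() or "(none)"
            totals[ext] += path.stat().st_size
    return dict(totals)


def total_gb(totals: dict[str, int]) -> float:
    """Convert the combined byte total to decimal gigabytes (1 GB = 10**9 bytes)."""
    return sum(totals.values()) / 1e9
```

For example, `total_gb(size_by_extension("/path/to/upload"))` gives an overall figure, while the per-extension totals help when describing the data type mix.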
Submission, archiving and data processing leading to distribution can take several weeks.
Please contact us in advance to ensure that your data is ready for release when required.
Please note: The EGA operates a queuing system for submission processing. As a result, one submission CANNOT be prioritised over another.
Receive a submission pack, which will include:
i) Submission account log-in for uploading your files and registering your metadata
ii) Template for metadata (for array-based submissions ONLY)
iii) Web links to submission documentation relevant to your submission
Register metadata* using Webin; this may include details of your study, samples, experiments, runs/analyses, policy and dataset/s.
*The metadata required depends on the data type submitted.
--Metadata provided will be made publicly available to view on the EGA website and other EBI resource/partner websites--
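The metadata objects registered in Webin reference one another, so a missing object can leave others pointing at nothing. As an assumption for illustration (the exact object model and required fields are set out in the submission documentation), the sketch below models a common arrangement: experiments link a study and a sample, runs point to experiments, datasets group runs and point to a policy, and the policy points to the DAC. The `Ref` class, the `dangling_refs` function and all IDs are hypothetical; the function reports any reference whose target has not been registered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """A reference from one metadata object to another, e.g. a run pointing at an experiment."""

    source: str       # ID of the object holding the reference
    target_kind: str  # kind of object referenced, e.g. "experiment"
    target: str       # ID of the referenced object


def dangling_refs(registered: dict[str, set[str]], refs: list[Ref]) -> list[Ref]:
    """Return references whose target has not been registered under the expected kind."""
    return [r for r in refs if r.target not in registered.get(r.target_kind, set())]


# Hypothetical IDs; the link structure is an illustrative assumption.
registered = {
    "study": {"S1"},
    "sample": {"SA1", "SA2"},
    "experiment": {"E1"},
    "run": {"R1"},
    "dac": {"D1"},
    "policy": {"P1"},
    "dataset": {"DS1"},
}
refs = [
    Ref("E1", "study", "S1"),
    Ref("E1", "sample", "SA1"),
    Ref("R1", "experiment", "E1"),
    Ref("DS1", "run", "R1"),
    Ref("DS1", "run", "R2"),  # R2 was never registered
    Ref("DS1", "policy", "P1"),
    Ref("P1", "dac", "D1"),
]
for r in dangling_refs(registered, refs):
    print(f"{r.source} -> {r.target_kind} {r.target}: not registered")
```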
All registered studies are automatically placed on hold until the named submission or DAC contact instructs our Helpdesk to release the study.
When your study is released, the named DAC contacts will be given access to the EGA DAC admin tools to create and manage EGA accounts with access permissions for the dataset/s affiliated with the study.
Data is archived within our databases and prepared for encrypted distribution to DAC-approved users.
We strongly advise you NOT to delete your data until we confirm that your data has been successfully archived.
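One way to follow this advice is to record a checksum for each file before upload, so you keep a local record of exactly what was sent and can confirm your copies are unchanged until archiving is confirmed. The sketch below computes MD5 digests with Python's standard `hashlib`. The choice of MD5 and the manifest layout (`<digest>  <path>` per line, as written by `md5sum`) are assumptions for illustration, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large BAM/VCF files need not fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(files: list[Path], manifest: Path) -> None:
    """Write one '<md5>  <path>' line per file (the layout used by md5sum)."""
    lines = [f"{md5_of(f)}  {f}" for f in files]
    manifest.write_text("\n".join(lines) + "\n")


def changed_files(manifest: Path) -> list[str]:
    """Re-hash each listed file; return paths that are missing or whose digest differs."""
    changed = []
    for line in manifest.read_text().splitlines():
        expected, path = line.split("  ", 1)
        p = Path(path)
        if not p.is_file() or md5_of(p) != expected:
            changed.append(path)
    return changed
```

For example, `write_manifest(sorted(Path("upload").rglob("*.bam")), Path("upload.md5"))` before uploading, then `changed_files(Path("upload.md5"))` later, should return an empty list if the local copies are intact.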