Submitting array-based metadata

The metadata required for array-based submissions must be submitted using EGA Webin and by completing the Array-based Format (AF) sheet. The guidelines for this workflow are described on this page.

**Metadata submitted as XML or through the Webin tool will be made publicly available to view on the EGA website and other EBI resource/partner websites.**

Array Based Metadata

  

1) Use Webin to register your Study, Data Access Committee (DAC) and Policy

This online interface enables you to create new and edit existing submissions.

Go to the EGA Webin page and log in using your submission account name and password.

 

i) Register your study 

    • Go to the New Submission tab
    • Choose Register study (project), click Next and complete the web form
    • Click submit to accession your study
    • Take a note of your Study accession number (EGASXXXXXXXXXXX)
To use the study accession number in a publication, we suggest the following format:

"Sequence data has been deposited at the European Genome-phenome Archive (EGA), which is hosted by the EBI and the CRG, under accession number EGASXXXXXXXXXXX.
Further information about EGA can be found on https://ega-archive.org and in 'The European Genome-phenome Archive of human data consented for biomedical research' (http://www.nature.com/ng/journal/v47/n7/full/ng.3312.html)."

 

 ii) Register your Data Access Committee (DAC)

   Further information on the role of your DAC can be found here.

  • Go to the New Submission tab
  • Choose Register Data Access Committee (DAC), click Next and follow the online prompts
  • Take a note of the DAC accession number (EGACXXXXXXXXXXX)

 

iii) Register your Data access policy

  Your Data access policy provides the terms and conditions of data use. It is also referred to as the Data Access Agreement (DAA).

  Completion of a DAA by the applicant/s should form part of the application process to the Data Access Committee (DAC).

  • Go to the New Submission tab
  • Choose Register Data access policy, click Next and follow the online prompts
  • Take a note of the Policy accession number obtained (EGAPXXXXXXXXXXX)
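
Before copying the three accessions into the AF sheet, it can help to check that each one looks well formed. The following is a minimal sketch, not an official EGA tool: the function name and dictionary keys are hypothetical, and the patterns assume that each X in the placeholders above (EGASXXXXXXXXXXX, EGACXXXXXXXXXXX, EGAPXXXXXXXXXXX) stands for one digit.

```python
import re

# Assumption: each accession is its four-letter prefix followed by 11 digits,
# inferred from the placeholders EGASXXXXXXXXXXX, EGACXXXXXXXXXXX, EGAPXXXXXXXXXXX.
ACCESSION_PATTERNS = {
    "study": re.compile(r"EGAS\d{11}"),
    "dac": re.compile(r"EGAC\d{11}"),
    "policy": re.compile(r"EGAP\d{11}"),
}


def check_accessions(accessions):
    """Return a list of problems; an empty list means all three look well formed."""
    problems = []
    for kind, pattern in ACCESSION_PATTERNS.items():
        value = accessions.get(kind, "").strip()
        if not value:
            problems.append(f"{kind}: missing")
        elif not pattern.fullmatch(value):
            problems.append(f"{kind}: unexpected format {value!r}")
    return problems
```

Calling check_accessions with a dictionary holding the "study", "dac" and "policy" values noted above returns an empty list when all three match the assumed format. A passing check only confirms the format, not that the accession exists in Webin.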

 

2) Complete the EGA-Array-based-Format (AF) sheet 

Once you have registered your Study, DAC and policy using Webin, you must complete and return the AF spreadsheet.

 

The EGA-AF spreadsheet consists of four components:

i)    Webin Accessions

      Provide the Webin accessions for your study, DAC and policy

ii)   Sample and phenotypes   

      Sample and phenotype information.

iii)   Dataset

      Describes the datasets to be created.

iv)   Data Files

      Defines how your data will be organised into datasets and packets for distribution.

 

If you need further assistance after going through the guide below, please do not hesitate to contact the EGA helpdesk.

 

EGA-Array-based-Format sheet: Webin Accessions

 

Array based metadata model

EGA-Array-based-Format sheet: Samples and phenotypes

Samples submitted to the EGA should be accompanied by information on gender, donor ID (an anonymised individual identifier) and the phenotype information critical for facilitating analysis (for example, defining your tumour and non-tumour samples and/or defining disease state using controlled ontology terms).

Gender should be provided as 'male', 'female' or 'unknown'. Where gender is unknown due to sex chromosome aneuploidies, the sex karyotype should be provided instead, for example, 'XXY'.
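
The gender rule above can be checked before the sheet is returned. This is a minimal sketch under stated assumptions: the function name is hypothetical, and the karyotype pattern (a short run of the letters X and Y, as in 'XXY') is a guess at the notation, since this page gives only one example.

```python
import re

ALLOWED_GENDERS = {"male", "female", "unknown"}

# Assumption: a sex karyotype is written as one to five X/Y letters, e.g. 'XXY'.
# Other notations are not covered by this sketch.
KARYOTYPE = re.compile(r"[XY]{1,5}")


def is_valid_gender(value):
    """Return True for 'male', 'female', 'unknown' or an X/Y sex karyotype."""
    value = value.strip()
    return value.lower() in ALLOWED_GENDERS or bool(KARYOTYPE.fullmatch(value))
```

Values such as 'M' or 'F' are rejected by this check, which makes abbreviated entries easy to spot before submission.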

Phenotypes should be provided for all characteristics that are critical for facilitating analysis (for example, defining your tumour and non-tumour samples and/or defining disease state using controlled ontology terms).

The EGA recommends using the Experimental Factor Ontology Database for describing your sample phenotypes.

 

You will find the Samples and phenotypes component located in the tab at the bottom of the sheet.

 

Webin Accessions

 

 

EGA-Array-based-Format sheet: Datasets

We suggest that each dataset should consist of a common set of data.  The example below consists of three datasets, grouped according to shared data type, technology and by case/control.

We also capture the number of samples that make up each dataset and the Data Access Committee responsible for approving access to it.
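
The grouping described above can be sketched in a few lines. This is an illustration only: the record fields (sample_id, data_type, technology, case_control) are hypothetical names chosen for the example and are not column names from the AF sheet.

```python
from collections import defaultdict


def group_into_datasets(samples):
    """Group sample IDs by shared data type, technology and case/control status.

    Each sample is a dict with the hypothetical keys 'sample_id', 'data_type',
    'technology' and 'case_control'.
    """
    datasets = defaultdict(list)
    for sample in samples:
        key = (sample["data_type"], sample["technology"], sample["case_control"])
        datasets[key].append(sample["sample_id"])
    return dict(datasets)


def sample_counts(datasets):
    """Return the number of samples in each dataset, keyed like the input."""
    return {key: len(ids) for key, ids in datasets.items()}
```

Each key in the result corresponds to one candidate dataset, and sample_counts gives the number of samples to record for it.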

 

You will find the Dataset component located in the tab at the bottom of the sheet.

Samples and phenotypes

 

EGA-AF: Data files

What follows is an example of how to map your samples to the array-based files added to your upload account.

You will find the Data files component located in the tab at the bottom of the sheet.

Data Files

 

 

Important note: If you uploaded files WITHOUT using the EGA uploader, you MUST also upload the md5sum values of both the encrypted and the unencrypted version of every file uploaded to your submission account, using the filename nomenclature file.gpg.md5 and file.md5 respectively. Your submission will not be processed unless md5sum values are supplied for all files in the CORRECT format.
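
One way to produce the two checksum files is sketched below, using only the Python standard library. The function names are hypothetical, and the sketch assumes that each .md5 file contains just the hexadecimal digest followed by a newline. This page does not specify the file contents, so please confirm the expected format with the EGA helpdesk if in doubt. The encrypted .gpg file is assumed to exist already.

```python
import hashlib
from pathlib import Path


def md5_hex(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_md5_sidecars(plain_path, encrypted_path):
    """Write file.md5 and file.gpg.md5 next to the unencrypted and encrypted files.

    Assumption: each .md5 file holds only the hex digest and a trailing newline.
    """
    written = []
    for source in (Path(plain_path), Path(encrypted_path)):
        sidecar = source.with_name(source.name + ".md5")
        sidecar.write_text(md5_hex(source) + "\n")
        written.append(sidecar)
    return written
```

For an unencrypted file named file and its encrypted counterpart file.gpg, this writes file.md5 and file.gpg.md5 in the same directory, matching the nomenclature above.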

 

What happens after the submission of a dataset?

All datasets affiliated to unreleased studies are automatically placed on hold until the authorised submitter or DAC contact instructs the EGA helpdesk (ega-helpdesk@ebi.ac.uk) to release the study.

Datasets affiliated to released studies will automatically be released.

When your study is released, the named DAC contacts will be given access to the EGA DAC admin tools to create and manage EGA accounts with access permissions to the dataset/s affiliated to the study.

Further information regarding the role of the Data Access Committee can be found here.

Finally, your data is archived within our databases and prepared for encrypted distribution upon the request of permitted EGA account holders.

We strongly advise you NOT to delete your data until we confirm that your data has been successfully archived.