Submitter Portal


Overview

The EGA Submitter Portal provides tools that facilitate the submission of metadata for human data to the European Genome-phenome Archive (EGA). This page provides a video tutorial on how to use the EGA Submitter Portal and is divided into ordered sections that follow the steps for completing a submission.

In this tutorial we demonstrate how to use the Submitter Portal to register your metadata. While the video focuses on run-based submission (raw FASTQ files and aligned BAM/CRAM data), analysis-based submission (BAM/BAI pairs, VCF variation files and phenotype files) is described below.

Before registering metadata, it is very important that submitters have encrypted their files and uploaded them to their EGA submission account (ega-box).

The EGA is a shared, public service with limited resources. To manage the available resources, we enforce a soft limit of 10 TB per submission account at any one time. Please do not exceed this limit. If you are approaching it, please contact the EGA Helpdesk so that we can advise on how to register the associated metadata and trigger the archiving of files, allowing you to continue with your submission. If we note that your submission account exceeds 10 TB on a consistent basis, your password will be changed until metadata is associated with the uploaded files.
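
As a rough illustration of the soft limit described above, the following sketch sums the sizes of files staged locally for upload and classifies the total against 10 TB. It assumes the limit is counted in decimal terabytes (10^12 bytes) and that local file sizes approximate what the submission account will hold. The function names, the status labels and the 80% warning threshold are hypothetical and are not part of any EGA tool.

```python
from pathlib import Path

# Assumption: the 10 TB soft limit is measured in decimal terabytes.
SOFT_LIMIT_BYTES = 10 * 10**12


def staged_bytes(directory):
    """Return the total size in bytes of all regular files under directory."""
    return sum(p.stat().st_size for p in Path(directory).rglob("*") if p.is_file())


def limit_status(total_bytes, limit=SOFT_LIMIT_BYTES, warn_fraction=0.8):
    """Classify usage as 'ok', 'approaching' or 'exceeded' (hypothetical labels)."""
    if total_bytes > limit:
        return "exceeded"
    if total_bytes >= warn_fraction * limit:
        return "approaching"
    return "ok"
```

For example, `limit_status(staged_bytes("to_upload"))` could be run before each upload batch, where `to_upload` is a placeholder directory name.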

Please note that some metadata (run and analysis objects) cannot be registered until at least 24 hours after the files have been uploaded to your box. Additionally, submissions to the EGA can take approximately one month, so please allow plenty of time for the submission and archiving processes.
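
The 24-hour waiting period can be checked with simple date arithmetic. The sketch below is only an illustration: it assumes you record the time your upload finished yourself, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# From the text above: run and analysis objects need at least 24 hours.
MIN_WAIT = timedelta(hours=24)


def earliest_registration(upload_finished):
    """Return the earliest time at which runs/analyses could be registered."""
    return upload_finished + MIN_WAIT


def can_register(upload_finished, now=None):
    """Return True if at least 24 hours have passed since the upload finished."""
    now = now or datetime.now(timezone.utc)
    return now >= earliest_registration(upload_finished)
```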


Common Aspects

The metadata objects required for read submissions are as follows:

  • Study: Information about the sequencing study
  • Samples: Information about the sequencing samples
  • Experiments: Information about the sequencing methods, protocols and machines. Experiments generate the linkage between samples and study. Only necessary for FASTQ and BAM/CRAM submissions.
  • Runs: Samples, experiments and files are linked through runs; these are the appropriate objects for FASTQ and BAM/CRAM submissions
  • Analysis: References the analysis (BAM) files; associated with samples and study
  • DAC: Contains information about the Data Access Committee (DAC)
  • Policy: Contains the Data Access Agreement (DAA); associated with DAC
  • Dataset: Contains the collection of runs/analysis data files to be subject to controlled access; associated with Policy
  • **Study, samples, DAC and policy metadata can all be registered prior to uploading files**

If you are performing array-based submission(s), the Submitter Portal should only be used to register the Study, Samples, Data Access Committee (DAC) and Policy metadata objects. We are currently working on features to support the creation of array metadata submissions through the portal.


Identifiers

EGA objects can be identified by their unique accession. These IDs are displayed throughout the EGA, shared among all EGA locations and specific to each data type (see the list below).

  EGA Accession ID    EGA Object description
  EGAS                EGA Study Accession ID
  EGAC                EGA DAC Accession ID
  EGAP                EGA Policy Accession ID
  EGAN                EGA Sample Accession ID
  EGAR                EGA Run Accession ID
  EGAX                EGA Experiment ID
  EGAZ                EGA Analysis Accession ID
  EGAD                EGA Dataset Accession ID
  EGAB                EGA Submission ID
  EGAF                EGA File Unique Accession ID
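
Because each object type has its own four-letter prefix, the type of an accession can be looked up from its first four characters. The following is a minimal sketch of that lookup. It inspects only the prefix and does not validate the rest of the accession, since the remaining format is not described here. The function name is hypothetical.

```python
# Prefix-to-object mapping copied from the list above.
EGA_PREFIXES = {
    "EGAS": "Study",
    "EGAC": "DAC",
    "EGAP": "Policy",
    "EGAN": "Sample",
    "EGAR": "Run",
    "EGAX": "Experiment",
    "EGAZ": "Analysis",
    "EGAD": "Dataset",
    "EGAB": "Submission",
    "EGAF": "File",
}


def object_type(accession):
    """Return the object type for an accession, or None if the prefix is unknown."""
    return EGA_PREFIXES.get(accession.strip().upper()[:4])
```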


Tutorial Video

In the 12 short videos below, you can find a worked example with detailed instructions on how to use the EGA Submitter Portal to perform metadata submissions to the EGA.

Points to Notice

There is a strong relationship among EGA metadata objects. Unless the primary objects (study, samples and DAC) are properly submitted, their linked secondary objects (experiments, runs, analyses or policies) will not validate. The tertiary metadata object (dataset) requires all the other objects to be submitted before it can be validated and submitted. Should you prefer to submit everything at once, please generate all the objects without validating them, then go to the "Edit title and description" tab and click "I'm done". This will validate and submit all objects together.
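
The dependency between primary, secondary and tertiary objects can be pictured as a small directed graph. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later) to derive one valid submission order and to list what still blocks a given object. The dependency table is a reading of this page and not an official EGA schema. In particular, it treats a dataset as depending on both runs and analyses, whereas a real dataset may contain only one of them. All names are hypothetical.

```python
from graphlib import TopologicalSorter

# Each object maps to the objects it depends on (a reading of this page,
# not an official EGA schema).
DEPENDS_ON = {
    "study": set(),
    "sample": set(),
    "dac": set(),
    "experiment": {"study", "sample"},
    "run": {"sample", "experiment"},
    "analysis": {"study", "sample"},
    "policy": {"dac"},
    "dataset": {"run", "analysis", "policy"},
}


def submission_order():
    """Return one order in which every object comes after its dependencies."""
    return list(TopologicalSorter(DEPENDS_ON).static_order())


def blocked_by(obj, submitted):
    """Return the dependencies of obj that have not been submitted yet."""
    return sorted(DEPENDS_ON[obj] - set(submitted))
```

For instance, `blocked_by("run", {"study"})` reports that the sample and experiment are still missing, which mirrors the validation failures described above.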


Analysis Submission

The EGA Submitter Portal video focuses on a single use case: the submission of runs.

Aligned BAM files are expected to be submitted as runs (1-to-1 cardinality with samples). Analyses should only be used to link BAM/BAI pairs, VCF files and phenotype files to samples. The analysis is an EGA-specific metadata object that links samples to files. This object also stores some metadata about your experiments, such as the experiment type, genome reference or platform used.

**If only BAM or CRAM alignment files are submitted but not the original unaligned FASTQ files, then please make sure that the BAM or CRAM files also contain the unaligned reads. This is critical to enable primary re-analysis and re-alignment of the dataset using new tools or future genome assemblies.**
Aligned/ Mapped Sequence Reads


Prior to defining the Analysis

Before registering your analysis, you should first:

  1. Register your Study
  2. Register the DAC and Policy
  3. Register the Samples
  4. Encrypt and Upload the files

Please note that the EGA allows for the re-use of registered metadata. Therefore the previously registered Study, DAC, Policy or samples can be re-used for the analysis data submission.
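
The prerequisite steps above can be tracked with a simple checklist. The sketch below is purely illustrative: the step names are copied from the list, and the function name is hypothetical. Steps satisfied by re-used, previously registered objects can be passed in as already completed.

```python
# The four steps listed above, in order.
PREREQUISITES = [
    "Register your Study",
    "Register the DAC and Policy",
    "Register the Samples",
    "Encrypt and Upload the files",
]


def remaining_steps(completed):
    """Return the prerequisite steps not yet completed, in their original order."""
    done = set(completed)
    return [step for step in PREREQUISITES if step not in done]
```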


Defining the Analysis

  1. In the Submitter Portal accordion, select the option "Link files and samples" and click "Analysis Data".
  2. Start by selecting the sample(s) to be linked to the file, and populate the required attribute fields. Please note the existence of mandatory fields. These must be populated.
  3. Finally, select the file and file type to be associated with the sample. If you wish to add additional files, click the button "Add additional files".
  4. Your analysis will be created in draft status. To learn more about validating, editing or deleting the analysis, view the Submitter Portal video section above.

Points to notice

When populating the mandatory chromosome field, press ENTER after selecting the chromosome(s) in order to save your selection.


Submitter Portal - Guided Documentation


Login

The EGA Submitter Portal credentials are provided by the Helpdesk team when a submission account is requested.

Login


Main page

Main page: when you log in to the Submitter Portal, you will find the following image (with your submissions):

Main Page

On the main page you can see the open submissions in your ega-box. A submission can have different statuses depending on the objects in it:

  • Draft: the objects in the submission have been created but not validated or registered (submitted)
  • Validated: the objects in the submission have been created and validated. The submission will be in validated status once all objects are validated (V)
  • Validated with errors: the objects in the submission have been created, but validation could not be completed due to an error.
  • Submitted: all the objects in the submission have been created and registered (submitted).
    • IMPORTANT: once an object is submitted, it is assigned a unique accession number. The registered object is then automatically added to our databases.
  • Submitted Partially: one or more objects in the submission are submitted, but other objects are not yet submitted (draft or validated)
  • Submitted draft: when a submitted submission is modified, the status turns into submitted draft.
    • IMPORTANT: submit the modified object again in order to re-obtain a submitted status
  • Submitted validated with errors: when a submitted submission is modified and validated, the status turns into submitted validated with errors if the modification contains an error.
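
To make the relationship between object statuses and the submission status concrete, here is a minimal sketch that derives a submission-level status from the statuses of its objects. It is only an interpretation of the list above, not the portal's actual logic. The precedence between rules is an assumption, and the two statuses that depend on modification history (Submitted draft and Submitted validated with errors) are deliberately left out.

```python
def submission_status(object_statuses):
    """Derive a submission-level status from per-object statuses.

    object_statuses: iterable of 'draft', 'validated',
    'validated with errors' or 'submitted'.
    """
    statuses = set(object_statuses)
    if not statuses:
        raise ValueError("a submission needs at least one object")
    if statuses == {"submitted"}:
        return "Submitted"
    if "submitted" in statuses:
        # Some objects are submitted, others are still draft or validated.
        return "Submitted Partially"
    if "validated with errors" in statuses:
        return "Validated with errors"
    if statuses == {"validated"}:
        return "Validated"
    # Any remaining mix still contains at least one draft object.
    return "Draft"
```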

Top Right Buttons


Submissions

1) Submissions: By clicking this button you can see all submissions in your ega-box

Submissions

By clicking on the option in the circle you can filter your submissions depending on their status:

  • Open submissions: Draft, Validated, Validated with errors, Submitted draft or Submitted validated with errors
  • Close submissions: Submitted
  • All submissions: all statuses together
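
The three filter options above amount to grouping statuses into sets. The sketch below reproduces that grouping for a list of (title, status) pairs. The function name and the data layout are hypothetical. Statuses not named in either group, such as Submitted Partially, only appear in the "all" view here, because the list above does not assign them to a group.

```python
# Status groups as listed above.
OPEN_STATUSES = {
    "Draft",
    "Validated",
    "Validated with errors",
    "Submitted draft",
    "Submitted validated with errors",
}
CLOSED_STATUSES = {"Submitted"}


def filter_submissions(submissions, view="all"):
    """Filter (title, status) pairs by view: 'open', 'closed' or 'all'."""
    if view == "all":
        return list(submissions)
    if view == "open":
        wanted = OPEN_STATUSES
    elif view == "closed":
        wanted = CLOSED_STATUSES
    else:
        raise ValueError(f"unknown view: {view!r}")
    return [item for item in submissions if item[1] in wanted]
```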


Submitted Objects

2) Submitted objects : You can also see your objects (studies, samples, files, experiments, analyses, dacs, policies and datasets):

Submitted Objects

For example, samples. You can also filter your samples depending on their status:

Filter Samples

Moreover, you can also filter your samples by different options: Status, EGA ID, Alias, Subject ID, Updated, Created

Submitted objects


New Submission

3) New submission: Click this button when you need to start a new submission.

New submission

IMPORTANT:

  • Add a title for your submission. This way you can easily distinguish between different submissions (in case you have multiple submissions in progress in the same ega-box).
  • You can start a submission at different steps. For example, you can start at the samples step if you already have a study registered.

In the submission there are several tabs (one for each object):

  • Register study: study
  • Register samples: sample
  • Define one or more experiments: experiment
  • Link files and samples: run and/or analysis
  • Register data access committee: dac
  • Register data access policy: policy
  • Submit dataset: dataset

When registering an object, some fields are mandatory (marked with a *). If these mandatory fields are not populated, you will not be able to save the object:

New submission

As you can see, the ‘Save study’ button is greyed out because a mandatory field (Study type) is still empty. Once this field is filled in, the object can be saved by clicking on ‘Save study’.
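
The behaviour of the save button can be thought of as a check that no mandatory field is blank. The sketch below illustrates that idea only and is not the portal's implementation. "Study type" is the mandatory field named above, while any other field name passed in would be your own assumption.

```python
def missing_mandatory(form, mandatory):
    """Return the mandatory field names that are absent or blank in form."""
    return [name for name in mandatory if not str(form.get(name) or "").strip()]


def can_save(form, mandatory):
    """Return True only when every mandatory field has a non-blank value."""
    return not missing_mandatory(form, mandatory)
```

With `mandatory = ["Study type"]`, a form in which "Study type" is empty cannot be saved, which corresponds to the greyed-out button.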

Save Study
Created Object

There are several actions available for a created object:

ACTIONS:

  • 1 ) Validate: by clicking on the green tick, you will request to validate the object
  • 2 ) Submit: by clicking the blue arrow, you will request to submit the object
  • 3 ) Edit: by clicking on the yellow pencil, you will request to edit the object
  • 4 ) Delete: by clicking the red cross, you will request to delete the object

Each object is linked in a unidirectional way with another object. Map of linkage of objects in a submission:

EGA Metadata

For example, a study is not directly linked to a dataset. A dataset is linked to runs (the linkage between samples and files). These runs are linked to experiments, and the experiments are the objects directly linked to a study.
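
The one-way chain described above (dataset to run to experiment to study) can be followed with a short graph search. The sketch below encodes only the links mentioned in this paragraph, so it is a simplification of the full linkage map. The names are hypothetical.

```python
from collections import deque

# One-way links taken from the paragraph above (a partial view of the map).
LINKS = {
    "dataset": ["run"],
    "run": ["experiment"],
    "experiment": ["study"],
}


def link_path(start, target, links=LINKS):
    """Return the chain of objects from start to target, or None if unreachable."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in links.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None
```

Because the links are unidirectional, `link_path("study", "dataset")` finds no path, while the reverse direction passes through the run and the experiment.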


Reusing Registered Objects

At the EGA we strongly encourage reusing registered objects where possible. To do so, each tab provides a checkbox that displays all objects in the ega-box. For instance, if you want to reuse an old study but need to register a new experiment, you will find the following checkbox in the experiment tab:

Reusing Registered Objects

By ticking the ‘Show all box’s studies’ checkbox:

Reuse Studies

The same goes when reusing a sample in the linkage with the files:

Reuse Samples

Or to reuse a DAC for a policy:

Reuse Policies

Or to reuse a policy for a dataset:

Reuse Dataset

The same applies to other combinations of objects.

For this reason, if you already have previously submitted objects (via Webin or the Submitter Portal), you can reuse them without having to register them all over again.


How to submit all objects

To submit all objects (the whole submission) at once, click the ‘I’m done. Please, process this submission’ button on the first tab of the submission tab list:

Submit All Objects


Troubleshooting

When you try to validate a run and the samples used are not registered (submitted):

Sample:

Sample

Experiment:

Experiment

Run:

Run

Click on validate and a message box will appear saying that the submission request was sent:

Validate

After a few minutes the following message will appear:

Error Message

These error messages state that validation failed because the referenced alias could not be found. This happens because the sample DOES NOT EXIST in our database (where the call to validate or submit your object is sent). If you submit your sample first (by clicking on the blue arrow on the sample object):

Troubleshooting

Then, when you submit the run, it will work this time (as the sample is now registered and added to our database):

Troubleshooting

The experiment is also submitted automatically when its linked objects (samples and runs) are validated and submitted.

Troubleshooting

IMPORTANT: If there are several runs in different statuses, the experiment will appear duplicated, once for each status. This is expected. The experiment object will be submitted automatically once the submission is completed and all samples and runs are submitted.

Finally, to view the error messages, go to the ‘Submission errors console’ tab:

Submission Errors Console
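
The failure described in this section comes down to a reference check: a run can only validate if every sample alias it references is already registered. The sketch below mimics that check locally. The data layout, the function name and the error text are hypothetical and do not reproduce the portal's real messages.

```python
def validate_run(run, submitted_sample_aliases):
    """Return a list of error strings; an empty list means the check passed.

    run: dict with a 'sample_aliases' list (hypothetical layout).
    submitted_sample_aliases: aliases of samples that are already submitted.
    """
    errors = []
    for alias in run["sample_aliases"]:
        if alias not in submitted_sample_aliases:
            errors.append(f"referenced sample alias not found: {alias}")
    return errors
```

Submitting the sample first corresponds to adding its alias to `submitted_sample_aliases`, after which the same run passes the check.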


Webin Tool

EGA Webin is an online tool that can be used to submit metadata (associated with sequence files) to the EGA. It can also be used to register Study (EGAS), Data Access Committee (DAC) and Policy (EGAP) objects for all array-based submissions.

The Webin platform is a historical tool that preceded the current Submitter Portal. It was developed and used to register metadata associated with sequence files. Detailed documentation about Webin and how to use it can be obtained here.

Click on the links below for guides on submitting specific metadata using Webin:

Read (unaligned/raw)
Analysis: Aligned (BAM) 
Analysis: Variant (VCF)
Complete Genomics
Array Based
Phenotype

Webin will no longer be maintained; however, it can be used as a backup tool if there is any issue with your submission via the Submitter Portal. Please contact the EGA Helpdesk for any related queries.