Using the EGA Download Client

About the client

The EgaDemoClient is a Java-based data streamer that enables EGA account holders to securely download files and datasets, either through an interactive shell (IS) or in direct command line mode (DCLM).

Both methods provide similar functionality and follow the same workflow (see graphic below), but only DCLM can use the FUSE layer.

To download individual files or datasets, first make a download request, then download that request by specifying its request label (the name you give the request).

Multiple files and datasets can be grouped together by making multiple requests to download using the same request label.

The download request, which is made up of one or more files, is stored on our servers as tickets, with each file in the download request assigned a unique ticket number.

A ticket is only removed from our server when the file is successfully downloaded, which means that a download can be restarted if, for whatever reason, the download has terminated.

All files within the download request are encrypted before streaming, to ensure secure transfer.

Files are downloaded as encrypted .cip suffixed files, which must be decrypted using the download client with the key specified in the original download request.

Client download and setup

Download the Client

Client installation & requirements

Optimising the client for your network

Using the Interactive Shell

Starting the client

Logging in

Displaying your datasets

Determining the size of a file or dataset

Making a request to download all files in a dataset

Making a request to download individual files in a dataset

Displaying current Requests

Downloading a Request

Decrypt downloaded files

Using direct command mode

Decrypt downloaded files

Using the Fuse Layer

Download the Client

Version 2.2.2

Zip file contains 4 files:

EgaDemoClient.jar

Readme

2 Quick start guides

Client Installation & Requirements

This application requires Java 1.7 or later, and Java must be allowed to access the Internet.

Ports 80 (http) and 443 (https) should be open for TCP.

For UDT usage, UDP port 80 must be open.

The client load balancer is at ega.ebi.ac.uk, which resolves to IP address 193.62.192.14.

To check that your network is correctly configured for client usage, run the following command (for this example, assume: user name = demo@test.org, password = 123pass):

java -jar EgaDemoClient.jar -debug demo@test.org 123pass

This command first opens a simple socket connection to "http://www.google.com" and to "https://www.google.com" to ensure that Java has Internet access on your system (some firewalls prevent this). It then resolves the EGA hostname "ega.ebi.ac.uk" to an IP address and tries to ping our servers, to verify that you can reach our API from your system. If that succeeds, a login is attempted to verify that your username and password are correct and active. Finally, a set of short data transfers is performed to verify that you can download data to your system using both the TCP and UDT data transfer protocols.
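A quick independent pre-check of the TCP ports listed in the requirements above can help separate firewall problems from client problems. The following is a minimal Python sketch and is not part of the EGA client; the function name `tcp_port_open` and the 5-second timeout are assumptions made for illustration. It does not test UDP port 80, which UDT requires.

```python
import socket


def tcp_port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures alike.
        return False


if __name__ == "__main__":
    # Host and ports are taken from the requirements above.
    for port in (80, 443):
        state = "open" if tcp_port_open("ega.ebi.ac.uk", port) else "blocked or unreachable"
        print(f"ega.ebi.ac.uk TCP port {port}: {state}")
```

A port reported as "blocked or unreachable" points at the local firewall or proxy rather than at your EGA account, so it is worth resolving before running the client's own '-debug' check.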

Optimising the client for your network

To maximise your bandwidth usage, use the interactive shell (IS): start the client, log in, and run 'testbandwidth':

  Your computer > java -jar EgaDemoClient.jar
  Welcome to the EGA Secure Data Shell Demo.
  Type 'help' for help, and 'exit' to quit.
  Ega Demo Download Client  Version: 2.2.4
  EGA > login <username>
  Password:
  Login Success!
  EGA > testbandwidth 7

This command performs a series of medium-sized downloads to determine the maximum combined bandwidth to expect with the specified number of parallel download streams (7 in this example).

The test works with both TCP and UDT settings (commands "udt on" / "udt off").

More parallel streams don't always equal higher total throughput! Increasing parallel streams works best if your expected data transfer rate for one individual stream is low. UDT is also not always faster than TCP. Good connections actually tend to perform better using TCP, regardless of distance.

Using the Interactive Shell

Starting the client

Start the interactive shell by running the following command:

  Your computer > java -jar EgaDemoClient.jar
  Welcome to the EGA Secure Data Shell Demo.
  Type 'help' for help, and 'exit' to quit.
  Ega Demo Download Client  Version: 2.2.4
  EGA >

Logging in

The first step will always be to log in (assume for this example: user name = demo@test.org, password = 123pass):

  EGA > login demo@test.org
  Password: 123pass
  Login Success!
  EGA >

After the "Login Success!" message, you can view all the commands available to you with the "instructions" command.

Displaying your datasets

You can list all datasets to which you have access:

EGA > datasets

You can also list all files in a given dataset:

EGA > files dataset EGAD00010000498

Determining the size of a dataset

It is often important to know the size of a dataset prior to download, which can be calculated using the following command:

  EGA > size dataset EGAD00001000814
  Size of dataset EGAD00001000814: 5.2 TB
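A reported size can be turned into a rough download-time estimate once you know your sustained throughput, for example as measured with 'testbandwidth'. This is a back-of-the-envelope Python sketch; the function name, the assumption that "TB" means 10^12 bytes, and the 50 MB/s rate are illustrative and not taken from the client.

```python
def estimate_download_hours(size_bytes, throughput_mb_per_s):
    """Estimate transfer time in hours for a sustained rate in MB/s (10**6 bytes/s)."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = size_bytes / (throughput_mb_per_s * 10**6)
    return seconds / 3600


# 5.2 TB (assumed to mean 5.2 * 10**12 bytes) at a hypothetical 50 MB/s
hours = estimate_download_hours(5.2 * 10**12, 50)
print(f"about {hours:.1f} hours")  # prints "about 28.9 hours"
```

The estimate ignores protocol overhead and restarts, so treat it as a lower bound when planning disk space and download windows.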

Making a request to download all files in a dataset

Once you have identified the dataset you wish to download, you can request it.

Requests require 4 parts:

(1) Type of request: "dataset" or "file"

(2) Dataset accession (EGAD) or, for a file request, file accession (EGAF)

(3) Encryption key used for data encryption

(4) Download request label (Pick a label by which you can identify your request)

For example:

  EGA > request dataset EGAD00010000498 abc request_EGAD00010000498
  Requesting.... (This may take longer if there are pending files in the request)
  Resulting Request: request_EGAD00010000498 (19 file requests).

In this request, all files in dataset EGAD00010000498 are requested. All files will be encrypted with the key "abc", and the request label is "request_EGAD00010000498". The request resulted in 19 individual file requests, each assigned a unique ticket number for download.
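The four parts of a request can be checked before they are typed into the shell. The sketch below is a hypothetical Python helper, not part of the client: it assembles the 'request' line in the order shown above and rejects obvious mistakes. The whitespace check assumes the shell splits its arguments on whitespace, which the source does not state.

```python
def build_request_command(kind, accession, key, label):
    """Assemble the interactive-shell 'request' command from its four parts."""
    prefixes = {"dataset": "EGAD", "file": "EGAF"}
    if kind not in prefixes:
        raise ValueError("type of request must be 'dataset' or 'file'")
    if not accession.startswith(prefixes[kind]):
        raise ValueError(f"a {kind} request expects an {prefixes[kind]} accession")
    for name, value in (("key", key), ("label", label)):
        # Assumption: the shell splits on whitespace, so neither part may contain any.
        if not value or any(ch.isspace() for ch in value):
            raise ValueError(f"{name} must be non-empty and contain no whitespace")
    return f"request {kind} {accession} {key} {label}"


print(build_request_command("dataset", "EGAD00010000498", "abc", "request_EGAD00010000498"))
# prints: request dataset EGAD00010000498 abc request_EGAD00010000498
```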

If the requested dataset contains pending files, then a request may look like this:

  EGA > request dataset EGAD00010000650 abc request_EGAD00010000650
  Requesting.... (This may take longer if there are pending files in the request)
  This request contains 1216 Pending files! Resulting Request: request_EGAD00010000650 (18 file requests).

In this request, the dataset contains 1234 files, but only 18 are in the EGA archive.

Pending files have not yet been archived. The file status will automatically update to 'available' when they have been archived.

Making a request to download individual files in a dataset

First, identify the files in your dataset.

  EGA > files dataset EGAD00010000498
   Files in EGAD00010000498:
  /PROSTATE_SNP6/PD7445a.CEL.gpg 29898719 EGAF00000278296
  /PROSTATE_SNP6/PD7445b.CEL.gpg 30275814 EGAF00000584909
  /PROSTATE_SNP6/PD7445c.CEL.gpg 29571494 EGAF00000584901
  /PROSTATE_SNP6/PD7445d.CEL.gpg 31040185 EGAF00000584899
  /PROSTATE_SNP6/PD7445e.CEL.gpg 30153169 EGAF00000584902
  /PROSTATE_SNP6/PD7445f.CEL.gpg 29735350 EGAF00000584903
  /PROSTATE_SNP6/PD7446a.CEL.gpg 29336337 EGAF00000584905
  /PROSTATE_SNP6/PD7446d.CEL.gpg 31141416 EGAF00000584910
  /PROSTATE_SNP6/PD7446e.CEL.gpg 29599271 EGAF00000584897
  /PROSTATE_SNP6/PD7447e.CEL.gpg 30863898 EGAF00000584906

Then make a request to download the file using the file accession (EGAF).

  EGA > request file EGAF00000278296 abc file_request
  Requesting.... (This may take longer if there are pending files in the request)
  Resulting Request: file_request (1 file requests).

In this request, the file EGAF00000278296 is requested.

The file will be encrypted using the encryption key "abc" and the request is given the label "file_request".
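The file listing shown earlier in this section has three whitespace-separated columns: path, size, and file accession. When a dataset contains many files, it can help to parse the listing and pick accessions programmatically. The following Python sketch does that; the names `ListedFile` and `parse_file_listing` are made up for illustration, and the size column is assumed to be in bytes.

```python
from typing import NamedTuple


class ListedFile(NamedTuple):
    path: str
    size_bytes: int  # assumption: the size column is in bytes
    accession: str


def parse_file_listing(text):
    """Parse lines of the form '<path> <size> <EGAF accession>'; skip anything else."""
    files = []
    for line in text.splitlines():
        parts = line.rsplit(None, 2)  # split from the right so paths may contain spaces
        if len(parts) == 3 and parts[1].isdigit() and parts[2].startswith("EGAF"):
            files.append(ListedFile(parts[0].strip(), int(parts[1]), parts[2]))
    return files


listing = """\
 Files in EGAD00010000498:
/PROSTATE_SNP6/PD7445a.CEL.gpg 29898719 EGAF00000278296
/PROSTATE_SNP6/PD7445b.CEL.gpg 30275814 EGAF00000584909
"""
files = parse_file_listing(listing)
print(len(files), sum(f.size_bytes for f in files))  # prints: 2 60174533
```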

Displaying current Requests

To check the status of your requests, you can use the commands "requests", "allrequests", and "overview", as well as "requesttickets" and "details" for individual tickets.

The command "requests" lists all current requests, showing each request label along with the number of files for download:

  EGA > requests
  Current Requests:
  555360 4
  EGAD00001000705_request 40
  EGAD00001001859 136
  myrequest 59

The command "allrequests" lists the requests from all sources, together with the IP address at the time of each request:

  EGA > allrequests
  Your login IP is: 55.66.777.88
  Current Requests from all Sources, with IP address at time of request:
     11.22.333.444_tst 1
     11.22.333.444_longtest 5889
     55.66.777.88_request_EGAD00010000498 19
     55.66.777.88_request_EGAD00010000650 18
  From your current IP you may download these requests:
     request_EGAD00010000498
     request_EGAD00010000650
  You must 'localize' these requests before you can download them (see 'help'):
      tst
      longtest

The command "localize" can be used to change the IP address of a request to the current login IP, to enable download of that request on the local system.

The command "overview" combines "allrequests" with some general comments, and also updates the local database to check if any of the pending files have become available since the request.

The command "requesttickets" displays all tickets for a specified download request label:

  EGA > requesttickets 555360
  Current Requests:
    e776fcb1-5b9c-4e7b-b86f-ed44589a4b82
    ff9f5bdf-8e41-4277-ac3c-0c4deaef88ca
    b0d7206d-bcc2-4bac-9d96-88b0b274557a
    58074d97-2785-4683-bf03-9a5315162ec7

Further details can be displayed for each ticket:

  EGA > details e776fcb1-5b9c-4e7b-b86f-ed44589a4b82
  Requests Details:
    Ticket: e776fcb1-5b9c-4e7b-b86f-ed44589a4b82
    File: /EGAZ00001017962/DIPG62T.sorted.dup.bam.gpg
    File Size: 124030697603
    Request: 555360
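The "Current Requests from all Sources" listing shown earlier in this section prefixes each request label with the IP address at the time of the request. The Python sketch below shows how such lines can be split into requests that are downloadable from the current login IP and requests that would first need 'localize'. The function name is hypothetical, and the '<ip>_<label> <count>' layout is inferred from the sample output, assuming IPv4-style addresses that contain no underscore.

```python
def split_by_login_ip(entries, login_ip):
    """Separate '<ip>_<label> <count>' entries into downloadable and to-be-localized labels."""
    downloadable, needs_localize = [], []
    for entry in entries:
        fields = entry.split()
        if len(fields) != 2 or "_" not in fields[0]:
            continue  # not a request line
        ip, label = fields[0].split("_", 1)  # labels may contain underscores; the IP does not
        (downloadable if ip == login_ip else needs_localize).append(label)
    return downloadable, needs_localize


entries = [
    "11.22.333.444_tst 1",
    "11.22.333.444_longtest 5889",
    "55.66.777.88_request_EGAD00010000498 19",
    "55.66.777.88_request_EGAD00010000650 18",
]
ok, pending = split_by_login_ip(entries, "55.66.777.88")
print(ok)       # ['request_EGAD00010000498', 'request_EGAD00010000650']
print(pending)  # ['tst', 'longtest']
```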

Downloading a Request

By default, requests are downloaded to the current path. The command "path" sets a new path, and the command "pwd" displays the current path. The request itself is then downloaded using the "download" command, for example:

EGA > download request_EGAD00010000650

By default, a request is downloaded in three parallel streams. The number of streams can be adjusted (15 max) by specifying a number, for example:

EGA > download request_EGAD00010000650 7

This will download the request in 7 parallel streams.

Decrypt downloaded files

Once data has been successfully downloaded, it can be decrypted using the client. The full path and filename of the file to decrypt should be specified:

EGA > decrypt <filename> <key>

This will decrypt the file specified using <key> as the decryption key. Upon decryption the encrypted file is deleted.

With the 'decryptkeep' command, the encrypted file is not deleted:

EGA > decryptkeep <filename> <key>

Use the Interactive Shell if you want to keep the encrypted files after decryption.
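Because decryption expects the full path of each file, it can be convenient to collect the absolute paths of all downloaded .cip files first. The following Python sketch does this and prints one 'decrypt' line per file to paste into the shell. The helper names are hypothetical, and the sketch assumes that paths contain no whitespace and that all files share one key.

```python
from pathlib import Path


def encrypted_files(directory):
    """Return absolute paths of the .cip files found directly in `directory`, sorted by name."""
    root = Path(directory).resolve()
    return sorted(str(p) for p in root.iterdir() if p.is_file() and p.suffix == ".cip")


def decrypt_commands(directory, key):
    """Build one interactive-shell 'decrypt' line per encrypted file."""
    # Assumption: paths contain no whitespace, since the shell syntax shows no quoting.
    return [f"decrypt {path} {key}" for path in encrypted_files(directory)]


for line in decrypt_commands(".", "abc"):
    print(line)
```

Replacing "decrypt" with "decryptkeep" in the generated lines keeps the encrypted copies.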

Using direct command mode

All of the Interactive Shell (IS) functions can also be accessed from the command line. Direct command mode is selected by specifying the parameter '-p' at startup, followed by user name and password. The order of the commands that follow "-p username password" is not important. To list the help section for the command line (assume for this example: user name = demo@test.org, password = 123pass):

java -jar EgaDemoClient.jar -p demo@test.org 123pass -help

The command line also allows you to specify a file that contains the username and password (first line: username; second line: password). To start the client with such a file (e.g. "login.txt"), use parameter '-pf':

java -jar EgaDemoClient.jar -pf /home/demo/ega/login.txt -help
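Since the login file holds a plain-text password, it is sensible to create it with restrictive permissions. The Python sketch below writes the two-line format described above and makes the file readable by its owner only. The function name is hypothetical, and whether the client tolerates the trailing newline after the password is an assumption.

```python
import os


def write_login_file(path, username, password):
    """Write username (line 1) and password (line 2), readable by the owner only."""
    # O_EXCL refuses to overwrite an existing file; 0o600 = owner read/write only.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(f"{username}\n{password}\n")


# Example call (values from this guide):
# write_login_file("/home/demo/ega/login.txt", "demo@test.org", "123pass")
```

Compared with '-p', a login file keeps the password out of the shell history and out of the process list.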

Example - Listing files in a dataset:

java -jar EgaDemoClient.jar -p demo@test.org 123pass -lfd EGAD00010000498

Example - Requesting a dataset:

java -jar EgaDemoClient.jar -p demo@test.org 123pass -rfd EGAD00010000498 -re abc -label request_EGAD00010000498

Example - Requesting a file:

java -jar EgaDemoClient.jar -p demo@test.org 123pass -rf EGAF00000584907 -re abc -label request_EGAF00000584907

Example - Listing Requests:

java -jar EgaDemoClient.jar -p demo@test.org 123pass -lr

Example - Downloading Request, using the optional parameter '-nt' to specify using 7 parallel streams:

java -jar EgaDemoClient.jar -p demo@test.org 123pass -dr request_EGAD00010000498 -nt 7
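Direct command mode lends itself to scripting. The Python sketch below builds the argument lists for a dataset request and a download, using only the switches shown in the examples above. The helper names are hypothetical, the jar is assumed to be in the current directory, and it is an assumption that '-pf' can stand in for "-p username password" in these commands.

```python
JAR = "EgaDemoClient.jar"  # assumed to be in the current directory


def direct_command(login_file, *options):
    """Build the argument list for one direct-command-mode invocation using '-pf'."""
    return ["java", "-jar", JAR, "-pf", login_file, *options]


def request_dataset(login_file, dataset, key, label):
    """Arguments for requesting all files of a dataset ('-rfd', '-re', '-label')."""
    return direct_command(login_file, "-rfd", dataset, "-re", key, "-label", label)


def download_request(login_file, label, streams=None):
    """Arguments for downloading a request ('-dr'), optionally with '-nt' streams."""
    args = direct_command(login_file, "-dr", label)
    if streams is not None:
        args += ["-nt", str(streams)]
    return args


login = "/home/demo/ega/login.txt"
print(" ".join(request_dataset(login, "EGAD00010000498", "abc", "request_EGAD00010000498")))
print(" ".join(download_request(login, "request_EGAD00010000498", streams=7)))
```

Each list can be passed unchanged to `subprocess.run`, which avoids shell quoting problems with keys or labels.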

Decrypt downloaded files

java -jar EgaDemoClient.jar -p <username> <password> -dc <filename> -dck <decryption_key>

For example:

java -jar EgaDemoClient.jar -p name@ebi.ac.uk password -dc /Users/my_downloads/_ega-box-03_Ca9-22.cel.cip -dck abc

Multiple files can be listed after the -dc switch.

Use the Interactive Shell if you want to keep the encrypted files after decryption.

Using the Fuse Layer

Please note that the current FUSE layer is a development version only.

This function is only available from the command line. The FUSE layer allows a directory of encrypted *.cip files to be mounted in an empty directory, where they can be accessed as unencrypted files. This allows encrypted files to be used directly, without having to be decrypted first. The function is accessed with the '-fuse' option. At the moment this requires permission to 'sudo' (or to be root). The target directory is then accessible to every user.

sudo java -jar EgaDemoClient.jar -fuse <source_directory> <target_directory> <key>

This command scans the source directory. Files with the .cip extension are wrapped in an access layer that performs on-demand random-access decryption, and the '.cip' extension is removed from the virtual file. All other files are mounted directly. All .cip files are assumed to be encrypted with the same password/key. Example (making the content of /tmp/download/ available in /tmp/mnt/):

sudo java -jar EgaDemoClient.jar -fuse /tmp/download/ /tmp/mnt/ dipassword776

It is important to supply the terminating "/" when specifying directories. The target directory must be empty. At the moment subdirectories are ignored, and the source directory is scanned only once, at start-up.
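The preconditions above are easy to get wrong by hand, so a small wrapper can check them before the client is started. The Python sketch below verifies that the target is an empty directory, appends the terminating "/" where it is missing, and returns the argument list for the '-fuse' call. The function name is hypothetical and the jar is assumed to be in the current directory.

```python
import os


def fuse_arguments(source_dir, target_dir, key):
    """Build the '-fuse' argument list after checking the documented preconditions."""
    if not os.path.isdir(source_dir):
        raise ValueError(f"source directory not found: {source_dir}")
    if not os.path.isdir(target_dir) or os.listdir(target_dir):
        raise ValueError(f"target must be an existing, empty directory: {target_dir}")
    # Directories must be given with a terminating "/".
    source = source_dir if source_dir.endswith("/") else source_dir + "/"
    target = target_dir if target_dir.endswith("/") else target_dir + "/"
    return ["sudo", "java", "-jar", "EgaDemoClient.jar", "-fuse", source, target, key]
```

Because the source directory is scanned only once, files downloaded after the mount will not appear until the client is restarted.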