Using the EGA Download Client

The EgaDemoClient is a Java-based data streamer that enables EGA account holders to securely download files and datasets, either through an interactive shell (IS) or in direct command line mode (DCLM).

Both methods provide similar functionality and follow the same workflow (see graphic below), but only DCLM can use the FUSE layer.

 

Individual files or datasets are downloaded in two steps: first make a download request, then download that request by specifying its request label (the name you give the request).

Multiple files and datasets can be grouped together by making multiple requests to download using the same request label.

A download request, which is made up of one or more files, is stored on our servers as tickets, with each file in the request assigned a unique ticket number.

A ticket is only removed from our server when the file is successfully downloaded, which means that a download can be restarted if, for whatever reason, the download has terminated.

All files within the download request are encrypted before streaming, to ensure secure transfer.

Files are downloaded as encrypted .cip suffixed files, which must be decrypted using the download client with the key specified in the original download request. 


Download the Client

Version 2.2.2

Zip file contains 4 files:

EgaDemoClient.jar

Readme

2 Quick start guides

 


Client Installation & Requirements

This application requires Java 1.7+, and Java must be allowed to access the Internet.

Ports 80 (http) and 443 (https) should be open for TCP.

For UDT usage, UDP port 80 must be open.

The client load balancer is at ega.ebi.ac.uk, which resolves to IP address 193.62.192.14.

To check that your network is correctly configured for client usage run the command (for this example, assume: user name = demo@test.org, password = 123pass):

java -jar EgaDemoClient.jar -debug demo@test.org 123pass

This command starts by creating a simple socket connection to "http://www.google.com" as well as "https://www.google.com" to ensure that Java has access to the Internet on your system (some firewalls prevent this). It then resolves the EGA hostname "ega.ebi.ac.uk" to an IP address and tries to ping our servers, to verify that you have access to our API from your system. If that is successful, a login is attempted to verify that your username and password are correct and active. Finally, a set of short data transfers is performed to verify that you can download data to your system using both the TCP and UDT data transfer protocols.
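If you want a quick look at the TCP part of these requirements before running the client, the check can be approximated with a few lines of Python. This is only a sketch and not part of the client: the function name check_tcp_ports and its parameters are made up for this example, and it only tests whether TCP connections to ports 80 and 443 can be opened.

```python
import socket

EGA_HOST = "ega.ebi.ac.uk"  # hostname given in this guide
TCP_PORTS = (80, 443)       # TCP ports this guide says should be open


def check_tcp_ports(host=EGA_HOST, ports=TCP_PORTS, timeout=5.0,
                    connect=socket.create_connection):
    """Return {port: bool}: True if a TCP connection to host:port could be opened.

    `connect` is injectable so the logic can be exercised without a network.
    """
    results = {}
    for port in ports:
        try:
            conn = connect((host, port), timeout=timeout)
        except OSError:
            # Covers refused connections, timeouts and failed name resolution.
            results[port] = False
        else:
            conn.close()
            results[port] = True
    return results
```

Calling check_tcp_ports() with no arguments tries ega.ebi.ac.uk on ports 80 and 443. It cannot test UDP port 80 for UDT, and it does not replace the client's own -debug run, which also checks your login and performs test transfers.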


Optimising the client for your network

To maximise your bandwidth usage, use the interactive shell (IS): run the client, log in, and then run 'testbandwidth':

Your computer > java -jar EgaDemoClient.jar
Welcome to the EGA Secure Data Shell Demo.
Type 'help' for help, and 'exit' to quit.
Ega Demo Download Client  Version: 2.2.2
EGA > login <username>
Password:
Login Success!
EGA > testbandwidth 7

The 'testbandwidth' command performs a series of medium-sized downloads to determine the maximum combined bandwidth to expect when using the specified number of parallel download streams (7 in the example above).

The test works with both TCP and UDT settings; switch between them with the commands "udt on" and "udt off".

More parallel streams don't always equal higher total throughput! Increasing parallel streams works best if your expected data transfer rate for one individual stream is low. UDT is also not always faster than TCP. Good connections actually tend to perform better using TCP, regardless of distance.

 

Using the Interactive Shell

Starting the client

The interactive shell is started by running the following command:

Your computer > java -jar EgaDemoClient.jar
Welcome to the EGA Secure Data Shell Demo.
Type 'help' for help, and 'exit' to quit.
Ega Demo Download Client  Version: 2.2.2
EGA > 

 

Logging in 

The first step will always be to log in (assume for this example: user name = demo@test.org, password = 123pass):

EGA > login demo@test.org
Password: 123pass
Login Success!
EGA >

After receiving the "Login Success!" message, you can view all the commands available to you with the "instructions" command.


Displaying your datasets

You can list all datasets to which you have access:

EGA > datasets

You can also list all files in a given dataset:

EGA > files dataset EGAD00010000498

 

Determining the size of a dataset

It is often important to know the size of a dataset prior to download, which can be calculated using the following command:

EGA > size dataset EGAD00001000814
Size of dataset EGAD00001000814: 5.2 TB

 

Making a request to download all files in a dataset

Once you have identified the dataset you wish to download, it is time to request it.

Requests require 4 parts:

(1) Type of request: "dataset" or "file"

(2) Dataset accession (EGAD) or, for a file request, file accession (EGAF)

(3) Encryption key used for data encryption

(4) Download request label (Pick a label by which you can identify your request)

 

For example:

EGA > request dataset EGAD00010000498 abc request_EGAD00010000498 
Requesting.... (This may take longer if there are pending files in the request) 
Resulting Request: request_EGAD00010000498 (19 file requests).

In this request, all files in dataset EGAD00010000498 are requested.

All files will be encrypted with the key "abc", and the request label is "request_EGAD00010000498". The request resulted in 19 individual file requests, with each file assigned a unique ticket number for download.

 

If the requested dataset contains pending files, then a request may look like this:

EGA > request dataset EGAD00010000650 abc request_EGAD00010000650 
Requesting.... (This may take longer if there are pending files in the request) 
This request contains 1216 Pending files! Resulting Request: request_EGAD00010000650 (18 file requests).

 

In this request, the dataset contains 1234 files, but only 18 are in the EGA archive.  

Pending files have not yet been archived.  The file status will automatically update to 'available' when they have been archived.


Making a request to download individual files in a dataset

First, identify the files in your dataset.

EGA > files dataset EGAD00010000498 
 Files in EGAD00010000498:
/PROSTATE_SNP6/PD7445a.CEL.gpg 29898719 EGAF00000278296 
/PROSTATE_SNP6/PD7445b.CEL.gpg 30275814 EGAF00000584909 
/PROSTATE_SNP6/PD7445c.CEL.gpg 29571494 EGAF00000584901 
/PROSTATE_SNP6/PD7445d.CEL.gpg 31040185 EGAF00000584899
/PROSTATE_SNP6/PD7445e.CEL.gpg 30153169 EGAF00000584902 
/PROSTATE_SNP6/PD7445f.CEL.gpg 29735350 EGAF00000584903 
/PROSTATE_SNP6/PD7446a.CEL.gpg 29336337 EGAF00000584905 
/PROSTATE_SNP6/PD7446d.CEL.gpg 31141416 EGAF00000584910
/PROSTATE_SNP6/PD7446e.CEL.gpg 29599271 EGAF00000584897
/PROSTATE_SNP6/PD7447e.CEL.gpg 30863898 EGAF00000584906

Then make a request to download the file using the file accession (EGAF).

EGA > request file EGAF00000278296 abc file_request
Requesting.... (This may take longer if there are pending files in the request) 
Resulting Request: file_request (1 file requests). 

In this request, the file EGAF00000278296 is requested.

The file will be encrypted using the encryption key "abc" and the request is given the label "file_request".
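When you need several files from a long listing, it can help to process the listing with a small script. The Python sketch below assumes that each line of the 'files dataset' output has the form "path size accession", as in the listing above, and that the middle column is the file size in bytes. The function names and the name_contains parameter are made up for this example. It builds one 'request file' command per selected file, all sharing one request label, so that the files are grouped into a single request.

```python
def parse_file_listing(text):
    """Parse 'path size accession' lines into (path, size, accession) tuples."""
    entries = []
    for line in text.splitlines():
        # Split from the right so a path containing spaces stays in one piece.
        parts = line.strip().rsplit(None, 2)
        # Skip the "Files in EGAD...:" header and any line of another shape.
        if len(parts) != 3 or not parts[1].isdigit() or not parts[2].startswith("EGAF"):
            continue
        entries.append((parts[0], int(parts[1]), parts[2]))
    return entries


def request_commands(entries, key, label, name_contains=""):
    """Build one interactive-shell 'request file' command per matching file."""
    return [
        f"request file {accession} {key} {label}"
        for path, _size, accession in entries
        if name_contains in path
    ]
```

For example, request_commands(entries, "abc", "file_request", "PD7446") returns the commands for all files whose path contains "PD7446", and sum(size for _, size, _ in entries) gives the combined size of the listed files.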


Displaying current Requests

If you want to know the status of your requests, there are two options: "requests" and "requesttickets":

The "requests" command lists all current requests, showing each request label along with the number of files for download:

EGA > requests 
Current Requests:
555360 4
EGAD00001000705_request 40
EGAD00001001859 136
myrequest 59
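If you keep a copy of this output, the number of files still waiting in each request can be tallied with a short script. The Python sketch below assumes the "label count" line format shown above; parse_requests is a name made up for this example and is not a client function.

```python
def parse_requests(text):
    """Map each request label to its file count, from 'requests' output."""
    counts = {}
    for line in text.splitlines():
        # Split once from the right: the last field is the file count.
        parts = line.strip().rsplit(None, 1)
        if len(parts) == 2 and parts[1].isdigit():
            counts[parts[0]] = int(parts[1])
    return counts
```

Lines that do not end in a number, such as the "Current Requests:" header, are ignored.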

 

The "requesttickets" command displays all tickets for a specified download request label:

EGA > requesttickets 555360
Current Requests:
  e776fcb1-5b9c-4e7b-b86f-ed44589a4b82
  ff9f5bdf-8e41-4277-ac3c-0c4deaef88ca
  b0d7206d-bcc2-4bac-9d96-88b0b274557a
  58074d97-2785-4683-bf03-9a5315162ec7

 

Further details can be displayed for each ticket with the "details" command:

EGA > details e776fcb1-5b9c-4e7b-b86f-ed44589a4b82
Requests Details:
  Ticket: e776fcb1-5b9c-4e7b-b86f-ed44589a4b82
  File: /EGAZ00001017962/DIPG62T.sorted.dup.bam.gpg
  File Size: 124030697603
  Request: 555360


Downloading a Request

Requests are downloaded to the current path by default. Use the "path" command to set a new path and the "pwd" command to display the current path. The request itself is then downloaded using the "download" command, for example:

EGA > download request_EGAD00010000650

 

By default, a request is downloaded in three parallel streams. The number of streams can be adjusted (up to a maximum of 15) by specifying a number after the request label, for example:

EGA > download request_EGAD00010000650 7

This will download the request in 7 parallel streams.
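Because a ticket is only removed once its file has been downloaded successfully, an interrupted download can simply be started again with the same request label. The Python sketch below shows that restart logic in the abstract. The two callables, download and remaining_files, are placeholders that you would have to supply yourself (for example by running the client and by counting the files still listed for the label); they are assumptions of this example and not functions provided by the client.

```python
def download_until_complete(label, download, remaining_files, max_attempts=5):
    """Re-run download(label) until remaining_files(label) reports 0.

    Returns the number of download attempts that were made.
    Raises RuntimeError if files are still left after max_attempts.
    """
    for attempts_made in range(max_attempts):
        if remaining_files(label) == 0:
            return attempts_made
        # Files that were already downloaded no longer have a ticket,
        # so each new attempt only has to fetch what is still outstanding.
        download(label)
    if remaining_files(label) == 0:
        return max_attempts
    raise RuntimeError(f"{label}: files still pending after {max_attempts} attempts")
```

Limiting the number of attempts avoids looping forever when a download fails for a reason that a restart cannot fix, such as a blocked port.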


Downloading dataset metadata

The "downloadmetadata" command downloads a tarball that contains all XMLs associated with the dataset, including details of the study, samples, experiments, runs and analyses (if available).

Mapping files are also provided, enabling you to link samples to files.

EGA > downloadmetadata EGAD00010000498
URL http://ega.ebi.ac.uk:80/ega/rest/download/v2/metadata/EGAD00010000498 \
true C:\...\EGADownloadClient2.2.2\EGAD00010000498.tar.gz true 

 

Decrypt downloaded files

Once data has been successfully downloaded it can be decrypted using the client:

EGA > decrypt <filename> <key>

This will decrypt the specified file using <key> as the decryption key. Upon decryption, the encrypted file is deleted.

 

With the ‘decryptkeep’ command, the encrypted file is not deleted:

EGA > decryptkeep <filename> <key>


Using Direct Command Line Mode

All of the Interactive Shell (IS) functions can be accessed from the command line. Command line mode is selected by specifying the parameter '-p' at startup, followed by the user name and password. The order of the commands that follow "-p username password" is not important. To list the help section for the command line:

java -jar EgaDemoClient.jar -p demo@test.org 123pass -help

(assume for this example: user name = demo@test.org, password = 123pass)

 

The command line also lets you specify a file that contains the username and password (first line username, second line password). To start the client with such a file (e.g. "login.txt"), use the parameter '-pf':

java -jar EgaDemoClient.jar -pf /home/demo/ega/login.txt -help
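Since the login file holds your password in plain text, it is sensible to make it readable only by you. The Python sketch below writes a file in the two-line layout described above and restricts its permissions. The function name write_login_file is made up for this example, and whether the client accepts a trailing newline after the password is an assumption.

```python
import os


def write_login_file(path, username, password):
    """Write username on line 1 and password on line 2, readable by the owner only."""
    # Create the file with mode 0600 so other local users cannot read it.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(f"{username}\n{password}\n")
    # If the file already existed, os.open keeps its old mode, so tighten it here.
    os.chmod(path, 0o600)
```

For example, write_login_file("/home/demo/ega/login.txt", "demo@test.org", "123pass") creates the file used in the command above.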

 

Example - Listing files in a dataset

java -jar EgaDemoClient.jar -p demo@test.org 123pass -lfd EGAD00010000498

 

Example - Requesting a dataset:

java -jar EgaDemoClient.jar -p demo@test.org 123pass -rfd EGAD00010000498 -re abc -label request_EGAD00010000498

 

Example - Requesting a file:

java -jar EgaDemoClient.jar -p demo@test.org 123pass -rf EGAF00000584907 -re abc -label request_EGAF00000584907

 

Example - Listing Requests:

java -jar EgaDemoClient.jar -p demo@test.org 123pass -lr

 

Example - Downloading Request, using the optional parameter '-nt' to specify using 7 parallel streams:

java -jar EgaDemoClient.jar -p demo@test.org 123pass -dr request_EGAD00010000498 -nt 7 
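If you drive the client from a script, the request and download commands above can be assembled as argument lists. The Python sketch below uses only the switches shown in these examples (-p, -rfd, -re, -label, -dr, -nt). The function names are made up for this example, and applying the 15-stream limit to '-nt' is an assumption carried over from the interactive shell.

```python
MAX_STREAMS = 15  # limit stated for the interactive shell; assumed to apply to -nt too


def base_args(username, password, jar="EgaDemoClient.jar"):
    """Common prefix: start the client in command line mode with -p."""
    return ["java", "-jar", jar, "-p", username, password]


def request_dataset_args(username, password, dataset, key, label):
    """Arguments for requesting every file in a dataset under one label."""
    return base_args(username, password) + ["-rfd", dataset, "-re", key, "-label", label]


def download_request_args(username, password, label, streams=None):
    """Arguments for downloading a request, optionally with -nt parallel streams."""
    args = base_args(username, password) + ["-dr", label]
    if streams is not None:
        if not 1 <= streams <= MAX_STREAMS:
            raise ValueError(f"streams must be between 1 and {MAX_STREAMS}")
        args += ["-nt", str(streams)]
    return args
```

Each list can be passed to subprocess.run(args, check=True). On many systems, command line arguments are visible to other local users in process listings, so the '-pf' login file may be preferable to '-p' on shared machines.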


Decrypt downloaded files

java -jar EgaDemoClient.jar -p <username> <password> -dc <filename> -dck <decryption_key>

For example:

java -jar EgaDemoClient.jar -p name@ebi.ac.uk password -dc /Users/my_downloads/_ega-box-03_Ca9-22.cel.cip -dck test

Multiple files can be listed after the -dc switch.
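To decrypt a whole download directory in one call, the list of files can be collected by a script. The Python sketch below gathers the .cip files in a directory and builds the matching argument list. The function names are made up for this example, and placing all file names between '-dc' and '-dck' is an assumption based on the example above.

```python
from pathlib import Path


def find_cip_files(directory):
    """Return the .cip files directly inside `directory`, sorted by name."""
    return sorted(str(p) for p in Path(directory).glob("*.cip") if p.is_file())


def decrypt_args(username, password, files, key, jar="EgaDemoClient.jar"):
    """Arguments for decrypting one or more files with the same key."""
    if not files:
        raise ValueError("no files to decrypt")
    return ["java", "-jar", jar, "-p", username, password, "-dc", *files, "-dck", key]
```

All files in one call are decrypted with the same key, so files that were requested with different keys need separate calls.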


Using the FUSE Layer

This function is only available using the command line. The FUSE layer allows a directory of encrypted *.cip files to be mounted in an empty directory, where they can be accessed as unencrypted files. This allows encrypted files to be used directly, without having to be decrypted first. This function is accessible with the ‘-fuse’ option. At the moment, this requires permission to use ‘sudo’ (or to be root). The target directory is then accessible to every user.

sudo java -jar EgaDemoClient.jar -fuse <source_directory> <target_directory> <key>

This command scans the source directory. Files with the .cip extension are wrapped in an access layer that performs on-demand random-access decryption, and the ‘.cip’ extension is removed from the virtual file. All other files are mounted directly. All .cip files are assumed to be encrypted with the same password/key. Example (making the content of /tmp/download/ available in /tmp/mnt/):

sudo java -jar EgaDemoClient.jar -fuse /tmp/download/ /tmp/mnt/ dipassword776

It is important to supply the terminating “/” when specifying directories. The target directory must be empty. At the moment, subdirectories are ignored, and the source directory is scanned only once, at start-up.
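The rules above can be summarised in a short Python sketch. It is an illustration of the described behaviour and not the client's actual code: the function names are made up for this example, and what happens when a plain file and a .cip file map to the same virtual name is not described in this guide (in this sketch the later entry wins).

```python
def virtual_names(entries):
    """Map virtual file name -> (source name, decrypt_on_read).

    `entries` is an iterable of (name, is_directory) pairs from the source directory.
    """
    mapping = {}
    for name, is_directory in entries:
        if is_directory:
            continue  # subdirectories are ignored
        if name.endswith(".cip"):
            # Encrypted file: exposed without the .cip extension, decrypted on access.
            mapping[name[: -len(".cip")]] = (name, True)
        else:
            # Any other file is mounted directly.
            mapping[name] = (name, False)
    return mapping


def fuse_args(source, target, key, jar="EgaDemoClient.jar"):
    """Arguments for the -fuse call, adding the terminating "/" if it is missing."""
    def with_slash(path):
        return path if path.endswith("/") else path + "/"
    return ["sudo", "java", "-jar", jar, "-fuse", with_slash(source), with_slash(target), key]
```

For example, a source directory holding sample1.bam.cip and notes.txt would appear in the target directory as sample1.bam and notes.txt.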