Using the EGA Download Client
The EgaDemoClient is a Java-based data streamer that enables EGA account holders to securely download files and datasets, either through an interactive shell (IS) or using direct command line mode (DCLM).
Both methods provide similar functionality and have the same workflow of use (see graphic below), but only DCLM can use the FUSE layer.
Individual files or datasets are downloaded in two steps: first make a download request, then download that request by specifying its request label (the name you give the request).
Multiple files and datasets can be grouped together by making multiple requests to download using the same request label.
The download request, which is made up of one or more files, is stored on our servers as tickets, with each file in the download request assigned a unique ticket number.
A ticket is only removed from our server when the file is successfully downloaded, which means that a download can be restarted if, for whatever reason, the download has terminated.
All files within the download request are encrypted before streaming, to ensure secure transfer.
Files are downloaded as encrypted .cip suffixed files, which must be decrypted using the download client with the key specified in the original download request.
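The ticket behaviour described above can be pictured with a small model. The sketch below is purely illustrative and is not the EGA server or client implementation: the `TicketStore` class, its methods and the `fetch` callback are hypothetical names, chosen only to show why removing a ticket only after a successful download makes a restart safe.

```python
import uuid


class TicketStore:
    """Toy model of download tickets grouped by request label (hypothetical)."""

    def __init__(self):
        self._tickets = {}  # request label -> {ticket id: file name}

    def request(self, label, file_names):
        """Add one ticket per file to the request with this label."""
        tickets = self._tickets.setdefault(label, {})
        for name in file_names:
            tickets[str(uuid.uuid4())] = name

    def pending(self, label):
        """Return the file names that still have a ticket."""
        return sorted(self._tickets.get(label, {}).values())

    def download(self, label, fetch):
        """Try every remaining ticket; remove a ticket only if fetch succeeds."""
        for ticket, name in list(self._tickets.get(label, {}).items()):
            if fetch(name):
                del self._tickets[label][ticket]


store = TicketStore()
store.request("myrequest", ["a.cel", "b.cel"])
store.request("myrequest", ["c.cel"])  # same label: grouped into one request

# First attempt: the transfer of b.cel is interrupted (simulated failure).
store.download("myrequest", fetch=lambda name: name != "b.cel")
remaining_after_first = store.pending("myrequest")

# Restart: only the file whose ticket survived is fetched again.
store.download("myrequest", fetch=lambda name: True)
remaining_after_restart = store.pending("myrequest")
```

In this model, a restarted download never repeats files that already arrived, because their tickets are gone.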
The zip file contains 4 files, including:
Readme
2 Quick start guides
This application requires Java 1.7+, and Java must be allowed to access the Internet.
Ports 80 (http) and 443 (https) should be open for TCP.
For UDT usage, UDP port 80 must be open.
The client load balancer is at ega.ebi.ac.uk, which resolves to IP address 22.214.171.124.
To check that your network is correctly configured for client usage, run the following command (for this example, assume: user name = email@example.com, password = 123pass):
java -jar EgaDemoClient.jar -debug email@example.com 123pass
This command will start by creating a simple socket connection to "http://www.google.com" as well as "https://www.google.com" to ensure that Java has access to the Internet on your system (some firewalls prevent this). It then resolves the EGA hostname "ega.ebi.ac.uk" to an IP address and tries to ping our servers, to verify that you have access to our API from your system. If that is successful then a login is attempted, to verify that your username and password are correct and active. Finally, a set of short data transfers is performed, to verify that you can download data to your system, using the TCP and UDT data transfer protocols.
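If Java or the client is not yet installed, the first two steps of that check (name resolution and TCP reachability on ports 80 and 443) can be approximated independently. The following is a minimal sketch using only the Python standard library; the function names are hypothetical and the code does not reproduce what the client does internally.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def resolve(hostname):
    """Return the IPv4 address for hostname, or None if resolution fails."""
    try:
        return socket.gethostbyname(hostname)
    except OSError:
        return None


def preflight(host="ega.ebi.ac.uk", ports=(80, 443)):
    """Collect the results of the basic checks into a dictionary."""
    report = {"resolved": resolve(host)}
    for port in ports:
        report[f"tcp_{port}"] = can_connect(host, port)
    return report
```

Calling `preflight()` returns a dictionary with the resolved address and one True/False entry per port. It does not test UDP port 80, the login or the test transfers, so the client's own '-debug' run remains the fuller check.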
To find the settings that maximise your bandwidth usage, start the interactive shell (IS), log in, and run 'testbandwidth':
Your computer > java -jar EgaDemoClient.jar
Welcome to the EGA Secure Data Shell Demo.
Type 'help' for help, and 'exit' to quit.
Ega Demo Download Client Version: 2.2.4
EGA > login <username>
Password:
Login Success!
EGA > testbandwidth 7
This performs a series of medium-sized downloads to determine the maximum combined bandwidth to expect with the specified number of parallel download streams (7 in this example).
The test works with both TCP and UDT settings (commands "udt on" / "udt off").
More parallel streams don't always equal higher total throughput! Increasing parallel streams works best if your expected data transfer rate for one individual stream is low. UDT is also not always faster than TCP. Good connections actually tend to perform better using TCP, regardless of distance.
The interactive shell is started by running the following command:
Your computer > java -jar EgaDemoClient.jar
Welcome to the EGA Secure Data Shell Demo.
Type 'help' for help, and 'exit' to quit.
Ega Demo Download Client Version: 2.2.4
EGA >
The first step will always be to log in (assume for this example: user name = email@example.com, password = 123pass):
EGA > login email@example.com
Password: 123pass
Login Success!
EGA >
Upon receiving the "Login Success!" message you can now view all the commands available to you with the "instructions" command.
You can list all datasets to which you have access:
EGA > datasets
You can also list all files in a given dataset:
EGA > files dataset EGAD00010000498
It is often important to know the size of a dataset prior to download, which can be calculated using the following command:
EGA > size dataset EGAD00001000814
Size of dataset EGAD00001000814: 5.2 TB
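Knowing the size makes it possible to estimate how long a download will take. This is a minimal sketch, assuming the reported size uses decimal units (1 TB = 10^12 bytes) and a constant transfer rate. The function name and the 50 MB/s figure are hypothetical; the latter stands in for whatever rate you observe on your own connection.

```python
def estimate_transfer_hours(size_bytes, rate_bytes_per_s):
    """Hours needed to move size_bytes at a constant rate_bytes_per_s."""
    if rate_bytes_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_bytes / rate_bytes_per_s / 3600


TB = 10**12  # decimal terabyte; an assumption about how the client reports sizes
MB = 10**6

dataset_bytes = 5.2 * TB  # the size reported in the example above
measured_rate = 50 * MB   # hypothetical sustained rate in bytes per second

hours = estimate_transfer_hours(dataset_bytes, measured_rate)
print(f"about {hours:.1f} hours")
```

With these illustrative numbers the estimate comes to roughly 29 hours, which is a reminder to check both available disk space and connection stability before requesting a large dataset.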
Once you have identified the dataset you wish to download, it is time to request it.
Requests require 4 parts:
(1) Type of request: "dataset" or "file"
(2) Dataset accession (EGAD) or, for a file request, file accession (EGAF)
(3) Encryption key used for data encryption
(4) Download request label (Pick a label by which you can identify your request)
EGA > request dataset EGAD00010000498 abc request_EGAD00010000498
Requesting.... (This may take longer if there are pending files in the request)
Resulting Request: request_EGAD00010000498 (19 file requests).
In this request, all files in dataset EGAD00010000498 are requested.
All files will be encrypted with the key "abc", and the request label is "request_EGAD00010000498".
The request resulted in 19 individual file requests, with each file assigned a unique ticket number for download.
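The four parts of a request can be assembled and sanity-checked before being typed into the shell. The helper below is a hypothetical convenience, not part of the client. It assumes that dataset requests take an EGAD accession and file requests an EGAF accession, and (as a further assumption) that key and label must not contain whitespace because the parts are separated by spaces.

```python
def build_request_command(kind, accession, key, label):
    """Assemble an interactive-shell 'request' line from its four parts."""
    prefixes = {"dataset": "EGAD", "file": "EGAF"}
    if kind not in prefixes:
        raise ValueError('type of request must be "dataset" or "file"')
    if not accession.startswith(prefixes[kind]):
        raise ValueError(f"a {kind} request expects an {prefixes[kind]} accession")
    for part_name, value in (("key", key), ("label", label)):
        # Assumption: whitespace would break the space-separated command line.
        if not value or any(ch.isspace() for ch in value):
            raise ValueError(f"{part_name} must be non-empty without whitespace")
    return f"request {kind} {accession} {key} {label}"


command = build_request_command(
    "dataset", "EGAD00010000498", "abc", "request_EGAD00010000498"
)
print(command)
```

The printed line matches the request shown above and can be pasted at the "EGA >" prompt.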
If the requested dataset contains pending files, then a request may look like this:
EGA > request dataset EGAD00010000650 abc request_EGAD00010000650
Requesting.... (This may take longer if there are pending files in the request)
This request contains 1216 Pending files!
Resulting Request: request_EGAD00010000650 (18 file requests).
In this request, the dataset contains 1234 files, but only 18 are in the EGA archive.
Pending files have not yet been archived. The file status will automatically update to 'available' when they have been archived.
To request an individual file, first identify the files in your dataset.
EGA > files dataset EGAD00010000498
Files in EGAD00010000498:
/PROSTATE_SNP6/PD7445a.CEL.gpg 29898719 EGAF00000278296
/PROSTATE_SNP6/PD7445b.CEL.gpg 30275814 EGAF00000584909
/PROSTATE_SNP6/PD7445c.CEL.gpg 29571494 EGAF00000584901
/PROSTATE_SNP6/PD7445d.CEL.gpg 31040185 EGAF00000584899
/PROSTATE_SNP6/PD7445e.CEL.gpg 30153169 EGAF00000584902
/PROSTATE_SNP6/PD7445f.CEL.gpg 29735350 EGAF00000584903
/PROSTATE_SNP6/PD7446a.CEL.gpg 29336337 EGAF00000584905
/PROSTATE_SNP6/PD7446d.CEL.gpg 31141416 EGAF00000584910
/PROSTATE_SNP6/PD7446e.CEL.gpg 29599271 EGAF00000584897
/PROSTATE_SNP6/PD7447e.CEL.gpg 30863898 EGAF00000584906
Then make a request to download the file using the file accession (EGAF).
EGA > request file EGAF00000278296 abc file_request
Requesting.... (This may take longer if there are pending files in the request)
Resulting Request: file_request (1 file requests).
In this request, the file EGAF00000278296 is requested.
The file will be encrypted using the encryption key "abc" and the request is given the label "file_request".
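When a dataset holds many files, it can help to turn the "files dataset" listing into records so that the right EGAF accession is easy to look up. The sketch below parses text copied from the shell. It assumes each entry has the form "path size accession" and that the middle column is the file size in bytes, which the listing itself does not state. `FileEntry` and `parse_file_listing` are hypothetical names.

```python
from typing import NamedTuple


class FileEntry(NamedTuple):
    path: str
    size_bytes: int
    accession: str


def parse_file_listing(text):
    """Parse 'path size accession' lines; skip anything that does not fit."""
    entries = []
    for line in text.splitlines():
        parts = line.rsplit(None, 2)  # split from the right: paths may contain spaces
        if len(parts) == 3 and parts[1].isdigit() and parts[2].startswith("EGAF"):
            entries.append(FileEntry(parts[0], int(parts[1]), parts[2]))
    return entries


listing = """\
Files in EGAD00010000498:
/PROSTATE_SNP6/PD7445a.CEL.gpg 29898719 EGAF00000278296
/PROSTATE_SNP6/PD7445b.CEL.gpg 30275814 EGAF00000584909
/PROSTATE_SNP6/PD7445c.CEL.gpg 29571494 EGAF00000584901
"""

entries = parse_file_listing(listing)
total_bytes = sum(entry.size_bytes for entry in entries)
by_accession = {entry.accession: entry for entry in entries}
```

The `by_accession` dictionary then gives direct access to the path and size for any accession you intend to request.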
If you want to know the status of your requests, there are two options: "requests" and "requesttickets":
The "requests" command lists all current requests, showing each request label along with the number of files for download:
EGA > requests
Current Requests:
555360 4
EGAD00001000705_request 40
EGAD00001001859 136
myrequest 59
The "requesttickets" command displays all tickets for a specified download request label:
EGA > requesttickets 555360
Current Requests:
e776fcb1-5b9c-4e7b-b86f-ed44589a4b82
ff9f5bdf-8e41-4277-ac3c-0c4deaef88ca
b0d7206d-bcc2-4bac-9d96-88b0b274557a
58074d97-2785-4683-bf03-9a5315162ec7
Further details can be displayed for each ticket:
EGA > details e776fcb1-5b9c-4e7b-b86f-ed44589a4b82
Requests Details:
Ticket: e776fcb1-5b9c-4e7b-b86f-ed44589a4b82
File: /EGAZ00001017962/DIPG62T.sorted.dup.bam.gpg
File Size: 124030697603
Request: 555360
Requests are downloaded by default to the current path. That can be changed by using the command "path" to set a new path. Command "pwd" displays the current path. The request itself is then downloaded using the "download" command, for example:
EGA > download request_EGAD00010000650
The default is to download in three parallel streams. The number of streams can be adjusted (15 max) by specifying a number, for example:
EGA > download request_EGAD00010000650 7
This will download the request in 7 parallel streams.
The "downloadmetadata" command downloads a dataset tarball that contains all XMLs associated with the dataset, including details of the study, samples, experiments, runs and analyses (if available).
Mapping files are also provided, enabling you to link samples to files.
EGA > downloadmetadata EGAD00010000498
URL http://ega.ebi.ac.uk:80/ega/rest/download/v2/metadata/EGAD00010000498 \ true C:\...\EGADownloadClient2.2.4\EGAD00010000498.tar.gz true
Once data has been successfully downloaded it can be decrypted using the client:
EGA > decrypt <filename> <key>
This will decrypt the file specified using <key> as the decryption key. Upon decryption the encrypted file is deleted.
With the ‘decryptkeep’ command the encrypted file is not deleted:
EGA > decryptkeep <filename> <key>
All of the Interactive Shell (IS) functions can be accessed using the command line. Command line mode is selected by specifying the parameter '-p' at startup, followed by user name and password. The order of the options that follow "-p username password" is not important. To list the help section for the command line (assume for this example: user name = email@example.com, password = 123pass):
java -jar EgaDemoClient.jar -p email@example.com 123pass -help
The command line also lets you specify a file that contains the username and password (1st line username, 2nd line password). To start the client with such a file (e.g. "login.txt"), use parameter '-pf':
java -jar EgaDemoClient.jar -pf /home/demo/ega/login.txt -help
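Because the login file holds a password in plain text, it is worth creating it with restrictive permissions. The sketch below writes the two-line layout described above (username on line 1, password on line 2) so that only the owner can read it on POSIX systems. The function names are hypothetical, and whether the client tolerates the trailing newline written here is an assumption.

```python
import os


def write_login_file(path, username, password):
    """Write username (line 1) and password (line 2), readable by the owner only."""
    # O_EXCL refuses to overwrite an existing file; 0o600 limits access on POSIX.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(f"{username}\n{password}\n")


def read_login_file(path):
    """Return (username, password) from the first two lines of the file."""
    with open(path, encoding="utf-8") as handle:
        lines = handle.read().splitlines()
    if len(lines) < 2:
        raise ValueError("expected username on line 1 and password on line 2")
    return lines[0], lines[1]
```

For example, `write_login_file("/home/demo/ega/login.txt", "email@example.com", "123pass")` would produce a file suitable for the '-pf' parameter, and `read_login_file` can be used to confirm the layout.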
Example - Listing files in a dataset:
java -jar EgaDemoClient.jar -p email@example.com 123pass -lfd EGAD00010000498
Example - Requesting a dataset:
java -jar EgaDemoClient.jar -p email@example.com 123pass -rfd EGAD00010000498 -re abc -label request_EGAD00010000498
Example - Requesting a file:
java -jar EgaDemoClient.jar -p email@example.com 123pass -rf EGAF00000584907 -re abc -label request_EGAF00000584907
Example - Listing Requests:
java -jar EgaDemoClient.jar -p email@example.com 123pass -lr
Example - Downloading Request, using the optional parameter '-nt' to specify using 7 parallel streams:
java -jar EgaDemoClient.jar -p email@example.com 123pass -dr request_EGAD00010000498 -nt 7
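When downloads are scripted, building the argument list in code avoids quoting mistakes. The sketch below only assembles the arguments shown in the example above ('-p', '-dr', '-nt') and does not run anything. `download_args` is a hypothetical helper, and applying the 15-stream maximum to command line mode is an assumption carried over from the interactive shell.

```python
import shlex

MAX_STREAMS = 15  # stated maximum for the interactive shell; assumed to apply here too


def download_args(username, password, label, streams=None, jar="EgaDemoClient.jar"):
    """Build the argument list for downloading a request in command line mode."""
    args = ["java", "-jar", jar, "-p", username, password, "-dr", label]
    if streams is not None:
        if not 1 <= streams <= MAX_STREAMS:
            raise ValueError(f"streams must be between 1 and {MAX_STREAMS}")
        args += ["-nt", str(streams)]
    return args


args = download_args("email@example.com", "123pass", "request_EGAD00010000498", 7)
print(shlex.join(args))  # printable form; pass the list itself to subprocess.run
```

Note that a password passed with '-p' is visible in the process list while the client runs, so on shared machines the '-pf' login file may be preferable.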
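To decrypt a downloaded file from the command line, use '-dc' followed by the file and '-dck' followed by the decryption key:
java -jar EgaDemoClient.jar -p <username> <password> -dc <filename> -dck <decryption_key>
For example:
java -jar EgaDemoClient.jar -p email@example.com password -dc /Users/my_downloads/_ega-box-03_Ca9-22.cel.cip -dck test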
Multiple files can be listed after the -dc switch.
The FUSE layer is only available using the command line. It allows a directory of encrypted *.cip files to be mounted in an empty directory, where they can be accessed as unencrypted files. Encrypted files can therefore be used directly, without having to be decrypted first. This function is accessible with the ‘-fuse’ option. At the moment this requires permission to ‘sudo’ (or to be root) to work. The target directory is then accessible to every user.
sudo java -jar EgaDemoClient.jar -fuse <source directory> <target directory> <key>
This command scans the source directory. Files with the ‘.cip’ extension are wrapped in an access layer to perform on-demand random-access decryption, and the ‘.cip’ extension is removed from the virtual file. All other files are mounted directly. All .cip files are assumed to be encrypted with the same password/key. Example (making the content of /tmp/download/ available in /tmp/mnt/):
sudo java -jar EgaDemoClient.jar -fuse /tmp/download/ /tmp/mnt/ dipassword776
It is important to supply the terminating “/” when specifying directories. The target directory must be an empty directory. At the moment subdirectories are ignored, and the source directory is scanned only once, upon start-up.
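These rules are easy to get wrong, so a quick check before mounting can save a failed 'sudo' run. The sketch below tests only what is stated above: both paths end with "/", both directories exist, and the target is empty. `check_fuse_dirs` is a hypothetical helper and does not interact with the client or with FUSE.

```python
import os


def check_fuse_dirs(source, target):
    """Return a list of problems that would conflict with the rules above."""
    problems = []
    for name, path in (("source", source), ("target", target)):
        if not path.endswith("/"):
            problems.append(f"{name} directory must end with '/': {path}")
        if not os.path.isdir(path):
            problems.append(f"{name} is not an existing directory: {path}")
    if os.path.isdir(target) and os.listdir(target):
        problems.append(f"target directory is not empty: {target}")
    return problems
```

An empty list from `check_fuse_dirs("/tmp/download/", "/tmp/mnt/")` means none of these basic conditions is violated. Because the source directory is scanned only once, files added after start-up would still require a remount.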