Downloading files and datasets

    2 Using the EGA download client

         2.1.1 Requirements

         2.1.2 Troubleshooting

    2.2 Download the Client

    2.3 Using the Interactive Shell

        2.3.1 Making a request to download all files in a dataset

        2.3.2 Making a request to download individual files in a dataset

        2.3.3 Displaying current Requests

        2.3.4 Downloading a Request

        2.3.5 Downloading dataset metadata

        2.3.6 Decrypt downloaded files

    2.4 Using direct command mode

        2.4.1 Decrypt downloaded files

    2.5 Using the Fuse Layer


2 Using the EGA download client

Some datasets are NOT currently available in the download client. Contact ega-helpdesk@ebi.ac.uk to request an Aspera download account for these datasets.

As the name implies, the EgaDemoClient is a demo, or reference implementation, of what can be done through the REST API. Refer to the API or API Wrapper Java Class documentation to explore the full functionality for downloading from the EGA.

The EgaDemoClient application can be used as an interactive shell or direct command line client. Both methods provide similar functionality, but you may only use the FUSE layer option from the command line.

On request, the client application creates a small local SQLite cache database, whose contents can be viewed with any SQLite viewer. The purpose of this cache is to keep track of requested data, specifically download requests. Some datasets contain links to files that are still pending, meaning they are not yet archived in the EGA and therefore cannot be requested or downloaded yet, so a request may not contain all files that are part of the dataset. Making requests through the demo client lets you track when these pending files become available and easily request the additional files.
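
Because the cache is an ordinary SQLite file, it can also be inspected from a script. The following Python sketch lists the tables in a cache database without assuming anything about its schema, which is not documented here. The function name `list_tables` is illustrative, and the path to pass in is the one the client prints at login.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return the names of all tables in the SQLite file at db_path.

    Only the built-in sqlite_master catalogue is queried, so nothing is
    assumed about how the client lays out its cache.
    """
    # Open read-only so that inspecting the cache can never modify it.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    connection = sqlite3.connect(uri, uri=True)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    return [name for (name,) in rows]
```

The table names returned can then be explored further with ordinary SELECT statements.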


2.1.1 Requirements

This application requires Java 1.7 or later. Your firewall must allow Java to access the Internet on the standard ports 80 (HTTP) and 443 (HTTPS). To use UDT, UDP port 80 must also be open.

The client load balancer is at ega.ebi.ac.uk, which resolves to the IP address 193.62.192.14.
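
To check the TCP part of these requirements before starting the client, a minimal Python sketch such as the following can be used. It only tests whether an outgoing TCP connection can be opened. It does not test UDP port 80, which UDT needs. The function names and the 5-second timeout are illustrative choices.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # create_connection resolves the host name and connects in one step.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures alike.
        return False


def check_ega_ports(host="ega.ebi.ac.uk", ports=(80, 443)):
    """Map each required TCP port to whether it is reachable from here."""
    return {port: can_connect(host, port) for port in ports}
```

Calling `check_ega_ports()` returns a dictionary that maps each port to True or False.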


2.1.2 Troubleshooting

The interactive shell offers several commands that explain how to make better use of the download system:

EGA > tutorial 
EGA > instructions 

This command performs a series of short downloads using the TCP and UDT protocols to indicate which protocol is likely to be faster from your location and what maximum bandwidth to expect:

EGA > testbestdownload

This command performs a series of medium-sized downloads to determine the maximum combined bandwidth to expect with different numbers of parallel download streams. The default is 3 parallel streams; a different number can be given as an argument. The test works with both TCP and UDT (toggled with the commands "udt on" and "udt off").

EGA > testbandwidth
EGA > testbandwidth 7

More parallel streams don't always equal higher total throughput! Increasing parallel streams works best if your expected data transfer rate for one individual stream is low. UDT is also not always faster than TCP. Good connections actually tend to perform better using TCP, regardless of distance.

The command line offers a debug command that will check the most common issues arising with using the client. It uses the '-debug' option with user name and password (assume: user name = demo@test.org, password = 123pass):

java -jar EgaDemoClient.jar -debug demo@test.org 123pass

This command starts by creating a simple socket connection to "http://www.google.com" and to "https://www.google.com" to ensure that Java has access to the Internet on your system (some firewalls prevent this). It then resolves the EGA host name "ega.ebi.ac.uk" to an IP address and tries to ping our servers, to verify that you can reach our API from your system. If that succeeds, a login is attempted to verify that your user name and password are correct and active. Finally, a set of short data transfers is performed to verify that you can download data to your system using the TCP and UDT data transfer protocols.


2.2 Download the Client

Version 2.2.2


2.3 Using the Interactive Shell

See section 2.1.2 (Troubleshooting) if you have problems using the client.

The interactive shell is started by running the command "java -jar EgaDemoClient.jar". This opens the EGA shell:

Welcome to the EGA Secure Data Shell Demo.
Type 'help' for help, and 'exit' to quit.
Ega Demo Download Client  Version: 1.0.4
With Ega Download Agent  Version: 0.5 BETA

EGA >

Typing "help" will display all commands available.

The first step will always be to log in (assume: user name = demo@test.org, password = 123pass):

EGA > login demo@test.org 123pass
Using Cache DB at: /home/demo/demo_demo@test.org_ega_db.sqlite
Creating timer task to update local Cache DB asynchronously every 30 min.
Updating local Cache DB..... in Progress
Login Success!

The time required to update the local database varies from time to time; it will print "Updating local Cache DB..... Done!" when the update is complete.

Upon receiving the "Login Success!" message you can now use all the commands listed with the "help" command.


2.3.1 Making a request to download all files in a dataset

You can list all datasets to which you have access (e.g. EGA > datasets), as well as all files in a dataset (e.g. EGA > files dataset EGAD00010000498). Once you have identified the dataset you wish to download, you can request it. A request has 4 parts:

(1) the type of the request: "dataset"
(2) the ID: the dataset ID
(3) the encryption key for the received data
(4) a label, which is later used to download the requested data. Pick a label by which you can identify your request.

For example:

EGA > request dataset EGAD00010000498 abc request_EGAD00010000498
Requesting.... (This may take longer if there are pending files in the request)
Resulting Request:
  request_EGAD00010000498 (19 file requests).

This request covers all files in dataset EGAD00010000498. The data will be encrypted with the key "abc", and the request label is "request_EGAD00010000498". The request resulted in 19 individual file requests: there are now 19 resource URLs available to download this dataset.

If the requested dataset contains pending files (see above) then a request may look like this:

EGA > request dataset EGAD00010000650 mypass request_EGAD00010000650
Requesting.... (This may take longer if there are pending files in the request)
This request contains 1216 Pending files!
Resulting Request:
  request_EGAD00010000650 (18 file requests). 

In this request the dataset contains 1234 files, but only 18 are in the EGA archive.
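
The relationship between pending and available files can be read directly from the client output. The following Python sketch parses output of the kind shown in the two examples above. The function name `summarize_request` and the regular expressions are assumptions based only on the output format shown here, which may differ between client versions.

```python
import re


def summarize_request(output):
    """Extract label, available and pending file counts from request output."""
    # "This request contains 1216 Pending files!" is only printed when
    # some files of the dataset are not yet archived.
    pending_match = re.search(r"contains (\d+) Pending files", output)
    pending = int(pending_match.group(1)) if pending_match else 0

    # Matches a result line such as "  request_X (18 file requests)."
    result_match = re.search(r"^\s*(\S+) \((\d+) file requests\)", output, re.M)
    if result_match is None:
        raise ValueError("no 'Resulting Request' line found")

    available = int(result_match.group(2))
    return {
        "label": result_match.group(1),
        "available": available,
        "pending": pending,
        "total": available + pending,
    }
```

For the second example above, the total is the 18 available files plus the 1216 pending files, which gives the 1234 files in the dataset.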


2.3.2 Making a request to download individual files in a dataset

First, identify the files in your dataset.

EGA > files dataset EGAD00010000498
Files in EGAD00010000498:
  /PROSTATE_SNP6/PD7445a.CEL.gpg 29898719 EGAF00000278296
  /PROSTATE_SNP6/PD7445b.CEL.gpg 30275814 EGAF00000584909
  /PROSTATE_SNP6/PD7445c.CEL.gpg 29571494 EGAF00000584901
  /PROSTATE_SNP6/PD7445d.CEL.gpg 31040185 EGAF00000584899
  /PROSTATE_SNP6/PD7445e.CEL.gpg 30153169 EGAF00000584902
  /PROSTATE_SNP6/PD7445f.CEL.gpg 29735350 EGAF00000584903
  /PROSTATE_SNP6/PD7446a.CEL.gpg 29336337 EGAF00000584905
  /PROSTATE_SNP6/PD7446b.CEL.gpg 28165811 EGAF00000584904
  /PROSTATE_SNP6/PD7446c.CEL.gpg 30383508 EGAF00000584900
  /PROSTATE_SNP6/PD7446d.CEL.gpg 31141416 EGAF00000584910
  /PROSTATE_SNP6/PD7446e.CEL.gpg 29599271 EGAF00000584897
  /PROSTATE_SNP6/PD7446f.CEL.gpg 30756385 EGAF00000584907
  /PROSTATE_SNP6/PD7446g.CEL.gpg 30608708 EGAF00000584908
  /PROSTATE_SNP6/PD7447a.CEL.gpg 29608192 EGAF00000584896
  /PROSTATE_SNP6/PD7447b.CEL.gpg 28497750 EGAF00000584912
  /PROSTATE_SNP6/PD7447c.CEL.gpg 28999141 EGAF00000584911
  /PROSTATE_SNP6/PD7447d.CEL.gpg 28749723 EGAF00000584898
  /PROSTATE_SNP6/PD7447e.CEL.gpg 30863898 EGAF00000584906

Then make a request to download the file using the file accession (EGAF).

EGA > request file EGAF00000278296 abc file_request
Requesting.... (This may take longer if there are pending files in the request)
Resulting Request:
  file_request (1 file requests).

In this request the file EGAF00000278296 is requested. The file will be encrypted using the encryption key "abc", and the request is given the label "file_request".
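
When a dataset contains many files, the listing can be parsed to pick out accessions or to estimate the required disk space. The Python sketch below assumes that each listing line consists of a path, a size and an EGAF accession separated by whitespace, as in the example in this section, and that the size is in bytes (the unit is not stated in the output). The names `ListedFile`, `parse_file_listing` and `total_size` are illustrative.

```python
from typing import List, NamedTuple


class ListedFile(NamedTuple):
    path: str
    size: int
    accession: str


def parse_file_listing(output: str) -> List[ListedFile]:
    """Parse the lines printed by 'files dataset <ID>' into records."""
    files = []
    for line in output.splitlines():
        parts = line.split()
        # Keep only lines of the form "<path> <size> <EGAF accession>";
        # headers, prompts and blank lines are skipped. Paths containing
        # spaces are not handled by this sketch.
        if len(parts) == 3 and parts[1].isdigit() and parts[2].startswith("EGAF"):
            files.append(ListedFile(parts[0], int(parts[1]), parts[2]))
    return files


def total_size(files: List[ListedFile]) -> int:
    """Sum of the listed sizes (assumed to be bytes)."""
    return sum(f.size for f in files)
```

The accession of a parsed record is the value to pass to "request file".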


2.3.3 Displaying current Requests

If you want to know the status of your requests, there are several options: "requests", "allrequests", and "overview":

Using command "requests" lists all current requests that match the IP address you logged in from. It lists the request labels, along with the number of available files for download:

EGA > requests
Current Requests:
  request_EGAD00010000498 19
  request_EGAD00010000650 18

Requests with Pending Files:
  request_EGAD00010000650 1216

The command "overview" combines 'allrequests' with some general comments. It also updates the local database to check whether any of the pending files have become available since the request:

EGA > overview
Your login IP is: 55.66.777.88

Current Requests from all Sources, with IP address at time of request:

  11.22.333.444_tst 1
  11.22.333.444_longtest 5889
  22.33.44.55_test 19
  33.44.55.66_testlong1 5617
  55.66.777.88_request_EGAD00010000498 19
  55.66.777.88_request_EGAD00010000650 18 

From your current IP you may download these requests:

 request_EGAD00010000498
 request_EGAD00010000650

You must 'localize' these requests before you can download them (see 'help'):

 tst
 longtest
 test
 testlong1

Updating cache, please wait...

The command "localize" can be used to change the IP address of the request to the current login IP, to enable download of that request on the local system.
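
The split between downloadable requests and requests that need to be localized follows from the IP prefix of each entry. The Python sketch below reproduces that split for entries of the kind shown in the "overview" output above, assuming each entry has the form `<IP>_<label> <count>`. The function name `split_by_ip` is illustrative, and the client's own logic may differ.

```python
def split_by_ip(entries, login_ip):
    """Separate request labels by whether they were made from login_ip.

    entries: lines such as "22.33.44.55_test 19".
    Returns (downloadable, needs_localize) lists of labels.
    """
    downloadable, needs_localize = [], []
    for entry in entries:
        name, _count = entry.split()
        # The IP never contains "_", so the first "_" ends the IP part;
        # the label itself may contain further underscores.
        ip, label = name.split("_", 1)
        if ip == login_ip:
            downloadable.append(label)
        else:
            needs_localize.append(label)
    return downloadable, needs_localize
```

Labels in the second list are the ones to pass to "localize" before downloading.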


2.3.4 Downloading a Request

Requests are downloaded by default to the current path. That can be changed by using the command "path" to set a new path. Command "pwd" displays the current path. The request itself is then downloaded using the "download" command, for example:

EGA > download request_EGAD00010000650 

The default is to download using three parallel streams. The number of streams can be adjusted (15 max) by specifying a number, for example:

EGA > download request_EGAD00010000650 7

This will download the request in 7 parallel streams.


2.3.5 Downloading dataset metadata

This command downloads a tarball containing all XML files associated with the dataset, including details of the study, samples, experiments, runs and analyses. Mapping files are also provided, enabling you to link samples to files.

EGA > downloadmetadata EGAD00010000498

URL http://ega.ebi.ac.uk:80/ega/rest/download/v2/metadata/EGAD00010000498
true    C:\...\EGADownloadClient2.2.2\EGAD00010000498.tar.gz
true
C:\...\EGADownloadClient2.2.2\EGAD00010000498.tar.gz


2.3.6 Decrypt downloaded files

Once data has been successfully downloaded it can be decrypted using the demo client:

EGA > decrypt {file} {key}

This decrypts the specified file using {key} as the decryption key. After decryption, the encrypted file is deleted. The 'decryptkeep' command does the same but keeps the encrypted file:

EGA > decryptkeep {file} {key}


2.4 Using direct command mode

All of the interactive shell functions can be accessed from the command line. Command line mode is selected by specifying the parameter '-p' at startup, followed by user name and password. The order of the commands that follow "-p username password" is not important. To list the help section for the command line:

java -jar EgaDemoClient.jar -p demo@test.org 123pass -help

(assume: user name = demo@test.org, password = 123pass)

The command line also lets you specify a file that contains the user name and password (1st line user name, 2nd line password). To start the client with such a file (e.g. "login.txt"), use the parameter '-pf':

java -jar EgaDemoClient.jar -pf  /home/demo/ega/login.txt -help 
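
The login file is a plain two-line text file. One way to create it is sketched below in Python. Restricting the file to its owner is an extra precaution that the client does not require, and the function name `write_login_file` is illustrative.

```python
import os


def write_login_file(path, username, password):
    """Write a login file: user name on line 1, password on line 2."""
    # A newly created file gets owner-only permissions (0o600) where the
    # operating system supports it, since it holds a clear-text password.
    # The permissions of an already existing file are left unchanged.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(f"{username}\n{password}\n")
```

The resulting path is the value to pass after '-pf'.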

Example - Listing files in a dataset

java -jar EgaDemoClient.jar -p demo@test.org 123pass -lfd EGAD00010000498

Example - Requesting a dataset:

java -jar EgaDemoClient.jar -p demo@test.org 123pass -rfd EGAD00010000498 -re abc -label request_EGAD00010000498

Example - Requesting a file:

java -jar EgaDemoClient.jar -p demo@test.org 123pass -rf EGAF00000584907 -re abc -label request_EGAF00000584907

Example - Listing Requests:

java -jar EgaDemoClient.jar -p demo@test.org 123pass -lr

Example - Downloading Request, using the optional parameter '-nt' to specify using 7 parallel streams:

java -jar EgaDemoClient.jar -p demo@test.org 123pass -dr request_EGAD00010000498 -nt 7 
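
For scripted use, the command lines above can be assembled programmatically. The Python sketch below builds the argument lists for requesting and downloading a dataset, using only options shown in this section ('-pf', '-rfd', '-re', '-label', '-dr', '-nt'). The helper names, the jar location and the label convention `request_<dataset ID>` are assumptions for illustration. Nothing is executed by these helpers.

```python
JAR = "EgaDemoClient.jar"  # assumed to be in the current directory


def base_command(login_file):
    """Start of every command line: log in from a file via '-pf'."""
    return ["java", "-jar", JAR, "-pf", login_file]


def request_dataset_command(login_file, dataset_id, key):
    """Request all files of a dataset, labelled 'request_<dataset ID>'."""
    label = f"request_{dataset_id}"
    return base_command(login_file) + [
        "-rfd", dataset_id, "-re", key, "-label", label,
    ]


def download_request_command(login_file, label, streams=None):
    """Download a request; '-nt' is added only if streams is given."""
    command = base_command(login_file) + ["-dr", label]
    if streams is not None:
        command += ["-nt", str(streams)]
    return command
```

A returned list can be run with, for example, `subprocess.run(command, check=True)`. Passing the arguments as a list avoids shell quoting problems with keys or labels.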


2.4.1 Decrypt downloaded files

java -jar EgaDemoClient.jar -p {username} {password} -dc {file} -dck {key}
e.g. java -jar EgaDemoClient.jar -p name@ebi.ac.uk password -dc /Users/my_downloads/_ega-box-03_Ca9-22.cel.cip -dck test

Multiple files can be listed after the -dc switch.
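
Since several files can follow '-dc', a whole download directory can be decrypted in one call. The Python sketch below collects the '.cip' files in a directory and builds the corresponding argument list. The function name `decrypt_command` is illustrative, and the sketch assumes that all files share the same key.

```python
from pathlib import Path


def decrypt_command(username, password, directory, key):
    """Build a '-dc ... -dck <key>' command for every .cip file found."""
    # Sort for a stable order; only the top level of the directory is read.
    cip_files = sorted(str(p) for p in Path(directory).glob("*.cip"))
    if not cip_files:
        raise FileNotFoundError(f"no .cip files in {directory}")
    return (
        ["java", "-jar", "EgaDemoClient.jar", "-p", username, password, "-dc"]
        + cip_files
        + ["-dck", key]
    )
```

The returned list only describes the command; it still has to be run, for example with `subprocess.run`.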


2.5 Using the Fuse Layer

This function is only available from the command line. The FUSE layer allows a directory of encrypted *.cip files to be mounted in an empty directory, where they can be accessed as unencrypted files. Encrypted files can thus be used directly, without having to be decrypted first. This function is accessed with the '-fuse' option. At the moment it requires permission to use 'sudo' (or to be root). The target directory is then accessible to every user.

sudo java -jar EgaDemoClient.jar -fuse {source directory} {target directory} {key}

This command scans the source directory. Files with the '.cip' extension are wrapped in an access layer that performs on-demand random-access decryption, and the '.cip' extension is removed from the virtual file name. All other files are mounted directly. All .cip files are assumed to be encrypted with the same password/key. Example (making the content of /tmp/download/ available in /tmp/mnt/):

sudo java -jar EgaDemoClient.jar -fuse /tmp/download/ /tmp/mnt/ dipassword776

It is important to supply the terminating "/" when specifying directories. The target directory must be empty. At the moment subdirectories are ignored, and the source directory is scanned only once, at start-up.
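
These constraints can be checked before mounting. The Python sketch below verifies the trailing "/" and the empty target directory, and previews the virtual file names that the description in this section implies ('.cip' removed, other files unchanged, subdirectories ignored). The function names are illustrative, and the preview is derived from this description rather than from the client itself.

```python
import os


def check_fuse_directories(source, target):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    for name, directory in (("source", source), ("target", target)):
        if not directory.endswith("/"):
            problems.append(f"{name} directory must end with '/'")
        if not os.path.isdir(directory):
            problems.append(f"{name} directory does not exist")
    if os.path.isdir(target) and os.listdir(target):
        problems.append("target directory must be empty")
    return problems


def preview_virtual_names(source):
    """Names expected in the mount point for the files in source."""
    names = []
    for entry in sorted(os.listdir(source)):
        if os.path.isdir(os.path.join(source, entry)):
            continue  # subdirectories are ignored by the FUSE layer
        if entry.endswith(".cip"):
            entry = entry[: -len(".cip")]  # the '.cip' extension is removed
        names.append(entry)
    return names
```

Running the checks first avoids starting the client with 'sudo' only to find that the target directory is not empty.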