File preparation

Because of the processes the EGA uses for file archival, non-alphanumeric characters in a filename will cause issues in archival. By convention, whitespace in filenames is to be avoided and should be replaced with the underscore character (_). Before encrypting your files, please make sure that any files that will be uploaded to the EGA do not use special characters such as # ? ( ) [ ] / \ = + < > : ; " ' , * ^ | &

Files encrypted with EGACryptor must be uploaded via FTP.

EGACryptor

The EGACryptor v.2.0.0 is a Java-based application which enables submitters to produce EGA-compliant encrypted files, along with files containing the encrypted and unencrypted md5sum for each file to be submitted. The application will generate an output folder that will by default mirror the directory structure containing the original files. This output folder can subsequently be uploaded to the EGA FTP staging area via an FTP or Aspera client.

Contents: Download EGACryptor; Using the EGACryptor (encrypting a single file, encrypting multiple files, encrypting all files in a folder); Points to note; Troubleshooting.

Download EGACryptor

The required jar files can be obtained by downloading the EGACryptor jar file. After the file has been downloaded, the zipped archive must be extracted.

The EGACryptor has been built to work with Java Runtime Environments from version 6 and above and with the OpenJDK Environment. Please refer to the relevant resources for installation guidance. Installing the latest version of the OpenJDK will include the JCE files. If your Java JRE installation is older than 1.8.0_151, the JCE policy files must be installed manually.
You can verify the version of the Java SE Runtime Environment (JRE) installed by using the command:

    $ java -version

If you need to install the JCE, please follow the instructions below.

Installing the JCE policy files (due to licensing terms and conditions, the required policy files must be downloaded directly from the Oracle website):

Download the unlimited strength JCE policy files (JRE 6 / JRE 7 / JRE 8).
Uncompress and extract the downloaded file. This will create a subdirectory called JCE, which contains the following files: README.txt, COPYRIGHT.html, local_policy.jar and US_export_policy.jar.
Install the two policy JAR files by replacing the existing ones in your Java home directory. The standard place for JCE jurisdiction policy JAR files is lib/security [Unix] or lib\security [Win32] inside your Java home directory.

Note: the Java home directory is the directory where the Java SE Runtime Environment (JRE) was installed.

Additional performance enhancements that have been included in the EGACryptor v2.0.0:

The ability to parallelise the processing of datasets through the use of the resources on a system. Multicore systems will allow the user to specify n-1 cores for an n-core system. The use of this feature on clusters may speed up the processing of datasets that contain large numbers of files, but consult your local cluster guide to ensure that you are not monopolising resources that are needed by other system users. The default for this process remains single threaded. Three levels of system usage can be specified: full usage within the limits detailed above; a limited mode that ensures that 50% of the system resources remain available for other tasks; and a maximum mode limited to 75% of system resources, which prioritises encryption while keeping the system usable for light alternate tasks. Finally, there is a throttling mode that allows you to specify the exact number of computational threads to be used.
The EGACryptor is able to ingest a structured directory and will output a directory with the same structure containing the encrypted files along with the MD5 checksums for the plain and encrypted files. The entire output directory can then be uploaded to the EGA for archival.
As with the input path, it is now possible to specify the output path.
The options have been updated in line with the upgraded functionality.

The tool can only be used via the command line. The EGACryptor is designed to perform a single task, encrypting your data. For upload of these files, please refer to our uploading guide.

Using the EGACryptor

The EGACryptor tool can be used in the following three ways.

Encrypting a single file:

    java -jar ../EGA-Cryptor_2_0_0.jar -i example1.bam

Encrypting multiple files:

    java -jar ../EGA-Cryptor_2_0_0.jar -i "example1.bam,example2.bam"

Encrypting all the files within a folder:

    java -jar ../EGA-Cryptor_2_0_0.jar -i path/to/target/directory

By default, the EGACryptor v2 will create a new output directory, containing all encrypted files and the relevant checksums, within the target directory. If a specific output directory is desired, it can be specified with the -o flag, in a similar manner to the following example:

    java -jar ../EGA-Cryptor_2_0_0.jar -i /path/to/target/directory -o /path/to/output/directory

The tool will output three files per input file:

file.gpg (encrypted file)
file.md5 (file containing the MD5 sum of the unencrypted file)
file.gpg.md5 (file containing the MD5 sum of the encrypted file)

All output files must then be uploaded to your submission account using Aspera or FTP. Further documentation on how to upload files: FTP and Aspera.

Points to note

Remember to provide the path to EgaCryptor.jar and run the command from within the directory where the file(s) are located.
The EGACryptor writes files to the source directory of your local file system; as a result, you must have write-access permissions for the source directory.
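Before uploading, it can be useful to confirm that each checksum file agrees with the data file it describes. The following is a minimal Python sketch, not part of the EGACryptor: it assumes that a .md5 file begins with the hexadecimal digest as plain text, which this page does not state. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(data_path, md5_path):
    """Compare a file's MD5 with the value recorded in a checksum file.

    Assumption: the checksum file starts with the hex digest; anything
    after the first whitespace is ignored.
    """
    fields = Path(md5_path).read_text().split()
    if not fields:
        return False  # empty checksum file, nothing to compare against
    return md5_of_file(data_path) == fields[0].lower()
```

Following the output naming described above, file.md5 would be compared with the original unencrypted file and file.gpg.md5 with file.gpg.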
Troubleshooting

If in doubt about the function of the EGACryptor, it is recommended to first consult the built-in documentation. This can be accessed by using the -h flag, as stated in the following table.

Built-in Commands

Table: list of the command line options built into EGA-Cryptor v2.0.0.

Command line option   Action
--------------------- -----------
-f                    Set this option to allow application to create maximum threads to utilise full capacity of cores/processors available on machine
-h                    Use this option to view the built-in help menu
* -i                  File(s) to encrypt. Provide file/folder path or comma separated file path if multiple files in double quotes
-l                    Set this option to allow application to create maximum threads equals to 50% capacity of cores/processors available on machine
-m                    Set this option to allow application to create maximum threads equals to 75% capacity of cores/processors available on machine
-o                    Path of the output file. This is optional. If not provided then output files will be generated in the same path as that of source file (default: output-files)
-t                    Set this option if user wants to control application to create maximum threads as specified. Application will calculate no. of cores/processors available on machine & will create threads accordingly

Encryption Errors

UnixFileSystem.createFileExclusively (Native Method)

The error is thrown by UNIX ("UnixFileSystem.createFileExclusively(Native Method)"). It indicates that the user does not have write access to the file system where the file to be uploaded is located. EGACryptor always writes MD5 checksums into files before uploading them to the server, and these files are created in the same location where the uploaded file itself resides.

Solution: address your directory permission issue and re-run the command.
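The permission problem described above can be spotted before running the tool. This is a small Python sketch under stated assumptions: the helper name is hypothetical, and it only reports what the operating system says about the current user's access. It does not reproduce the EGACryptor's own checks.

```python
import os


def unwritable_dirs(paths):
    """Return the directories the current user cannot write to.

    For a file, the directory checked is its parent; for a folder given
    as input, the folder itself is checked, since the output is created
    inside it by default. A directory that does not exist is reported too.
    """
    blocked = []
    for path in paths:
        full = os.path.abspath(path)
        target = full if os.path.isdir(full) else os.path.dirname(full)
        # Creating files needs both write and search (execute) permission.
        if not os.access(target, os.W_OK | os.X_OK) and target not in blocked:
            blocked.append(target)
    return blocked
```

An empty list means no obvious permission problem was found for the given inputs, for example `unwritable_dirs(["example1.bam", "example2.bam"])` when run from the directory that holds the files.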
Install JCE Unlimited Strength Jurisdiction Policy files

The JCE unlimited strength jurisdiction policy files should be installed according to your current Java version.

If you are still facing difficulty with the EGACryptor v.2 after having consulted the documentation, please contact the EGA Helpdesk.
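As a pre-flight check for the file preparation rules at the top of this page, file names can be screened before encryption. The Python sketch below is illustrative only. The character set is copied from the list of special characters above, whereas the clean-up rule for those characters (dropping them) is an assumption: the page only prescribes replacing whitespace with an underscore. Both function names are hypothetical.

```python
import re

# Special characters listed in the file preparation section.
FORBIDDEN = set("#?()[]/\\=+<>:;\"',*^|&")


def filename_problems(filename):
    """Return the sorted problematic characters found in a file name.

    Pass the base name only, not a full path, because "/" and "\\" are
    themselves on the list.
    """
    return sorted({ch for ch in filename if ch in FORBIDDEN or ch.isspace()})


def suggest_filename(filename):
    """Suggest a cleaned name: whitespace runs become "_" (as the page
    recommends) and listed special characters are dropped (an assumption)."""
    collapsed = re.sub(r"\s+", "_", filename)
    return "".join(ch for ch in collapsed if ch not in FORBIDDEN)
```

For example, `filename_problems("sample 1 (v2).bam")` reports the space and both parentheses, and `suggest_filename` proposes `sample_1_v2.bam` for the same input.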
DAC Portal Welcome to the DAC Portal documentation! If you are involved in governance or legal aspects, technical or operational aspects, or serving as a data steward, this page will be helpful for you. By exploring these materials, you can define your own Data Access Committee (DAC) and policies, understand the minimal requirements for a DAC and a policy object, and comprehend how the EGA data access requests are managed. DAC Portal Index Setting up your account DACs and Policies Pending Request table Manage Data Requests History page Audit your DACs, policies, and datasets Deprecation User Preferences DAC API - A programmatic approach Setting up your account Register yourself as an EGA user. The Helpdesk team will validate your account (this could take up to 48 hours). After validation, you will receive an email with a link to verify your email - Make sure you click on the link to verify your account! If you don’t receive the email, please check your spam folder. Once your account is active you can login to the DAC Portal. We recommend you to check out the Take The Tour! DACs and Policies Create a DAC Click on Create a DAC Add the title and description to your DAC. Once ready, click on the ‘Create’ button. You will have to wait until the Helpdesk Team validates the creation of your DAC. Once approved by Helpdesk, your DAC will be assigned with a persistent identifier (EGAC). Edit a DAC Once your DAC is registered and approved, you can edit it and add contacts. Remember, you can add as many contacts as needed. And, you can also remove/add contacts anytime. To add a contact, you must write the username or email of a registered EGA user in the Members field. Once you have typed the whole username or email, a drop-down menu will appear where you can select the contact. Select a contact from the drop-down menu. Make sure you write the full username or email of the contact so it appears as an option. Make sure that you select one main contact. 
Set up the role you want to grant to the new contact of the DAC. We recommend checking out the FAQ to learn more about the different roles!
After adding all the necessary contacts and assigning roles, click on Update. This will send an email notification to the new DAC member, letting them know about the invitation to manage the DAC. Once they log into the DAC Portal, they will be able to either accept or turn down the invitation.

Create a policy

First, please note that all information registered in a policy metadata object will be publicly available on the EGA website. For each released dataset, all users will be able to read the policy under the “Data Access” tab.

Select Create a Policy from the Policies tab.
Select the DAC that you want your policy to depend on.
Add the title and the terms and conditions for accessing and using your data. If you already have an external website for data access requests, you can add the URL directly here (see example here). You can find a template of a Data Access Agreement (DAA) document in our Policy documentation page. The DAA template is provided for guidance only and should be adapted to suit your own purpose depending on the security policies, terms for publication or embargoes, and restrictions on data use or sharing.
You can also add Data Use Ontology (DUO) codes to your policy. These DUO codes are used to semantically tag the terms and conditions of using the data linked to the policy (example). For more information, you can refer to the Data Use Ontology documentation.
Once completed, make sure to click the Create button in order to register your policy. A persistent identifier (EGAP) will be assigned automatically.

Edit a policy

Please note that you will not be able to edit the policy if it is shown in orange. Orange means that you are a member of the DAC linked to that policy. Members don’t have edit rights; only admins can edit objects.
You can check your registered policies on My policies, available at the menu in the top-right corner. Select your policy. It will then display the information of that policy, allowing you to edit the information. Once you are done with the modifications, click Update. Pending Requests Table Table Settings Upon selecting a specific DAC, you will be presented with a customisable table showcasing all pending data access requests. This table allows you to tailor the displayed information according to your preferences by selecting desired columns. Available Columns (Displayed with the Eye Icon): Date: The date when the data access request was submitted. Full Name: Name and surname of the user submitting the request. Email: Email address associated with the user's request. Username: The username linked to the EGA account of the user submitting the request. Organisation: Affiliated organisation of the user submitting the request. Dataset: EGA accession ID for the requested EGA dataset. Dataset Title: Title of the requested dataset. DAC Comment: Space provided for internal comments related to the request. Expiration: Option to specify an expiration date for granted permissions. Check more information on the pending request table here! Table Filters By default, all data access requests are displayed. However, you have the option to apply filters to refine your view. Follow these steps to apply filters: Click on the More button. This action will reveal the different columns available for applying filters. As you select columns, the checked fields will be displayed at the top of the request table. Choose the specific values within the selected columns that you want to filter for. Click the Search button to apply the selected filters. Do you find yourself frequently applying the same filters? You can save multiple combinations of filters for easy access at any time! Here's how: Click the Save button. Provide a descriptive name for the filter combination. Click Save to confirm. 
To load saved filters, simply select the saved filter from the dropdown menu to apply it automatically. Check more information on filters here!

Advanced Search

For users requiring more granular control over their data access requests, the DAC Portal offers an Advanced Search feature using JIRA Query Language (JQL). With JQL, you can craft precise and complex queries to filter and retrieve specific data access requests based on various criteria such as date, user details, dataset attributes, and more. Using JQL's flexibility, you can create custom queries to meet your specific needs, allowing for advanced filtering options beyond the standard filters provided.

You can find the Advanced Search feature by following these steps:

Click the … button.
Select Advanced.
To view the allowed fields for filtering and explore all available options, simply click on the info icon.
Once you have written the filtering values, click the Search button to apply.

Check more information on filters here!

Manage Data Requests

Upon logging into the DAC Portal, you'll notice an hourglass icon next to the DAC ID, indicating the number of pending requests. Click on the DAC to review these requests.

Accept requests

Click on the right side of the toggle button to grant access to the user. You can manage other requests before proceeding to the next step.
After managing the requests, click on the Apply button.
A confirmation box will appear summarising the options to be applied. Click on Yes, Confirm to proceed with granting permissions.

Deny requests

Click on the left side of the toggle button to deny access to the user.
Provide a reason for the denial. Note that the user will receive an email with this reason.
Click on the Done button. You can manage other requests before proceeding to the next step.
After managing the requests, click on the Apply button.
A confirmation box will appear summarising the options to be applied. Click on Yes, Confirm to proceed with denying permissions.

Tips!
Apply a filter to view all requests to be managed at once. Use the toggle in the row with column names to grant or deny permissions for multiple requests simultaneously. You can grant and deny permissions in the same action, simplifying the process. The confirmation box will provide a summary of all actions, including grants, denials, DAC comments, and expiration dates. History Page The History page serves as a dedicated space to view information regarding all requests managed by all DAC members. Here, you can review active permissions and revoke them as needed. Go to the History page by clicking on the "HISTORY" button from a DAC page in the DAC Portal. Here are the different row types you may find: Current permissions: row in green with a toggle button to revoke permissions on the right. Approved requests: row in green, with no toggle button. Request denied: row in red with “request denied” on the right. Permission revoked: row in red with “permission revoked” on the right. Distinctions to Note! Between Request Denied and Permission Revoked: Request Denied: Refers to requests that were rejected from the outset, indicating that access to the dataset was never granted. Permission Revoked: Indicates that permissions were previously granted but have since been revoked. Users with permissions revoked have previously accessed the dataset. Between Approved Request and Current Permissions: Approved Request: Represents an entry when a data access request has been approved in the past. Current Permissions: Denotes ongoing permissions where a user has present access to the dataset. An approved request may now appear as Permission Revoked in the present. By observing these different rows, users can gain insight into the complete history of a user's interactions and permissions regarding a dataset over time. To revoke access to a specific user for a dataset, follow these steps: Go to History page Look for the row with the specific permissions. You can use the filters! 
Click on the toggle button to revoke access.
Add a denial reason. Bear in mind that the requester will receive the denial reason!
Click on the Apply button.

Check more information on the History page here!

Audit your metadata objects

In the DAC Portal, you can efficiently manage and audit various metadata objects pertinent to your role as a Data Controller. Upon accessing the DAC Portal, you will encounter three primary tabs on the homepage:

DAC: Contains information about Data Access Committees (DACs).
Policies: Provides insights into linked policies.
Datasets: Displays datasets and relevant details.

Within each tab, you'll find a comprehensive list of the objects you manage, grouped by type (DAC, Policies, Datasets) as well as by your role (member/admin). The lists give you a quick overview; to check how things are connected, we've added a table at the bottom of the DAC and policy pages.

DACs

Let’s check which policies are linked to a specific DAC:

Go to "My DACs."
Select a DAC.
Click on "EDIT" to see more details.
Scroll down to see the linked policies. Click on “List of linked policies of this DAC” and you will see a list of all policies linked to your selected DAC.

Policies

Do you want to view a list of datasets connected to a particular policy? Follow the same steps mentioned earlier, but head to the policy tab this time. In the policy tab, you'll find a list of all your policies. Here, you may notice two different icons next to the policy ID (EGAP):

DAC icon: This represents the DAC. Hover your mouse over the icon to see the DAC ID.
Dataset icon: This indicates datasets falling under that policy. The number next to the icon tells you how many datasets are linked to the policy. For example, if you see "2" next to the icon, it means there are two datasets linked to that policy.

If you want to check the linked datasets and their relevant information, simply click on a specific policy.
You'll then find the "List of linked datasets of this policy" at the bottom of the page.

Datasets

Finally, in the dataset tab, by default you will see a list of all the datasets you can manage with all your DACs and policies. In addition, we've included two handy ways to organise them:

DAC vs. policy: You can group datasets either by DAC or by policy.
Released vs. unreleased: You can sort datasets based on their release status.

Feel free to experiment with both options! For instance, if you want to see which DACs have unreleased datasets, simply select DAC and unreleased, and you'll get the details you need!

Deprecation

Do you have a bunch of metadata objects like DACs and policies that you don’t need anymore? This section shows you how to get rid of them!

But what does "deprecation" mean for EGA? It's basically changing the status of a metadata object to "deprecated," which means we won't be using it in the future. In simple terms, it's like saying these objects are no longer useful. However, because we believe in making metadata FAIR, once an object has a persistent identifier, we can't just delete it. So, instead of deleting, we deprecate it.

Here's a helpful tip! If you want to make a metadata object disappear from the DAC Portal, deprecate it. You won’t see it in the portal anymore!

Let’s say you want to deprecate a DAC. Let’s do it!

Go to "My DACs."
Select a DAC.
Click "EDIT" to see more details.
Click on the Deprecate button.
A message will then appear. There are two options here:
Your DAC is not linked to any policy, hence it’s ready to be deprecated. Click on “Yes. Confirm” to deprecate your DAC object!
Your DAC is linked to at least one policy, and you need to either deprecate the linked policy first or link the policy to a different DAC.

Now, let’s say you don’t want to get rid of the policy altogether, but you want to change the DAC it’s linked to. Here's how:

Go to the policy tab.
Find the policy you want to change (for example, EGAP50000000019).
Choose a new DAC to link it to.
Click on Update. See the Edit Policy section for more details.

After ensuring that the DAC you want to deprecate isn’t linked to any policy, return to the "My DACs" section and follow the steps outlined previously. This will lead you to a confirmation message.

You can only deprecate DACs and Policies. To deprecate a dataset, please contact our Helpdesk team.

Check more information on how deprecation works here!

User Preferences

We have implemented email notifications in the DAC Portal. Here’s the complete list:

DACs: Pending requests; Approved by Helpdesk; Rejected by Helpdesk; DAC Invitation
Requesters: Data access request approved; Data access request denied; Permissions revoked; Upcoming expiration date

As a DAC member, you will be able to decide whether you want to receive the DAC notifications or not. To do so, go to the top-right corner menu and select User Preference. You will be able to decide whether you want to receive notifications for:

Approved by Helpdesk
Rejected by Helpdesk
DAC Invitation

For pending requests, you will be able to select how often you want to receive the notification:

Daily
Weekly
Fortnightly

As a DAC member, you are responsible for managing data access requests. Consequently, you will receive notifications for pending requests. If you prefer not to manage these requests, please arrange to be removed as a contact for your DAC.

Here's a tip! Do you have a pending request for which you don’t want to receive a notification? Add a DAC comment! The EGA understands that resolving a request can take some time. For this reason, if you add a comment (make sure you save it by clicking the APPLY button!), we will filter out those requests when sending the notification.

DAC API - A programmatic approach

In addition to the new DAC Portal, we are excited to announce the release of the DAC API. This enables users to programmatically manage permissions.
If you are interested in learning more about the technical specifications, you can click the button below. Check out the DAC API specification!
With the DAC Portal, it is possible to streamline the process of managing access to sensitive data, ensuring that researchers have the necessary resources to advance scientific discovery while maintaining the highest standards of data protection and privacy. We know that, as with any new tool, its first use can be complex. For this reason, in this article we will try to show all the elements of interest. Enjoy!

Credentials: who needs to create a new user to access the DAC Portal and who doesn't?

Only new users (since September the 12th) need to create an EGA account to access the DAC Portal. The EGA team went one step ahead by creating EGA accounts for former DAC members so that they can log in to the DAC Portal. These accounts are linked to the personal email used in the DAC structure:

To check the linked email, DAC members can paste the DAC ID right after this link: https://metadata.ega-archive.org/dacs/ (e.g. https://metadata.ega-archive.org/dacs/EGAC00001002543)
If you have forgotten your password, it is possible to set a new one. The whole process is handled through the personal email.
Some DAC members may be interested in updating the DAC email. This can be done by contacting our Helpdesk team.

For the DAC Portal to function correctly, users must add the missing information to the DAC Profile. For a deeper dive, check out our documentation.

The first step: knowing your workspace on the Portal

Once logged in to the platform using the EGA User credentials, note the sections available: DACs and Policies. The first shows all of the user's registered DACs by default. It provides a comprehensive overview of each DAC, including its EGA accession (EGAC), title, status, and the number of data access requests associated with it. The Policies section is where users can manage and view all their registered policies, along with their associated EGA accessions, links, and titles.
A drop-down menu with shortcuts to the most useful pages is available in the top right corner. From it, users can check DACs and policies, as well as create new ones. In this menu, users can also contact the Helpdesk team using the “Need Help?” section to make inquiries about DACs and policies.

Comprehending the DACs list & making the most of all the features

Colours are crucial in the DACs section, as they indicate the user's role and the status of every committee, namely:

Yellow: you are the administrator of the DAC and have full control to add other contacts, modify the content, and manage data access requests.
Orange: you are a member of the DAC and have the authority to manage data access requests but cannot make any changes to the content.
Blue: you have been invited to join the DAC and need to either accept or decline the invitation.
Grey: the DAC is currently pending validation by the EGA Helpdesk team.
Red: the DAC has been declined by the EGA Helpdesk team.

To learn how to view the message from Helpdesk containing the reason for declining, or how to accept or reject a DAC invitation, take the tour of the DAC Portal.

How to create, register and edit a DAC

Creating a DAC is an easy process that only requires users to complete the necessary information. After that, the EGA Helpdesk team will review and validate the proposal. Once the DAC is validated, authorised users with the admin role can edit it at any time, ensuring that all information and contacts associated with the committee are accurate and up to date.

Membership is managed by searching for an EGA user by username, email, or organisation. Please note that for a person's details to appear in this search engine, they must have registered as an EGA user.

Data Access Committee administrators manage two types of permissions that apply to contacts: Main contact and Role.
Both can be modified at any time:

Main contact refers to the person we contact first for any inquiry about the DAC (please note that we can contact all DAC members; this only identifies the person we will contact first).
Role defines the actions that can be taken by each contact. There are two possible roles, admin and member, details of which can be found in the Take the Tour.

Keep in mind that, to save the modifications, it is always necessary to click on “Update”. Moreover, it is also possible to delete any contact in a DAC whenever needed.

The data request process: how to manage access requests

The data request process is now channelled through the EGA website. Once a user sends an access request, it will go directly to the DAC Portal profile of the DAC members linked to the requested dataset.

Whenever a request is received, an hourglass symbol will be displayed next to the DAC responsible for the dataset involved. By clicking on the DAC, it is possible to discover more information about the requester(s).

In this section, DAC Portal users will also be able to modify the display of the data access requests. It is possible to group requests by dataset accession or username, depending on the user's preferences. Additionally, it is possible to isolate specific requests by user or dataset.

To accept a request, users must slide the button to the right. Once the button turns green, the request is accepted and the requester will be granted access to the requested dataset. To make life easier, we have implemented a “Select all” button to easily accept or revoke multiple requests at once.

To decline requests, it is necessary to slide the button to the left. A red button means the requester will not be granted access to the dataset. The DAC Portal will always ask for the reason why the data access was denied.

Please note that any decision, grant or denial, must be confirmed by clicking the "Apply" button.
This will update the status of the request and notify the requester accordingly.

Taking control of the activity in the DAC Portal: checking the requests history

The History button is a useful feature that allows users to access a list of all the requests that have been granted. This is especially useful for auditing purposes, as it makes it possible to keep track of who has been granted access to the data and when.

To make this easier for users with many requests, we have included a filtering option. It allows requests to be filtered by user, email, dataset, EGA ID, and date (these can be combined to narrow the search).

Don't forget about policies: creation, management, and update

As in the DACs section, in the Policies section users can manage and view all their registered policies. By clicking on any policy, it is possible to view its details and make any necessary updates or modifications.

The sheets symbol and its adjacent number indicate the number of objects using a policy. Clicking on the location symbol takes users directly to the DAC page on the DAC Portal to see more information about that DAC.

DAC Portal users can create new policies. The first step is to link the policy to a registered DAC. After defining the title, users can either add a link to an external URL containing the terms & conditions or write the policy content directly on the DAC Portal. Keep in mind that both options can be included.

Data Use Ontology (DUO) codes can be added to new policies. These codes are used to specify the permitted and prohibited uses of the data. Please note that some of them require a modifier to provide additional specificity. Find out more by checking out this specific content on the DAC Portal Tour.

Policies can be modified at any time by clicking on a registered policy, which will display the current information.
Live Distribution

Welcome to the documentation for using the Live Distribution feature for distributing data files securely through the EGA platform. This guide will walk you through the process of downloading encrypted files and decrypting them using Crypt4GH. Please follow the steps below to ensure a smooth experience.

Before Downloading

Create an EGA user.
Make sure that you have the permissions to download the dataset of interest. In case you don’t have access, request access to the dataset.
Add your Crypt4GH-compatible public key to your EGA account.

Please allow a few hours for your public key to be synced with your profile. Afterwards, you will be able to connect to your EGA outbox using the SFTP protocol.

Download

Graphical User Interface (GUI)

You can use any GUI that supports SFTP connections, such as FileZilla, an open-source FTP client. If you use FileZilla as your GUI, follow these steps to download files:

Open FileZilla and access Site Manager (File > Site Manager).
Create a new connection with the following settings (Figure 1):

    Protocol: SFTP - SSH File Transfer Protocol
    Host: __EGA_OUTBOX_DOMAIN__
    Logon Type: Key file
    User: your EGA username
    Key file: path/to/your/private_key

Figure 1: Establishing a new connection to __EGA_OUTBOX_DOMAIN__ in FileZilla, using a key file as the logon method. The figure shows FileZilla version 3.52.2 running on macOS v11.2.3.

Click Connect to access your Outbox. This folder serves as your storage space within the EGA cloud and contains the files that are available for you to download securely.
Browse the remote directory on the right side of the FileZilla screen.
Select the files you wish to download, right-click, and choose Download (Figure 2).
Figure 2: Step-by-step process of manually downloading files from __EGA_OUTBOX_DOMAIN__ using FileZilla, with FileZilla version 3.52.2 operating on IOS v11.2.3. The figure demonstrates how users can download data from the EGA outbox to their local storage by following the depicted steps.
SFTP command line
You can also use the SFTP command line to securely download files from the EGA Outbox.
Using SFTP command line client in Linux/Unix
Open a terminal window.
Enter the following command to connect: sftp username@hostname
Enter your EGA password.
To see a list of available sftp commands, type help:
sftp> put – Upload file
sftp> get – Download file
sftp> cd path – Change remote directory to ‘path’
sftp> pwd – Display remote working directory
sftp> lcd path – Change the local directory to ‘path’
sftp> lpwd – Display local working directory
sftp> ls – Display the contents of the remote working directory
sftp> lls – Display the contents of the local working directory
Type the get command to download files. For example: get encrypted_file.c4gh
Use the bye command to close the connection (SFTP session).
Convenient SSH settings
Include the following settings in your SSH config file, located in ~/.ssh/config:
Host __EGA_OUTBOX_DOMAIN__ EGA-outbox
    hostname __EGA_OUTBOX_DOMAIN__
    User username
    IdentityFile path/to/the/private/key
Replace username and path/to/the/private/key with the appropriate settings, and you will be able to connect to __EGA_OUTBOX_DOMAIN__ simply using sftp EGA-outbox.
How to decrypt
Files archived at the EGA are encrypted using Crypt4GH. Hence, to decrypt the files you need to install Crypt4GH.
You can install a Python implementation of it with pip install crypt4gh, or directly from the GitHub repository with pip install git+https://github.com/EGA-archive/crypt4gh.git After installing Crypt4GH, decrypt files using the following command: crypt4gh decrypt --sk /path/private/key < encrypted_file.c4gh > decrypted_filename The command reads the encrypted file from stdin (with <) and outputs the decrypted version to stdout (with >). Replace encrypted_file.c4gh and decrypted_filename with the appropriate filenames, but make sure not to use the same filename for both reading and writing, because your shell would then truncate the file before it is read. Frequently Asked Questions What username should I use to log in to my outbox? The authentication process for logging in to the EGA website, as well as accessing your inbox and outbox, requires the use of your username, not your email address. Therefore, if you registered a username different from your email address when creating your EGA account, you must use that username to log in. If you have forgotten your registered username, please contact our Helpdesk team for assistance. I see that some files in my dataset have 'unavailable' as an extension. What should I do? Within your Outbox, you'll find a list of all the datasets available for download. Occasionally, certain files may be marked as "unavailable". These unavailable files can be identified by the "unavailable" extension added to their filenames (e.g. filename.fastq.gz.unavailable.c4gh). If you encounter an unavailable file that you need, please reach out to our Helpdesk. We will work on making the file accessible for download as soon as possible. Specific to using keys Can I access one EGA account from different devices? Yes, you can access your account from different devices by linking several public keys to your EGA account.
Each device can generate a unique public-private key pair, and the corresponding public keys can be linked to the same account. This way, you can use different public keys on different devices and still have access to the same account and data. I have several keys and I don't remember which one is which When generating SSH keys, it's a good practice to add a comment using the -C flag. This will allow you to add a descriptive tag to your key, making it easier to identify later on. Here's an example command that generates an SSH key with a comment: ssh-keygen -t ed25519 -C work-pass In this example, we're generating an ed25519 SSH key with the comment work-pass. Once you have multiple keys with different comments, you can use the comments to easily identify each key. To view the comments for your existing SSH keys, you can use the following command: ssh-keygen -l -f /path/to/key This will display the key fingerprint and the associated comment. By checking the comments, you should be able to identify which key is which. What if I can't find my SSH keys for uploading files with a key file, and how can I use new keys? If you can't find your SSH keys, don't worry - you can make new ones. To do this, open your terminal or command prompt and type a command to make a new SSH key. You can pick a name for the key, and choose a password to keep it safe. After making the key, you can add the new key to your account or server where you want to upload files using the key file. This usually involves copying and pasting the key's "public" (e.g. file.pub) part to the right place. If you lose track of the key again, just make a new one and add it again. Keep in mind that SSH keys belong to you and your computer, so if you switch computers or accounts, you'll need to make new keys. I don't want to type the passphrase every time I use the key. What can I do? You can use an ssh-agent to avoid typing the passphrase every time you use the key. 
An ssh-agent is a program that stores your private keys in memory and provides them to ssh when needed. You can add your key to the ssh-agent using the command ssh-add followed by the path to your key file. Here's an example of the steps to follow: Open a terminal window. Start the ssh-agent by typing the command eval $(ssh-agent). Add your key to the ssh-agent by typing the command ssh-add [key filepath]. For instance, if your key file is located in the home directory with the name mykey, the command will look like this: ssh-add ~/mykey After adding your key to the ssh-agent, you should be able to use ssh without having to enter your passphrase every time. Can I use my password for authentication (without my private key)? If you prefer to use your username and password for authentication instead of your private key, you can still do so. When using a Graphical User Interface (GUI) such as FileZilla, you can select Ask for password as your Logon Type (Figure 3). This option will prompt you to enter your password when you click Connect, instead of using your private key. Figure 3: Process of establishing a new connection to __EGA_OUTBOX_DOMAIN__ using your password as the logon method in FileZilla. The figure showcases the FileZilla version 3.52.2 operating on IOS v11.2.3. By following the depicted steps, users can create a secure and efficient connection to the outbox, ensuring seamless data transfers. It's worth noting that using a password for authentication can be less secure than using an SSH key, as passwords can be more easily compromised through various means. However, if you choose to use your password for authentication, selecting "Ask for password" as your Logon Type is a good way to do so securely via a GUI. Why is it better to use my key and not my password?
Using SSH keys for authentication is generally considered more secure and convenient than using passwords. SSH keys are more difficult to crack than passwords, and they can be restricted to specific users and machines, giving you more control over access. Once you set up your SSH keys, you can use them to authenticate quickly and easily, without having to enter a password every time. This makes automation of tasks, such as uploading encrypted files, much simpler. Additionally, SSH keys provide better logging, allowing you to keep track of who is accessing your systems and when. All in all, using SSH keys is a good practice for improving security and convenience in your authentication process.
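As a minimal sketch of the decryption step described earlier on this page, the snippet below wraps the documented crypt4gh decrypt --sk command in Python and guards against the pitfall of using the same file for reading and writing. The function name decrypt_file and its parameters are hypothetical, and the sketch assumes the crypt4gh command is installed and on your PATH.

```python
import subprocess
from pathlib import Path


def decrypt_file(private_key: str, encrypted: str, decrypted: str) -> None:
    """Run `crypt4gh decrypt --sk <key>` with stdin and stdout bound to files.

    Hypothetical helper: it mirrors the shell redirections `<` and `>`.
    """
    src, dst = Path(encrypted), Path(decrypted)
    # Opening the output for writing truncates it, so if both names point
    # to the same file the encrypted data would be lost before being read.
    if src.resolve() == dst.resolve():
        raise ValueError("input and output must be different files")
    with src.open("rb") as fin, dst.open("wb") as fout:
        subprocess.run(
            ["crypt4gh", "decrypt", "--sk", private_key],
            stdin=fin,
            stdout=fout,
            check=True,  # raise CalledProcessError if decryption fails
        )
```

For example, decrypt_file("/path/private/key", "encrypted_file.c4gh", "decrypted_filename") would correspond to the command shown above, while passing the same name twice raises a ValueError instead of destroying the input.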
What is a DAC? Given the complexity, scale, and diversity of global submitters and studies, the EGA operates a distributed data access model in which requests are made to the data controller, not to the EGA. The European Commission defines a data controller, in the General Data Protection Regulation (GDPR), as the person that determines the purposes for which and the means by which personal data is processed. Data Access Committees, commonly referred to as DACs, comprise one or more individuals (or data controllers) who review data access requests and decide who can access personally identifiable genetic, phenotypic, and clinical data deposited at the EGA. Therefore, the members of a DAC should be individuals who have the authority to approve data access requests. The animation describes how you can authorise access to your sensitive data with the help of a Data Access Committee and Authorisation tool. Acknowledgement to CSC - IT Center for Finland, Elixir Finland, Elixir Europe. Frequently Asked Questions How can I create a Data Access Committee? The members of a DAC can come from different areas of expertise, such as data management, data analysis, information technology, legal and compliance, subject matter experts, privacy and security, and representatives from the organisations or individuals that provide data to the DAC. The specific members of a DAC can vary depending on the needs of the organisation and the type of data being managed. The EGA strongly suggests checking with your organisation to align with its regulations. How should a DAC be named? The chosen name must be informative to the applicant. For example, internal identifiers, such as grant numbers, should not be used. Individual PI names should also not be used. DACs are often named after the organisation or department of the data source. Browse the full list of DAC names currently in the EGA. How can I become an EGA DAC contact?
To register a DAC at the EGA you must first register as an EGA user. Once your EGA user has been approved by the Helpdesk team, you will be able to log in to the DAC Portal. How can I register a DAC? To register a DAC, follow the DAC Portal instructions. You will be required to provide a DAC name, the names of the individual(s) that make up your DAC, and contact details for your DAC, including your institutional email(s). Wherever possible, the DAC should make sure that all points of contact are readily available and able to answer any initial data requests/queries in less than two weeks. Once your DAC is registered, you will have to wait for validation by our Helpdesk team. As soon as all the validations have been completed, your DAC will be activated. Alternatively, you can also establish a DAC at the EGA during a programmatic submission through the Webin API. What are the possible roles of a DAC contact? There are two possible roles for DAC contacts: member and admin. An admin has additional privileges compared to a member: An “admin” can manage data requests, create and edit policies, edit the content of the DAC, add or remove contacts, and decide the role of each contact. A “member” can manage data requests and create policies. However, a member does not have permission to modify DAC details, edit information from policies where they are not admins, or add/remove contacts. There is no limit to the number of admins in a DAC, and each admin is responsible for deciding who should have editing privileges. This allows for a more decentralised and democratic approach to managing the DAC. How can I modify the information of a DAC? To modify a DAC, follow the instructions here. Keep in mind that only DAC contacts with an admin role can modify the information of a DAC. If your DAC was registered before the launch of the DAC Portal, and its ID begins with EGAC0 (not EGAC5), you must use the programmatic submission to modify it.
Please do not hesitate to contact our Helpdesk team if you need help with this! To prevent potential data breaches and ensure adherence to GDPR regulations, it is essential that the European Genome-Phenome Archive (EGA) is informed via the Helpdesk team of any changes to the Data Access Committee (DAC). This should be done in addition to any changes being made on the DAC portal. Data Controllers (as per the definition in the DPA) are also responsible for notifying the previous DAC of any modifications. Without proper notification, changes might not be automatically updated in our system, leading to the risk of incorrect permissions being applied and potential data access issues. Therefore, it is imperative that all Data Controllers follow this protocol to maintain data integrity and security. What’s the link between DAC, policy and dataset? A dataset is linked to a single policy. In turn, each policy is linked to exactly one DAC. In this example, you can see that this dataset page shows the information of only one DAC (1 dataset - 1 DAC). However, the relationship does not work the same way in the other direction. One DAC can own multiple policy objects, and each policy object can be reused in several datasets. Thus, one DAC can manage one or more datasets. In this example, you can see that this DAC page shows all the datasets that are managed by one DAC (1 DAC - >400 datasets). EGA Data Access Committee Best Practices Which are the EGA DAC best practices? Refer to DAC Best Practices. What happens if a DAC member changes institutions? EGA is committed to the protection and ownership of the data stored in our systems. We respect the institution's ownership of the data, and as such, if a DAC member changes institutions, the ownership of the data will not be transferred to the new institution.
Therefore, before changing institutions, we request that the DAC contact add a new member who will replace them once they no longer work at the institution. This ensures that the data remains protected and is accessible to authorised personnel at the institution. To prevent potential data breaches and ensure adherence to GDPR regulations, it is essential that the European Genome-Phenome Archive (EGA) is informed via the Helpdesk team of any changes to the Data Access Committee (DAC). This should be done in addition to any changes being made on the DAC portal. Data Controllers (as per the definition in the DPA) are also responsible for notifying the previous DAC of any modifications. Without proper notification, changes might not be automatically updated in our system, leading to the risk of incorrect permissions being applied and potential data access issues. Therefore, it is imperative that all Data Controllers follow this protocol to maintain data integrity and security. What happens if EGA detects an unresponsive DAC? EGA defines an unresponsive DAC as a DAC with one or more contacts who do not respond to data access requests. EGA has procedures in place to identify these types of DACs, escalate the issue, and attempt to reassign the DAC to a responsive contact. This is a crucial step in ensuring that data can be accessed and utilised by researchers. If EGA identifies an unresponsive DAC, the organisation will first try to resolve the issue by escalating it to the appropriate parties. This may involve attempting to reassign the DAC to a more responsive contact. Unfortunately, in situations where we cannot reassign the DAC, the dataset will be withdrawn from the public website and the files will be removed from our system. If an EGA ID is referenced in a publication, the EGA will take extra steps to ensure that the public is made aware of the data's unavailability. I don't want to receive an email notification for pending requests. How can I do that? 
If you are an EGA DAC with pending requests, you will always receive emails for new data access requests. However, the EGA understands that resolving a request can take some time. For this reason, if you add a comment to a request (make sure you save it by clicking the APPLY button!), we will filter that request out when sending the notification. How can I manage data access requests? What documentation does the DAC need to provide? Each dataset that is submitted to the EGA must be linked to a policy object. The policy is a Data Access Agreement (DAA), which defines the terms and conditions of using the dataset, such as how the data files should be stored once downloaded or details of publication embargoes that should be observed by the approved user. As part of the Data Access Agreement, information regarding the application can be captured to help inform the DAC when making its decision. For example, requestors could be asked to provide a proposed title for their research and a proposal of how the data will be used. By asking for such information, the DAC can be assured that the requestor fully understands any consents associated with the data. It is important that accounts created at the EGA are created solely for those individuals that will be downloading the data from the EGA. As part of the data access request, we strongly encourage you to identify individuals that will need an account at the EGA in order to prevent sharing of login details, which is strictly prohibited under EGA user account policy. Such information can easily be captured in the DAA. NOTICE The data access agreement template below is provided for guidance only and should be adapted as you see fit to suit your own purpose. In the interest of promoting data sharing, we suggest that if an agreement cannot be reached on clause 19 in this example, both parties should agree to remain silent, and the clause should be removed from the agreement.
Example DAA How can the DAC provide the DAA? The DAC should provide their own DAA when registering a policy. Data requestors will download this document and should fill it in and send it back to the DAC. Data access decisions should be based on such documentation. The DAA can be downloaded through the request data webpage. Once it has been filled in, the signed copy of the DAA can be uploaded back to the request data webpage and sent to the DAC for review. How can I grant access to the data? Once you receive a data access request, you can log in to the DAC Portal. In this portal you will see all your pending requests and will be able to grant or decline access to the requestors. I am a member of the Data Access Committee. Could I approve somebody else to deal with the requests on my behalf? If you want to delegate data access decisions to someone else, make sure that the individual's account is officially registered as a member of the DAC. Remember that a DAC contact with an "admin" role can always add new members to an existing DAC, remove members, and modify contact details through the DAC Portal. Can I automate the process of managing data access requests? The answer is yes! You can use a programmatic approach using our DAC API! Check out the DAC API specification! Data Breach What should a DAC do if they suspect a breach? If a DAC suspects a data breach of one or more of their datasets, they should immediately contact the EGA Helpdesk team at this link. The DAC must provide the following information when contacting the EGA Helpdesk team: A list of affected datasets An estimated date of the data breach (or interval of dates) A list of unauthorised users who accessed the data (if available).
Otherwise, they can provide a list of authorised users for the affected datasets Any observations they would like to raise to the EGA team Once the DAC has contacted the EGA team, we will respond within 48 hours (please allow some leeway during peak times) and activate our data breach protocol. What can I expect from the EGA if they detect a breach? Once the EGA determines that a security incident has occurred, we will notify all DAC members that a data breach has been detected, and take steps to contain the incident. Containment approaches may include: Revoking a data provider's access to the EGA resources, such as by changing passwords. Removing affected EGA datasets from distribution, such as by withdrawing a dataset. Disabling certain functions or services, such as the EGA ingestion pipeline. Shutting down the system or disconnecting it from the network. After the incident has been contained, the EGA will determine whether it is necessary to eradicate components related to the incident. Finally, the EGA will enable recovery of the service to normal operation and confirm that all services are functioning normally.
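Returning to the link between DAC, policy, and dataset described in the FAQ above, the following is a minimal sketch of that relationship as a data structure: each dataset points to one policy, each policy points to one DAC, and a DAC can therefore manage many datasets. The class name Registry, its methods, and the identifiers used in the example are hypothetical illustrations, not EGA objects or API calls.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Registry:
    """Toy model: dataset -> one policy, policy -> one DAC."""

    policy_to_dac: Dict[str, str] = field(default_factory=dict)
    dataset_to_policy: Dict[str, str] = field(default_factory=dict)

    def add_policy(self, policy_id: str, dac_id: str) -> None:
        # A policy belongs to exactly one DAC.
        self.policy_to_dac[policy_id] = dac_id

    def add_dataset(self, dataset_id: str, policy_id: str) -> None:
        # A dataset must be linked to an existing policy.
        if policy_id not in self.policy_to_dac:
            raise KeyError(f"unknown policy: {policy_id}")
        self.dataset_to_policy[dataset_id] = policy_id

    def dac_for_dataset(self, dataset_id: str) -> str:
        # Follow dataset -> policy -> DAC; the answer is always a single DAC.
        return self.policy_to_dac[self.dataset_to_policy[dataset_id]]

    def datasets_for_dac(self, dac_id: str) -> List[str]:
        # In the other direction, one DAC may manage many datasets.
        return [
            dataset
            for dataset, policy in self.dataset_to_policy.items()
            if self.policy_to_dac[policy] == dac_id
        ]


registry = Registry()
registry.add_policy("policy-1", "dac-A")
registry.add_policy("policy-2", "dac-A")
registry.add_dataset("dataset-1", "policy-1")
registry.add_dataset("dataset-2", "policy-1")  # a policy reused by two datasets
registry.add_dataset("dataset-3", "policy-2")
```

In this toy example, registry.dac_for_dataset("dataset-2") yields the single DAC "dac-A", while registry.datasets_for_dac("dac-A") lists all three datasets.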
Uploading files Users who hold an ega-box-XXX account can upload files using either INBOX or FTP. Users who have a Submitter role associated with their email will only be able to upload files using INBOX. Before uploading your files, please make sure that any files that will be uploaded to EGA do not use special characters in their names, such as # ? ( ) [ ] / \ = + < > : ; " ' , * ^ | &. These characters can cause issues with the archiving process, leading to problems for end users. The EGA is a shared, public service with limited storage. To manage the available resources, we enforce a limit of 10TB per submission account at any one time. If you exceed this limit, a “permission denied” message will be displayed. This will prevent you from uploading more files, but you will still be able to connect to your inbox. For submissions larger than 10TB, please perform uploads in 10TB batches: register all the metadata and then finalise the submission. Upload the next batch of files and repeat the same metadata registration and finalisation process until you have completed the file upload. Further information can be found in the SP documentation. INBOX The INBOX is only compatible with files encrypted using the Crypt4gh tool. Before uploading If you are not a registered EGA user, you will first need an EGA user account. Please note that it may take a few days for your account to be activated, as it needs to be vouched for by the EGA Helpdesk. Once your account is validated, you will be able to request a submitter role. [Optional] Meanwhile, you can create and add your public key to your EGA account profile. This option is not available for old submission accounts (e.g., ega-box-NNN). As soon as you have been granted a submitter role, you will be able to connect with your username and password to the EGA inbox using the SFTP protocol. If you have also registered a public key in your profile, you can also connect using this key.
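The file name rule above can be checked before uploading. The following is a minimal sketch that reports which of the listed special characters appear in a file name; the names FORBIDDEN and forbidden_characters are hypothetical, and the function expects a bare file name rather than a path, since / and \ are themselves in the list.

```python
# The special characters listed above: # ? ( ) [ ] / \ = + < > : ; " ' , * ^ | &
FORBIDDEN = set('#?()[]/\\=+<>:;"\',*^|&')


def forbidden_characters(filename: str) -> list:
    """Return the listed special characters found in a file name.

    Characters are reported once each, in order of first appearance.
    An empty list means the name passes this check.
    """
    found = []
    for ch in filename:
        if ch in FORBIDDEN and ch not in found:
            found.append(ch)
    return found
```

A name such as sample_01.fastq.gz gives an empty list, whereas run (1)&final.bam reports the parentheses and the ampersand, so the file should be renamed before it is encrypted and uploaded.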
To upload files to your account, you can use the graphical user interface (GUI) or the command line.
Graphical User Interface (GUI)
We recommend using FileZilla, a free, open-source FTP client. However, you can use any other GUI that allows connecting over the SFTP protocol. For FileZilla as your GUI, follow these steps to upload files:
Create a new connection in Site Manager (File > Site Manager) and select the following options (Figure 1):
Protocol: SFTP - SSH File Transfer Protocol
Host: __EGA_INBOX_DOMAIN__
Logon Type: Key file
User: your EGA username
Key file: Path/to/your/private_key
Figure 1: Process of establishing a new connection to __EGA_INBOX_DOMAIN__ using a key file as the logon method in FileZilla. The figure showcases the FileZilla version 3.52.2 operating on IOS v11.2.3. By following the depicted steps, users can create a secure and efficient connection to the inbox, ensuring seamless data transfers.
Click Connect, and you will log in remotely to your home directory. You can think of this folder as a storage "in the EGA cloud" in which you will add your files for the EGA. The uploading area has three folders:
to-encrypt: Files uploaded in this folder will be encrypted automatically on the fly.
encrypted: Files uploaded in this folder must already be encrypted with Crypt4gh. Upload your files here if your connection is unstable or you have problems completing the upload in to-encrypt.
etc: This folder contains two files that allow the server to show you your username and group instead of some internal numbers. Please do not upload files here; otherwise, you will obtain a permission denied error.
Find the files you want to upload by browsing your local storage (left side of your screen in FileZilla). Select all the files you want to upload, then right-click on them and select Upload (Figure 2).
Figure 2: Step-by-step process of manually uploading files to __EGA_INBOX_DOMAIN__ using FileZilla, with FileZilla version 3.52.2 operating on IOS v11.2.3.
The figure demonstrates how users can transfer data from their local storage to the "EGA cloud" by following the depicted steps.
Please note that regardless of which folder you upload your files to, both folders (to-encrypt, encrypted) point to the same path (/) (Figure 3). Therefore, you will see your files in both folders.
Figure 3: Both folders, to-encrypt and encrypted, point to the same path (/).
If your connection is unstable, please encrypt your files first using Crypt4gh. Then upload them to the ‘encrypted’ folder.
The example above shows how to connect to __EGA_INBOX_DOMAIN__ using the private key. However, if you prefer to log in using your credentials, you can do so. Please go to the Frequently Asked Questions (FAQs) for more information.
SFTP command line
To upload files securely to your private area of the EGA, you can use SFTP (Secure File Transfer Protocol) with your favorite FTP client. Here's what you need to know to get started:
Connect to the target host __EGA_INBOX_DOMAIN__. This is the new hostname for the EGA SFTP service.
Log in with your EGA username and key files (or password).
Upload files to your private EGA inbox to ensure that only you can access the files.
By following these steps, you can securely upload your files to the EGA for safe storage and sharing.
Using the SFTP command line client in Linux/Unix
Open a terminal and type sftp username@hostname
Enter your EGA password
To see a list of available SFTP commands, type help
sftp> put – Upload file
sftp> get – Download file
sftp> cd path – Change remote directory to ‘path’
sftp> pwd – Display remote working directory
sftp> lcd path – Change the local directory to ‘path’
sftp> lpwd – Display local working directory
sftp> ls – Display the contents of the remote working directory
sftp> lls – Display the contents of the local working directory
Type the "put" command to upload files. For example: put *.bam
Use the bye command to close the connection (SFTP session).
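The interactive session above can also be scripted. As a minimal sketch, the function below writes the same cd, put, and bye commands into a batch file that the OpenSSH sftp client can read with its -b option. The function name build_sftp_batch is hypothetical, the target folder encrypted is taken from the folder description above, and the sketch assumes file names without spaces or special characters.

```python
def build_sftp_batch(local_files, remote_dir="encrypted"):
    """Return the text of an sftp batch file that uploads the given files.

    The commands are the same ones typed at the sftp> prompt:
    cd to the target folder, put each file, then bye.
    """
    lines = [f"cd {remote_dir}"]
    lines += [f"put {name}" for name in local_files]
    lines.append("bye")
    return "\n".join(lines) + "\n"


batch_text = build_sftp_batch(["sample_1.c4gh", "sample_2.c4gh"])
```

After saving batch_text to a file such as upload.batch, it could be run with sftp -b upload.batch username@hostname. Batch mode has no interactive prompts, so it is intended for key-based authentication rather than typing a password.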
After uploading
Once you have uploaded files to the inbox, please bear in mind that the checksum needs to be calculated, which can take up to two days. You will only be able to link your files to a run/analysis once the encrypted checksum has been calculated.
When linking your files to the 'Run' or 'Analysis', ensure that the file name matches the file path '/name' in the INBOX folder.
Please delete the files from your SFTP INBOX after all the runs/analyses have been registered and files are ingested (SP > Files > Files ingested). This will clear your inbox space and allow you to upload more files. This will also prevent the files from reappearing in your Submitter Portal inbox.
Frequently Asked Questions Specific to the inbox What username should I use to log in to my inbox? The authentication process for logging in to the EGA website, as well as accessing your inbox and outbox, requires the use of your username. If you have forgotten your registered username, please contact our Helpdesk team for assistance. How are checksums calculated in your inbox? If you encrypt the file beforehand and upload it to the "encrypted" folder, the unencrypted checksum will not be calculated until the file is ingested (i.e., until it is used in a run/analysis). If the file is uploaded to the "to-encrypt" folder, then both checksums are calculated. Please bear in mind that after files have been uploaded to the inbox, the checksum must be calculated, which can take from a few hours to two days. Specific to using keys to authenticate Can I access one EGA account from different devices? Yes, you can access your account from different devices by linking several public keys to your EGA account. Each device can generate a unique public-private key pair, and the corresponding public keys can be linked to the same account. This way, you can use different public keys on different devices and still have access to the same account and data.
I have several keys and I don't remember which one is which When generating SSH keys, it's a good practice to add a comment using the -C flag. This will allow you to add a descriptive tag to your key, making it easier to identify later on. Here's an example command that generates an SSH key with a comment: ssh-keygen -t ed25519 -C work-pass In this example, we're generating an ed25519 SSH key with the comment work-pass. Once you have multiple keys with different comments, you can use the comments to easily identify each key. To view the comments for your existing SSH keys, you can use the following command: ssh-keygen -l -f /path/to/key This will display the key fingerprint and the associated comment. By checking the comments, you should be able to identify which key is which. What if I can't find my SSH keys for uploading files with a key file, and how can I use new keys? If you can't find your SSH keys, don't worry - you can make new ones. To do this, open your terminal or command prompt and type a command to make a new SSH key. You can pick a name for the key, and choose a password to keep it safe. After making the key, you can add the new key to your account or server where you want to upload files using the key file. This usually involves copying and pasting the key's "public" (e.g. file.pub) part to the right place. If you lose track of the key again, just make a new one and add it again. Keep in mind that SSH keys belong to you and your computer, so if you switch computers or accounts, you'll need to make new keys. I don't want to type the passphrase every time I use the key. What can I do? You can use an ssh-agent to avoid typing the passphrase every time you use the key. An ssh-agent is a program that stores your private keys in memory and provides them to ssh when needed. 
You can add your key to the ssh-agent using the command ssh-add followed by the path to your key file. Here's an example of the steps to follow: Open a terminal window. Start the ssh-agent by typing the command eval $(ssh-agent). Add your key to the ssh-agent by typing the command ssh-add [key filepath]. For instance, if your key file is located in the home directory with the name mykey, the command will look like this: ssh-add ~/mykey After adding your key to the ssh-agent, you should be able to use ssh without having to enter your passphrase every time. Can I use my password for authentication (without my private key)? If you prefer to use your username and password for authentication instead of your private key, you can still do so. When using a Graphical User Interface (GUI) such as FileZilla, you can select Ask for password as your Logon Type (Figure 3). This option will prompt you to enter your password when you click Connect, instead of using your private key. Figure 3: Process of establishing a new connection to __EGA_INBOX_DOMAIN__ using your password as the logon method in FileZilla. The figure showcases the FileZilla version 3.52.2 operating on IOS v11.2.3. By following the depicted steps, users can create a secure and efficient connection to the inbox, ensuring seamless data transfers. It's worth noting that using a password for authentication can be less secure than using an SSH key, as passwords can be more easily compromised through various means. However, if you choose to use your password for authentication, selecting "Ask for password" as your Logon Type is a good way to do so securely via a GUI. Why is it better to use my key and not my password? Using SSH keys for authentication is generally considered more secure and convenient than using passwords.
SSH keys are more difficult to crack than passwords, and they can be restricted to specific users and machines, giving you more control over access. Once you set up your SSH keys, you can use them to authenticate quickly and easily, without having to enter a password every time. This makes automation of tasks, such as uploading encrypted files, much simpler. Additionally, SSH keys provide better logging, allowing you to keep track of who is accessing your systems and when. All in all, using SSH keys is a good practice for improving security and convenience in your authentication process.
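The key-handling commands from the answers above can be combined into one short session. The following is only a sketch under stated assumptions: the scratch directory and the key name example_key are hypothetical, and the empty passphrase (-N "") is used solely so the example runs without prompting. For a real upload key you would choose a proper passphrase and a permanent location.

```shell
#!/bin/sh
# Sketch: create a labelled key, read its label back, and load it into an agent.
set -eu

# Hypothetical scratch location and key name, for illustration only.
workdir=$(mktemp -d)
keyfile="$workdir/example_key"

# -t ed25519: key type; -C: comment that identifies the key later;
# -N "": empty passphrase (demo only); -f: output file; -q: quiet.
ssh-keygen -q -t ed25519 -C work-pass -N "" -f "$keyfile"

# Show the fingerprint and comment stored in the public key.
listing=$(ssh-keygen -l -f "$keyfile.pub")
echo "$listing"

# Start an agent for this shell and hand it the private key.
eval "$(ssh-agent -s)" >/dev/null
ssh-add "$keyfile" 2>/dev/null

# List the keys the agent currently holds.
agent_listing=$(ssh-add -l)
echo "$agent_listing"

# Stop the agent again and remove the demo key.
eval "$(ssh-agent -k)" >/dev/null
rm -rf "$workdir"
```

With a passphrase-protected key, ssh-add asks for the passphrase once, and the agent then supplies the key for later connections made from the same session.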
Original description of the study: From ELLIPSE (linked to the PRACTICAL consortium), we contributed ~78,000 SNPs to the OncoArray. A large fraction of the content was derived from the GWAS meta-analyses in European ancestry populations (overall and aggressive disease; ~27K SNPs). We also selected just over 10,000 SNPs from the meta-analyses in the non-European populations, with a majority of these SNPs coming from the analysis of overall prostate cancer in African ancestry populations as well as from the multiethnic meta-analysis. A substantial fraction of SNPs (~28,000) were also selected for fine-mapping of 53 loci not included in the common fine-mapping regions (tagging at r² > 0.9 across ±500 kb regions). We also selected a few thousand SNPs related to PSA levels and/or disease survival, SNPs from candidate lists provided by study collaborators, and SNPs from meta-analyses of exome SNP chip data from the Multiethnic Cohort and UK studies. The Contributing Studies: Aarhus: Hospital-based, Retrospective, Observational. Source of cases: Patients treated for prostate adenocarcinoma at Department of Urology, Aarhus University Hospital, Skejby (Aarhus, Denmark). Source of controls: Age-matched males treated for myocardial infarction or undergoing coronary angioplasty, but with no prostate cancer diagnosis based on information retrieved from the Danish Cancer Register and the Danish Cause of Death Register. AHS: Nested case-control study within prospective cohort. Source of cases: linkage to cancer registries in study states. Source of controls: matched controls from cohort. ATBC: Prospective, nested case-control. Source of cases: Finnish male smokers aged 50-69 years at baseline. Source of controls: Finnish male smokers aged 50-69 years at baseline. BioVu: Cases identified in a biobank linked to electronic health records. 
Source of cases: A total of 214 cases were identified in the VUMC de-identified electronic health records database (the Synthetic Derivative) and shipped to USC for genotyping in April 2014. The following criteria were used to identify cases: Age 18 or greater; male; African Americans (Black) only. Note that African ancestry is not self-identified, it is administratively or third-party assigned (which has been shown to be highly correlated with genetic ancestry for African Americans in BioVU; see references). Source of controls: Controls were identified in the de-identified electronic health record. Unfortunately, they were not age matched to the cases, and therefore cannot be used for this study. Canary PASS: Prospective, Multi-site, Observational Active Surveillance Study. Source of cases: clinic based from Beth Israel Deaconness Medical Center, Eastern Virginia Medical School, University of California at San Francisco, University of Texas Health Sciences Center San Antonio, University of Washington, VA Puget Sound. Source of controls: N/A CCI: Case series, Hospital-based. Source of cases: Cases identified through clinics at the Cross Cancer Institute. Source of controls: N/A CerePP French Prostate Cancer Case-Control Study (ProGene): Case-Control, Prospective, Observational, Hospital-based. Source of cases: Patients, treated in French departments of Urology, who had histologically confirmed prostate cancer. Source of controls: Controls were recruited as participating in a systematic health screening program and found unaffected (normal digital rectal examination and total PSA < 4 ng/ml, or negative biopsy if PSA > 4 ng/ml). COH: hospital-based cases and controls from outside. Source of cases: Consented prostate cancer cases at City of Hope. Source of controls: Consented unaffected males that were part of other studies where they consented to have their DNA used for other research studies. COSM: Population-based cohort. Source of cases: General population. 
Source of controls: General population CPCS1: Case-control - Denmark. Source of cases: Hospital referrals. Source of controls: Copenhagen General Population Study CPCS2: Source of cases: Hospital referrals. Source of controls: Copenhagen General Population Study CPDR: Retrospective cohort. Source of cases: Walter Reed National Military Medical Center. Source of controls: Walter Reed National Military Medical Center ACS_CPS-II: Nested case-control derived from a prospective cohort study. Source of cases: Identified through self-report on follow-up questionnaires and verified through medical records or cancer registries, identified through cancer registries or the National Death Index (with prostate cancer as the primary cause of death). Source of controls: Cohort participants who were cancer-free at the time of diagnosis of the matched case, also matched on age (±6 mo) and date of biospecimen donation (±6 mo). EPIC: Case-control - Germany, Greece, Italy, Netherlands, Spain, Sweden, UK. Source of cases: Identified through record linkage with population-based cancer registries in Italy, the Netherlands, Spain, Sweden and UK. In Germany and Greece, follow-up is active and achieved through checks of insurance records and cancer and pathology registries as well as via self-reported questionnaires; self-reported incident cancers are verified through medical records. Source of controls: Cohort participants without a diagnosis of cancer EPICAP: Case-control, Population-based, ages less than 75 years at diagnosis, Hérault, France. Source of cases: Prostate cancer cases in all public hospitals and private urology clinics of département of Hérault in France. Cases validation by the Hérault Cancer Registry. Source of controls: Population-based controls, frequency age matched (5-year groups). Quotas by socio-economic status (SES) in order to obtain a distribution by SES among controls identical to the SES distribution among general population men, conditionally to age. 
ERSPC: Population-based randomized trial. Source of cases: Men with PrCa from screening arm ERSPC Rotterdam. Source of controls: Men without PrCa from screening arm ERSPC Rotterdam ESTHER: Case-control, Prospective, Observational, Population-based. Source of cases: Prostate cancer cases in all hospitals in the state of Saarland, from 2001-2003. Source of controls: Random sample of participants from routine health check-up in Saarland, in 2000-2002 FHCRC: Population-based, case-control, ages 35-74 years at diagnosis, King County, WA, USA. Source of cases: Identified through the Seattle-Puget Sound SEER cancer registry. Source of controls: Randomly selected, age-frequency matched residents from the same county as cases Gene-PARE: Hospital-based. Source of cases: Patients that received radiotherapy for treatment of prostate cancer. Source of controls: n/a Hamburg-Zagreb: Hospital-based, Prospective. Source of cases: Prostate cancer cases seen at the Department of Oncology, University Hospital Center Zagreb, Croatia. Source of controls: Population-based (Croatia), healthy men, older than 50, with no medical record of cancer, and no family history of cancer (1st & 2nd degree relatives) HPFS: Nested case-control. Source of cases: Participants of the HPFS cohort. Source of controls: Participants of the HPFS cohort IMPACT: Observational. Source of cases: Carriers and non-carriers (with a known mutation in the family) of the BRCA1 and BRCA2 genes, aged between 40 and 69, who are undergoing prostate screening with annual PSA testing. This cohort has been diagnosed with prostate cancer during the study. Source of controls: Carriers and non-carriers (with a known mutation in the family) of the BRCA1 and BRCA2 genes, aged between 40 and 69, who are undergoing prostate screening with annual PSA testing. This cohort has not been diagnosed with prostate cancer during the study. IPO-Porto: Hospital-based. Source of cases: Early onset and/or familial prostate cancer. 
Source of controls: Blood donors Karuprostate: Case-control, Retrospective, Population-based. Source of cases: From FWI (Guadeloupe): 237 consecutive incident patients with histologically confirmed prostate cancer attending public and private urology clinics; From Democratic Republic of Congo: 148 consecutive incident patients with histologically confirmed prostate cancer attending the University Clinic of Kinshasa. Source of controls: From FWI (Guadeloupe): 277 controls recruited from men participating in a free systematic health screening program open to the general population; From Democratic Republic of Congo: 134 controls recruited from subjects attending the University Clinic of Kinshasa KULEUVEN: Hospital-based, Prospective, Observational. Source of cases: Prostate cancer cases recruited at the University Hospital Leuven. Source of controls: Healthy males with no history of prostate cancer recruited at the University Hospitals, Leuven. LAAPC: Subjects were participants in a population-based case-control study of aggressive prostate cancer conducted in Los Angeles County. Cases were identified through the Los Angeles County Cancer Surveillance Program rapid case ascertainment system. Eligible cases included African American, Hispanic, and non-Hispanic White men diagnosed with a first primary prostate cancer between January 1, 1999 and December 31, 2003. Eligible cases also had (a) prostatectomy with documented tumor extension outside the prostate, (b) metastatic prostate cancer in sites other than prostate, (c) needle biopsy of the prostate with Gleason grade ≥8, or (d) needle biopsy with Gleason grade 7 and tumor in more than two thirds of the biopsy cores. Eligible controls were men never diagnosed with prostate cancer, living in the same neighborhood as a case, and were frequency matched to cases on age (± 5 y) and race/ethnicity. 
Controls were identified by a neighborhood walk algorithm, which proceeds through an obligatory sequence of adjacent houses or residential units beginning at a specific residence that has a specific geographic relationship to the residence where the case lived at diagnosis. Malaysia: Case-control. Source of cases: Patients attended the outpatient urology or uro-onco clinic at University Malaya Medical Center. Source of controls: Population-based, age matched (5-year groups), ascertained through electoral register, Subang Jaya, Selangor, Malaysia MCC-Spain: Case-control. Source of cases: Identified through the urology departments of the participating hospitals. Source of controls: Population-based, frequency age and region matched, ascertained through the rosters of the primary health care centers MCCS: Nested case-control, Melbourne, Victoria. Source of cases: Identified by linkage to the Victorian Cancer Registry. Source of controls: Cohort participants without a diagnosis of cancer MD Anderson: Participants in this study were identified from epidemiological prostate cancer studies conducted at the University of Texas MD Anderson Cancer Center in the Houston Metropolitan area. Cases were accrued in the Houston Medical Center and were not restricted with respect to Gleason score, stage or PSA. Controls were identified via random-digit-dialing or among hospital visitors and they were frequency matched to cases on age and race. Lifestyle, demographic, and family history data were collected using a standardized questionnaire. MDACC_AS: A prospective cohort study. Source of cases: Men with clinically organ-confined prostate cancer meeting eligibility criteria for a prospective cohort study of active surveillance at MD Anderson Cancer Center. Source of controls: N/A MEC: The Multiethnic Cohort (MEC) is comprised of over 215,000 men and women recruited from Hawaii and the Los Angeles area between 1993 and 1996. 
Between 1995 and 2006, over 65,000 blood samples were collected from participants for genetic analyses. To identify incident cancer cases, the MEC was cross-linked with the population-based Surveillance, Epidemiology and End Results (SEER) registries in California and Hawaii, and unaffected cohort participants with blood samples were selected as controls MIAMI (WFPCS): Prostate cancer cases and controls were recruited from the Departments of Urology and Internal Medicine of the Wake Forest University School of Medicine using sequential patient populations as described previously (PMID:15342424). All study subjects received a detailed description of the study protocol and signed their informed consent, as approved by the medical center's Institutional Review Board. The general eligibility criteria were (i) able to comprehend informed consent and (ii) without previously diagnosed cancer. The exclusion criteria were (i) clinical diagnosis of autoimmune diseases; (ii) chronic inflammatory conditions; and (iii) infections within the past 6 weeks. Blood samples were collected from all subjects. MOFFITT: Hospital-based. Source of cases: clinic based from Moffitt Cancer Center. Source of controls: Moffitt Cancer Center affiliated Lifetime cancer screening center NMHS: Case-control, clinic based, Nashville TN. Source of cases: All urology clinics in Nashville, TN. Source of controls: Men without prostate cancer at prostate biopsy. PCaP: The North Carolina-Louisiana Prostate Cancer Project (PCaP) is a multidisciplinary population-based case-only study designed to address racial differences in prostate cancer through a comprehensive evaluation of social, individual and tumor level influences on prostate cancer aggressiveness. PCaP enrolled approximately equal numbers of African Americans and Caucasian Americans with newly-diagnosed prostate cancer from North Carolina (42 counties) and Louisiana (30 parishes) identified through state tumor registries. 
African American PCaP subjects with DNA, who agreed to future use of specimens for research, participated in OncoArray analysis. PCMUS: Case-control - Sofia, Bulgaria. Source of cases: Patients of Clinic of Urology, Alexandrovska University Hospital, Sofia, Bulgaria, PrCa histopathologically confirmed. Source of controls: 72 patients with verified BPH and PSA<3,5; 78 healthy controls from the MMC Biobank, no history of PrCa PHS: Nested case-control. Source of cases: Participants of the PHS1 trial/cohort. Source of controls: Participants of the PHS1 trial/cohort PLCO: Nested case-control. Source of cases: Men with a confirmed diagnosis of prostate cancer from the PLCO Cancer Screening Trial. Source of controls: Controls were men enrolled in the PLCO Cancer Screening Trial without a diagnosis of cancer at the time of case ascertainment. Poland: Case-control. Source of cases: men with unselected prostate cancer, diagnosed in north-western Poland at the University Hospital in Szczecin. Source of controls: cancer-free men from the same population, taken from the healthy adult patients of family doctors in the Szczecin region PROCAP: Population-based, Retrospective, Observational. Source of cases: Cases were ascertained from the National Prostate Cancer Register of Sweden Follow-Up Study, a retrospective nationwide cohort study of patients with localized prostate cancer. Source of controls: Controls were selected among men referred for PSA testing in laboratories in Stockholm County, Sweden, between 2010 and 2012. PROGReSS: Hospital-based, Prospective, Observational. Source of cases: Prostate cancer cases from the Hospital Clínico Universitario de Santiago de Compostela, Galicia, Spain. Source of controls: Cancer-free men from the same population ProMPT: A study to collect samples and data from subjects with and without prostate cancer. Retrospective, Experimental. Source of cases: Subjects attending outpatient clinics in hospitals. 
Source of controls: Subjects attending outpatient clinics in hospitals ProtecT: Trial of treatment. Samples taken from subjects invited for PSA testing from the community at nine centers across United Kingdom. Source of cases: Subjects who have a proven diagnosis of prostate cancer following testing. Source of controls: Identified through invitation of subjects in the community. PROtEuS: Case-control, population-based. Source of cases: All new histologically-confirmed cases, aged less or equal to 75 years, diagnosed between 2005 and 2009, actively ascertained across Montreal French hospitals. Source of controls: Randomly selected from the Provincial electoral list of French-speaking men between 2005 and 2009, from the same area of residence as cases and frequency-matched on age. QLD: Case-control. Source of cases: A longitudinal cohort study (Prostate Cancer Supportive Care and Patient Outcomes Project: ProsCan) conducted in Queensland, through which men newly diagnosed with prostate cancer from 26 private practices and 10 public hospitals were directly referred to ProsCan at the time of diagnosis by their treating clinician (age range 43-88 years). All cases had histopathologically confirmed prostate cancer, following presentation with an abnormal serum PSA and/or lower urinary tract symptoms. Source of controls: Controls comprised healthy male blood donors with no personal history of prostate cancer, recruited through (i) the Australian Red Cross Blood Services in Brisbane (age range 19-76 years) and (ii) the Australian Electoral Commission (AEC) (age and post-code/ area matched to ProsCan, age range 54-90 years). RAPPER: Multi-centre, hospital based blood sample collection study in patients enrolled in clinical trials with prospective collection of radiotherapy toxicity data. Source of cases: Prostate cancer patients enrolled in radiotherapy trials: CHHiP, RT01, Dose Escalation, RADICALS, Pelvic IMRT, PIVOTAL. 
Source of controls: N/A SABOR: Prostate Cancer Screening Cohort. Source of cases: Men >45 yrs of age participating in annual PSA screening. Source of controls: Males participating in annual PSA prostate cancer risk evaluations (funded by NCI biomarkers discovery and validation grant), recruited through University of Texas Health Science Center at San Antonio and affiliated sites or through study advertisements, enrolment open to the community SCCS: Case-control in cohort, Southeastern USA. Prospective, Observational, Population-based. Source of cases: SCCS entry population. Source of controls: SCCS entry population SCPCS: Population-based, Retrospective, Observational. Source of cases: South Carolina Central Cancer Registry. Source of controls: Health Care Financing Administration beneficiary file SEARCH: Case-control - East Anglia, UK. Source of cases: Men < 70 years of age registered with prostate cancer at the population-based cancer registry, Eastern Cancer Registration and Information Centre, East Anglia, UK. Source of controls: Men attending general practice in East Anglia with no known prostate cancer diagnosis, frequency matched to cases by age and geographic region SNP_Prostate_Ghent: Hospital-based, Retrospective, Observational. Source of cases: Men treated with IMRT as primary or postoperative treatment for prostate cancer at the Ghent University Hospital between 2000 and 2010. Source of controls: Employees of the University hospital and members of social activity clubs, without a history of any cancer. SPAG: Hospital-based, Retrospective, Observational. Source of cases: Guernsey. Source of controls: Guernsey STHM2: Population-based, Retrospective, Observational. Source of cases: Cases were selected among men referred for PSA testing in laboratories in Stockholm County, Sweden, between 2010 and 2012. Source of controls: Controls were selected among men referred for PSA testing in laboratories in Stockholm County, Sweden, between 2010 and 2012. 
PCPT: Case-control from a randomized clinical trial. Source of cases: Randomized clinical trial. Source of controls: Randomized clinical trial. SELECT: Case-cohort from a randomized clinical trial. Source of cases: Randomized clinical trial. Source of controls: Randomized clinical trial. TAMPERE: Case-control - Finland, Retrospective, Observational, Population-based. Source of cases: Identified through linkage to the Finnish Cancer Registry and patient records; and the Finnish arm of the ERSPC study. Source of controls: Cohort participants without a diagnosis of cancer. UGANDA: Uganda Prostate Cancer Study: Uganda is a case-control study of prostate cancer in Kampala, Uganda that was initiated in 2011. Men with prostate cancer were enrolled from the Urology unit at Mulago Hospital and men without prostate cancer (i.e. controls) were enrolled from other clinics (i.e. surgery) at the hospital. UKGPCS: ICR, UK. Source of cases: Cases identified through clinics at the Royal Marsden hospital and nationwide NCRN hospitals. Source of controls: Ken Muir's control- 2000. ULM: Case-control - Germany. Source of cases: familial cases (n=162): identified through questionnaires for family history by collaborating urologists all over Germany; sporadic cases (n=308): prostatectomy series performed in the Clinic of Urology Ulm between 2012 and 2014. Source of controls: age-matched controls (n=188): age-matched men without prostate cancer and negative family history collected in hospitals of Ulm. WUGS/WUPCS: Case series, USA. Source of cases: Identified through clinics at Washington University in St. Louis. Source of controls: Men diagnosed and managed with prostate cancer in University based clinic. Acknowledgement Statements: Aarhus: This study was supported by the Danish Strategic Research Council (now Innovation Fund Denmark) and the Danish Cancer Society. The Danish Cancer Biobank (DCB) is acknowledged for biological material. 
AHS: This work was supported by the Intramural Research Program of the NIH, National Cancer Institute, Division of Cancer Epidemiology and Genetics (Z01CP010119). ATBC: This research was supported in part by the Intramural Research Program of the NIH and the National Cancer Institute. Additionally, this research was supported by U.S. Public Health Service contracts N01-CN-45165, N01-RC-45035, N01-RC-37004, HHSN261201000006C, and HHSN261201500005C from the National Cancer Institute, Department of Health and Human Services. BioVu: The dataset(s) used for the analyses described were obtained from Vanderbilt University Medical Center's BioVU which is supported by institutional funding and by the National Center for Research Resources, Grant UL1 RR024975-01 (which is now at the National Center for Advancing Translational Sciences, Grant 2 UL1 TR000445-06). Canary PASS: PASS was supported by Canary Foundation and the National Cancer Institute's Early Detection Research Network (U01 CA086402). CCI: This work was awarded by Prostate Cancer Canada and is proudly funded by the Movember Foundation - Grant # D2013-36. The CCI group would like to thank David Murray, Razmik Mirzayans, and April Scott for their contribution to this work. CerePP French Prostate Cancer Case-Control Study (ProGene): None reported. COH: SLN is partially supported by the Morris and Horowitz Families Endowed Professorship. COSM: The Swedish Research Council, the Swedish Cancer Foundation. CPCS1 & CPCS2: Department of Clinical Biochemistry, Herlev and Gentofte Hospital, Copenhagen University Hospital, Herlev Ringvej 75, DK-2730 Herlev, Denmark. CPCS1 would like to thank the participants and staff of the Copenhagen General Population Study for their important contributions. CPDR: Uniformed Services University for the Health Sciences HU0001-10-2-0002 (PI: David G. McLeod, MD). CPS-II: The American Cancer Society funds the creation, maintenance, and updating of the Cancer Prevention Study II cohort. 
CPS-II thanks the participants and Study Management Group for their invaluable contributions to this research. We would also like to acknowledge the contribution to this study from central cancer registries supported through the Centers for Disease Control and Prevention National Program of Cancer Registries, and cancer registries supported by the National Cancer Institute Surveillance Epidemiology and End Results program. EPIC: The coordination of EPIC is financially supported by the European Commission (DG-SANCO) and the International Agency for Research on Cancer. The national cohorts are supported by the Danish Cancer Society (Denmark); the Deutsche Krebshilfe, Deutsches Krebsforschungszentrum and Federal Ministry of Education and Research (Germany); the Hellenic Health Foundation, Greek Ministry of Health; Greek Ministry of Education (Greece); the Italian Association for Research on Cancer (AIRC) and National Research Council (Italy); the Dutch Ministry of Public Health, Welfare and Sports (VWS), Netherlands Cancer Registry (NKR), LK Research Funds, Dutch Prevention Funds, Dutch ZON (Zorg Onderzoek Nederland), World Cancer Research Fund (WCRF); the Statistics Netherlands (The Netherlands); the Health Research Fund (FIS), Regional Governments of Andalucía, Asturias, Basque Country, Murcia and Navarra, Spanish Ministry of Health ISCIII RETIC (RD06/0020), Red de Centros RCESP, C03/09 (Spain); the Swedish Cancer Society, Swedish Scientific Council and Regional Government of Skåne and Västerbotten, Fundacion Federico SA (Sweden); the Cancer Research UK, Medical Research Council (United Kingdom). EPICAP: The EPICAP study was supported by grants from Ligue Nationale Contre le Cancer, Ligue départementale du Val de Marne; Fondation de France; Agence Nationale de sécurité sanitaire de l'alimentation, de l'environnement et du travail (ANSES). 
The EPICAP study group would like to thank all urologists, Antoinette Anger and Hasina Randrianasolo (study monitors), Anne-Laure Astolfi, Coline Bernard, Oriane Noyer, Marie-Hélène De Campo, Sandrine Margaroline, Louise N'Diaye, and Sabine Perrier-Bonnet (Clinical Research nurses). ERSPC: This study was supported by the Dutch Cancer Society (KWF 94-869, 98-1657, 2002-277, 2006-3518, 2010-4800), The Netherlands Organisation for Health Research and Development (ZonMW-002822820, 22000106, 50-50110-98-311, 62300035), The Dutch Cancer Research Foundation (SWOP), and an unconditional grant from Beckman-Coulter-Hybritech Inc. ESTHER: The ESTHER study was supported by a grant from the Baden Württemberg Ministry of Science, Research and Arts. The ESTHER group would like to thank Hartwig Ziegler, Sonja Wolf, Volker Hermann, Heiko Müller, Karina Dieffenbach, Katja Butterbach for valuable contributions to the study. FHCRC: The FHCRC studies were supported by grants R01-CA056678, R01-CA082664, and R01-CA092579 from the US National Cancer Institute, National Institutes of Health, with additional support from the Fred Hutchinson Cancer Research Center. FHCRC would like to thank all the men who participated in these studies. Gene-PARE: The Gene-PARE study was supported by grants 1R01CA134444 from the U.S. National Institutes of Health, PC074201 and W81XWH-15-1-0680 from the Prostate Cancer Research Program of the Department of Defense and RSGT-05-200-01-CCE from the American Cancer Society. Hamburg-Zagreb: None reported. HPFS: The Health Professionals Follow-up Study was supported by grants UM1CA167552, CA133891, CA141298, and P01CA055075. 
HPFS are grateful to the participants and staff of the Physicians' Health Study and Health Professionals Follow-Up Study for their valuable contributions, as well as the following state cancer registries for their help: AL, AZ, AR, CA, CO, CT, DE, FL, GA, ID, IL, IN, IA, KY, LA, ME, MD, MA, MI, NE, NH, NJ, NY, NC, ND, OH, OK, OR, PA, RI, SC, TN, TX, VA, WA, and WY. IMPACT: The IMPACT study was funded by The Ronald and Rita McAulay Foundation, CR-UK Project grant (C5047/A1232), Cancer Australia, AICR Netherlands A10-0227, Cancer Australia and Cancer Council Tasmania, NIHR, EU Framework 6, Cancer Councils of Victoria and South Australia, and Philanthropic donation to Northshore University Health System. We acknowledge support from the National Institute for Health Research (NIHR) to the Biomedical Research Centre at The Institute of Cancer Research and Royal Marsden Foundation NHS Trust. IMPACT acknowledges the IMPACT study steering committee, collaborating centres, and participants. IPO-Porto: The IPO-Porto study was funded by Fundação para a Ciência e a Tecnologia (FCT; UID/DTP/00776/2013 and PTDC/DTP-PIC/1308/2014) and by IPO-Porto Research Center (CI-IPOP-16-2012 and CI-IPOP-24-2015). MC and MPS are research fellows from Liga Portuguesa Contra o Cancro, Núcleo Regional do Norte. SM is a research fellow from FCT (SFRH/BD/71397/2010). IPO-Porto would like to express our gratitude to all patients and families who have participated in this study. Karuprostate: The Karuprostate study was supported by the French National Health Directorate and by the Association pour la Recherche sur les Tumeurs de la Prostate. Karuprostate thanks Séverine Ferdinand. KULEUVEN: F.C. and S.J. are holders of grants from FWO Vlaanderen (G.0684.12N and G.0830.13N), the Belgian federal government (National Cancer Plan KPC_29_023), and a Concerted Research Action of the KU Leuven (GOA/15/017). TVDB is holder of a doctoral fellowship of the FWO. 
LAAPC: This study was funded by grant R01CA84979 (to S.A. Ingles) from the National Cancer Institute, National Institutes of Health. Malaysia: The study was funded by the University Malaya High Impact Research Grant (HIR/MOHE/MED/35). Malaysia thanks all associates in the Urology Unit, University of Malaya, Cancer Research Initiatives Foundation (CARIF) and the Malaysian Men's Health Initiative (MMHI). MCCS: MCCS cohort recruitment was funded by VicHealth and Cancer Council Victoria. The MCCS was further supported by Australian NHMRC grants 209057, 251553, and 504711, and by infrastructure provided by Cancer Council Victoria. Cases and their vital status were ascertained through the Victorian Cancer Registry (VCR) and the Australian Institute of Health and Welfare (AIHW), including the National Death Index and the Australian Cancer Database. MCC-Spain: The study was partially funded by the Accion Transversal del Cancer, approved on the Spanish Ministry Council on the 11th October 2007, by the Instituto de Salud Carlos III-FEDER (PI08/1770, PI09/00773-Cantabria, PI11/01889-FEDER, PI12/00265, PI12/01270, and PI12/00715), by the Fundación Marqués de Valdecilla (API 10/09), by the Spanish Association Against Cancer (AECC) Scientific Foundation and by the Catalan Government DURSI grant 2009SGR1489. Samples: Biological samples were stored at the Parc de Salut MAR Biobank (MARBiobanc; Barcelona) which is supported by Instituto de Salud Carlos III FEDER (RD09/0076/00036). Also sample collection was supported by the Xarxa de Bancs de Tumors de Catalunya sponsored by Pla Director d'Oncologia de Catalunya (XBTC). MCC-Spain acknowledges the contribution from Esther Gracia-Lavedan in preparing the data. We thank all the subjects who participated in the study and all MCC-Spain collaborators. MD Anderson: Prostate Cancer Case-Control Studies at MD Anderson (MDA) supported by grants CA68578, ES007784, DAMD W81XWH-07-1-0645, and CA140388. 
MDACC_AS: None reported MEC: Funding provided by NIH grant U19CA148537 and grant U01CA164973. MIAMI (WFPCS): ACS MOFFITT: The Moffitt group was supported by the US National Cancer Institute (R01CA128813, PI: J.Y. Park). NMHS: Funding for the Nashville Men's Health Study (NMHS) was provided by the National Institutes of Health Grant numbers: RO1CA121060. PCaP only data: The North Carolina - Louisiana Prostate Cancer Project (PCaP) is carried out as a collaborative study supported by the Department of Defense contract DAMD 17-03-2-0052. For HCaP-NC follow-up data: The Health Care Access and Prostate Cancer Treatment in North Carolina (HCaP-NC) study is carried out as a collaborative study supported by the American Cancer Society award RSGT-08-008-01-CPHPS. For studies using both PCaP and HCaP-NC follow-up data please use: The North Carolina - Louisiana Prostate Cancer Project (PCaP) and the Health Care Access and Prostate Cancer Treatment in North Carolina (HCaP-NC) study are carried out as collaborative studies supported by the Department of Defense contract DAMD 17-03-2-0052 and the American Cancer Society award RSGT-08-008-01-CPHPS, respectively. For any PCaP data, please include: The authors thank the staff, advisory committees and research subjects participating in the PCaP study for their important contributions. For studies using PCaP DNA/genotyping data, please include: We would like to acknowledge the UNC BioSpecimen Facility and LSUHSC Pathology Lab for our DNA extractions, blood processing, storage and sample disbursement (https://genome.unc.edu/bsp). For studies using PCaP tissue, please include: We would like to acknowledge the RPCI Department of Urology Tissue Microarray and Immunoanalysis Core for our tissue processing, storage and sample disbursement. 
For studies using HCaP-NC follow-up data, please use: The Health Care Access and Prostate Cancer Treatment in North Carolina (HCaP-NC) study is carried out as a collaborative study supported by the American Cancer Society award RSGT-08-008-01-CPHPS. The authors thank the staff, advisory committees and research subjects participating in the HCaP-NC study for their important contributions. For studies that use both PCaP and HCaP-NC, please use: The authors thank the staff, advisory committees and research subjects participating in the PCaP and HCaP-NC studies for their important contributions. PCMUS: The PCMUS study was supported by the Bulgarian National Science Fund, Ministry of Education and Science (contract DOO-119/2009; DUNK01/2-2009; DFNI-B01/28/2012) with additional support from the Science Fund of Medical University - Sofia (contract 51/2009; 8I/2009; 28/2010). PHS: The Physicians' Health Study was supported by grants CA34944, CA40360, CA097193, HL26490, and HL34595. PHS members are grateful to the participants and staff of the Physicians' Health Study and Health Professionals Follow-Up Study for their valuable contributions, as well as the following state cancer registries for their help: AL, AZ, AR, CA, CO, CT, DE, FL, GA, ID, IL, IN, IA, KY, LA, ME, MD, MA, MI, NE, NH, NJ, NY, NC, ND, OH, OK, OR, PA, RI, SC, TN, TX, VA, WA, and WY. PLCO: This PLCO study was supported by the Intramural Research Program of the Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH. PLCO thanks Drs. Christine Berg and Philip Prorok, Division of Cancer Prevention at the National Cancer Institute, the screening center investigators and staff of the PLCO Cancer Screening Trial for their contributions to the PLCO Cancer Screening Trial. We thank Mr. Thomas Riley, Mr. Craig Williams, Mr. Matthew Moore, and Ms. Shannon Merkle at Information Management Services, Inc., for their management of the data and Ms. Barbara O'Brien and staff at Westat, Inc. 
for their contributions to the PLCO Cancer Screening Trial. We also thank the PLCO study participants for their contributions to making this study possible. Poland: None reported PROCAP: PROCAP was supported by the Swedish Cancer Foundation (08-708, 09-0677). PROCAP thanks and acknowledges all of the participants in the PROCAP study. We thank Carin Cavalli-Björkman and Ami Rönnberg Karlsson for their dedicated work in the collection of data. Michael Broms is acknowledged for his skilful work with the databases. KI Biobank is acknowledged for handling the samples and for DNA extraction. We acknowledge The NPCR steering group: Pär Stattin (chair), Anders Widmark, Stefan Karlsson, Magnus Törnblom, Jan Adolfsson, Anna Bill-Axelson, Ove Andrén, David Robinson, Bill Pettersson, Jonas Hugosson, Jan-Erik Damber, Ola Bratt, Göran Ahlgren, Lars Egevad, and Roy Ehrnström. PROGReSS: The PROGReSS study is funded by grants from the Spanish Ministry of Health (INT15/00070; INT16/00154; FIS PI10/00164, FIS PI13/02030; FIS PI16/00046); the Spanish Ministry of Economy and Competitiveness (PTA2014-10228-I), and Fondo Europeo de Desarrollo Regional (FEDER 2007-2013). ProMPT: Funded by CRUK, NIHR, MRC, Cambridge Biomedical Research Centre ProtecT: Funded by NIHR. ProtecT and ProMPT would like to acknowledge the support of The University of Cambridge, Cancer Research UK. Cancer Research UK grants (C8197/A10123) and (C8197/A10865) supported the genotyping team. We would also like to acknowledge the support of the National Institute for Health Research which funds the Cambridge Bio-medical Research Centre, Cambridge, UK. We would also like to acknowledge the support of the National Cancer Research Prostate Cancer: Mechanisms of Progression and Treatment (PROMPT) collaborative (grant code G0500966/75466) which has funded tissue and urine collections in Cambridge. 
We are grateful to staff at the Wellcome Trust Clinical Research Facility, Addenbrooke's Clinical Research Centre, Cambridge, UK for their help in conducting the ProtecT study. We also acknowledge the support of the NIHR Cambridge Biomedical Research Centre, the DOH HTA (ProtecT grant), and the NCRI/MRC (ProMPT grant) for help with the bio-repository. The UK Department of Health funded the ProtecT study through the NIHR Health Technology Assessment Programme (projects 96/20/06, 96/20/99). The ProtecT trial and its linked ProMPT and CAP (Comparison Arm for ProtecT) studies are supported by Department of Health, England; Cancer Research UK grant number C522/A8649, Medical Research Council of England grant number G0500966, ID 75466, and The NCRI, UK. The epidemiological data for ProtecT were generated through funding from the Southwest National Health Service Research and Development. DNA extraction in ProtecT was supported by USA Dept of Defense award W81XWH-04-1-0280, Yorkshire Cancer Research and Cancer Research UK. The authors would like to acknowledge the contribution of all members of the ProtecT study research group. The views and opinions expressed therein are those of the authors and do not necessarily reflect those of the Department of Health of England. The bio-repository from ProtecT is supported by the NCRI (ProMPT) Prostate Cancer Collaborative and the Cambridge BMRC grant from NIHR. We thank the National Institute for Health Research, Hutchison Whampoa Limited, the Human Research Tissue Bank (Addenbrooke's Hospital), and Cancer Research UK. 
PROtEuS: PROtEuS was supported financially through grants from the Canadian Cancer Society (13149, 19500, 19864, 19865) and the Cancer Research Society, in partnership with the Ministère de l'enseignement supérieur, de la recherche, de la science et de la technologie du Québec, and the Fonds de la recherche du Québec - Santé. PROtEuS would like to thank its collaborators and research personnel, and the urologists involved in subject recruitment. We also wish to acknowledge the special contribution made by Ann Hsing and Anand Chokkalingam to the conception of the genetic component of PROtEuS. QLD: The QLD research is supported by The National Health and Medical Research Council (NHMRC) Australia Project Grants (390130, 1009458) and NHMRC Career Development Fellowship and Cancer Australia PdCCRS funding to J Batra. The QLD team would like to acknowledge and sincerely thank the urologists, pathologists, data managers and patient participants who have generously and altruistically supported the QLD cohort. RAPPER: RAPPER is funded by Cancer Research UK (C1094/A11728; C1094/A18504) and Experimental Cancer Medicine Centre funding (C1467/A7286). The RAPPER group thank Rebecca Elliott for project management. SABOR: The SABOR research is supported by NIH/NCI Early Detection Research Network, grant U01 CA0866402-12. Also supported by the Cancer Center Support Grant to the Cancer Therapy and Research Center from the National Cancer Institute (US) P30 CA054174. SCCS: SCCS is funded by NIH grant R01 CA092447, and SCCS sample preparation was conducted at the Epidemiology Biospecimen Core Lab that is supported in part by the Vanderbilt-Ingram Cancer Center (P30 CA68485). 
Data on SCCS cancer cases used in this publication were provided by the Alabama Statewide Cancer Registry; Kentucky Cancer Registry, Lexington, KY; Tennessee Department of Health, Office of Cancer Surveillance; Florida Cancer Data System; North Carolina Central Cancer Registry, North Carolina Division of Public Health; Georgia Comprehensive Cancer Registry; Louisiana Tumor Registry; Mississippi Cancer Registry; South Carolina Central Cancer Registry; Virginia Department of Health, Virginia Cancer Registry; Arkansas Department of Health, Cancer Registry, 4815 W. Markham, Little Rock, AR 72205. The Arkansas Central Cancer Registry is fully funded by a grant from National Program of Cancer Registries, Centers for Disease Control and Prevention (CDC). Data on SCCS cancer cases from Mississippi were collected by the Mississippi Cancer Registry which participates in the National Program of Cancer Registries (NPCR) of the Centers for Disease Control and Prevention (CDC). The contents of this publication are solely the responsibility of the authors and do not necessarily represent the official views of the CDC or the Mississippi Cancer Registry. SCPCS: SCPCS is funded by CDC grant S1135-19/19, and SCPCS sample preparation was conducted at the Epidemiology Biospecimen Core Lab that is supported in part by the Vanderbilt-Ingram Cancer Center (P30 CA68485). SEARCH: SEARCH is funded by a program grant from Cancer Research UK (C490/A10124) and supported by the UK National Institute for Health Research Biomedical Research Centre at the University of Cambridge. SNP_Prostate_Ghent: The study was supported by the National Cancer Plan, financed by the Federal Office of Health and Social Affairs, Belgium. 
SPAG: Wessex Medical Research, Hope for Guernsey, MUG, HSSD, MSG, Roger Allsopp. STHM2: STHM2 was supported by grants from The Strategic Research Programme on Cancer (StratCan), Karolinska Institutet; the Linné Centre for Breast and Prostate Cancer (CRISP, number 70867901), Karolinska Institutet; The Swedish Research Council (number K2010-70X-20430-04-3) and The Swedish Cancer Society (numbers 11-0287 and 11-0624); Stiftelsen Johanna Hagstrand och Sigfrid Linnérs minne; Swedish Council for Working Life and Social Research (FAS), number 2012-0073. STHM2 acknowledges the Karolinska University Laboratory, Aleris Medilab, Unilabs and the Regional Prostate Cancer Registry for performing analyses and helping to retrieve data; Carin Cavalli-Björkman and Britt-Marie Hune for their enthusiastic work as research nurses; and Astrid Björklund for skilful data management. We wish to thank the BBMRI.se biobank facility at Karolinska Institutet for biobank services. PCPT & SELECT are funded by Public Health Service grants U10CA37429 and 5UM1CA182883 from the National Cancer Institute. SWOG and SELECT thank the site investigators and staff and, most importantly, the participants who donated their time to this trial. TAMPERE: The Tampere (Finland) study was supported by the Academy of Finland (251074), The Finnish Cancer Organisations, Sigrid Juselius Foundation, and the Competitive Research Funding of the Tampere University Hospital (X51003). The PSA screening samples were collected by the Finnish part of ERSPC (European Study of Screening for Prostate Cancer). TAMPERE would like to thank Riina Liikanen, Liisa Maeaettaenen and Kirsi Talala for their work on samples and databases. 
UGANDA: None reported UKGPCS: UKGPCS would also like to thank the following for funding support: The Institute of Cancer Research and The Everyman Campaign, The Prostate Cancer Research Foundation, Prostate Research Campaign UK (now Prostate Action), The Orchid Cancer Appeal, The National Cancer Research Network UK, The National Cancer Research Institute (NCRI) UK. We are grateful for support of NIHR funding to the NIHR Biomedical Research Centre at The Institute of Cancer Research and The Royal Marsden NHS Foundation Trust. UKGPCS would also like to acknowledge the NCRN nurses, data managers, and consultants for their work in the UKGPCS study. UKGPCS would like to thank all urologists and other persons involved in the planning, coordination, and data collection of the study. ULM: The Ulm group received funds from the German Cancer Aid (Deutsche Krebshilfe). WUGS/WUPCS: WUGS would like to thank the following for funding support: The Anthony DeNovi Fund, the Donald C. McGraw Foundation, and the St. Louis Men's Group Against Cancer.