EGA QuickView
Secure and Remote Access to EGA Files

What is EGA QuickView?

EGA QuickView is a FUSE file system that allows secure and remote access to EGA files. It is a hybrid of sshfs and crypt4ghfs: it uses ssh to connect to the EGA distribution servers, downloads data in Crypt4GH format, and transparently decrypts those files. It is compatible with Linux and macOS 12.

Why use EGA QuickView?

EGA QuickView lets you browse EGA files remotely and securely without downloading them in full, which saves bandwidth and time. It is useful whether you are a medical professional requiring access to sensitive patient data or a research scientist working on a project that involves EGA files. Access goes over ssh, and files are transferred in encrypted Crypt4GH format.

Get started with EGA QuickView

To get started with EGA QuickView, visit the EGA QuickView GitHub repository for more information and installation instructions.
The overall purpose of this study is to investigate the host genetic factors in response to influenza virus infection, with the focus on influenza vaccination in the first substudy "Adult Influenza Vaccine Genetics" and with the focus on influenza natural infection and other acute respiratory infections (ARIs) in the second substudy "Acute Viral Respiratory Infection Genetics". In the first substudy, healthy adults were enrolled in 2008 (male cohort) and 2010 (female cohort) and immunized with seasonal influenza vaccine. In the second substudy, healthy adults were invited to enroll to be followed for acute respiratory illness through two consecutive influenza seasons, 2009-2010 and 2010-2011. Peripheral blood genomic DNA samples were collected from all the subjects, and time-series RNA and serum samples were obtained pre- and post-immunization/infection. Genotyping was carried out on peripheral blood genomic DNA samples using Illumina HumanOmniExpress-12 v1 arrays. Peripheral blood RNA samples obtained at each visit were analyzed using Illumina Human HT-12 (for all the samples) and HiSeq 2000 (for 130 samples in the "Acute Viral Respiratory Infection Genetics" study). Serum specimens were tested using hemagglutination-inhibition (HAI) antibody assay for Influenza H1N1, H3N2, and Influenza B strains. A detailed description of each substudy is provided under their own pages below and via the grouping tool in the right-hand box:

- phs000635 Adult Influenza Vaccine Genetics
- phs001031 Acute Viral Respiratory Infection Genetics
The AIDS Linked to the Intravenous Experience (ALIVE) Study is a long-standing community-based research effort that includes past and current injection drug users (IDUs). ALIVE has been ongoing since 1988 and is one of the longest-running community-based cohorts of IDUs in existence. The primary objectives when the study started were to characterize the incidence and natural history of HIV among IDUs. At each study visit, participants complete a series of questionnaires that elicit information about substance use (including drugs used and route of administration) and undergo a blood draw. A subsample of 1200 subjects who report heroin injection and have existing peripheral blood samples (isolated buffy coat) was selected for genotyping for the NIDA Smokescreen effort. In addition to injected heroin, ALIVE subjects are assessed on lifetime and previous six months' use of marijuana (76.3%), crack (55.5%), snorted cocaine (48.3%), injected cocaine (89.9%), injected speedball (87.0%), and smoking heroin (9.0%). The sample is 33.4% female and ~85% African-American, and was assessed through a mean maximum age of 49.9 years. Importantly, subjects are not assessed to meet DSM criteria, but all subjects report having injected drugs. We provide demographic data (age at assessment, sex, self-reported race) and drug use data (lifetime "use" ever endorsed) for each drug class/route described above for each subject.
Multi-center, prospective observational cohort study of individuals with congenital heart defects (CHD). Phenotypic data and source DNA derived from 10,000 probands, parents, and families of interest are being collected to investigate relationships between genetic factors and phenotypic and clinical outcomes in patients with CHD. Phenotype data will be stored at dbGaP, while molecular and sequence data will be stored at BioData Catalyst.

The PCGC Cohort is utilized in the following dbGaP substudies. Please click on the substudies below or in the "Substudies" section of this top-level study page, phs001194 PCGC Cohort.

- phs000571 PCGC: whole exome sequences, whole genome sequences, targeted sequences, MIP sequences, and SNP array data
- phs001843 PCGC-CMG Collaboration: whole genome sequences

The Gabriella Miller Kids First Pediatric Research Program (Kids First) subset of the PCGC project (phs001194) is now accessible through a separate dbGaP study accession: phs001138. To access this dataset, please submit a Data Access Request (DAR) for phs001138. Approval of this DAR will be expedited for approved users of phs001194. To learn about other Kids First datasets, visit https://kidsfirstdrc.org/.

NHLBI's TOPMed program has provided additional Whole Genome Sequencing for PCGC participants. That data is accessible through a separate dbGaP study accession: phs001735. Access to this data set should be requested through a Data Access Request (DAR) for phs001735.
As next-generation sequencing (NGS) continues to increase in speed and throughput, routine clinical and industrial application draws steadily closer. These “production” uses of NGS will require enhanced quality-monitoring and quality-control to optimize output and reduce costs. We therefore developed a framework called SeqControl for predicting sequencing quality and coverage using a set of 15 metrics describing overall coverage, coverage distribution, base-wise coverage and base-wise quality. Using whole-genome sequences of 27 prostate cancers and 26 normal references we derive multivariate models that predict sequencing quality and depth. SeqControl robustly predicts how much sequencing is required to reach a given coverage depth (AUC = 0.993), accurately classifies clinically relevant formalin-fixed paraffin-embedded samples and makes predictions from as little as 1/8 of a lane of sequencing data (AUC = 0.967). These techniques can be immediately incorporated into existing NGS pipelines to monitor data quality in real-time. SeqControl represents a first step towards statistical process-control for NGS.
We evaluated the feasibility and safety of blood testing for cancer in an interventional study of 10,006 women. Positive tests for DNA or protein were independently confirmed and diagnostic PET-CT imaging was used to localize disease. During the study, 26 cancers were first detected by blood testing, 24 more by standard-of-care screening, and 46 by neither approach. Surgical excision of the primary tumors was performed on 12 cancers first detected by blood testing. 1.0% of participants underwent PET-CT imaging based on false positive blood tests, and 0.22% underwent a futile invasive diagnostic procedure. These data demonstrate that multi-cancer blood testing, when followed by PET-CT localization, can be safely incorporated into routine clinical care, in some cases leading to surgery with intent to cure. Further studies will be required to determine the clinical validity and utility of multi-cancer blood testing.
Uploading files

Users who hold an ega-box-XXX account can upload files using either INBOX or FTP. Users who have a Submitter role associated with their email will only be able to upload files using INBOX.

Before uploading your files, please make sure that the names of any files to be uploaded to EGA do not contain special characters such as # ? ( ) [ ] / \ = + < > : ; " ' , * ^ | &. These characters can cause issues with the archiving process, leading to problems for end users.

The EGA is a shared, public service with limited storage. To manage the available resources, we enforce a limit of 10TB per submission account at any one time. If you exceed this limit, a “permission denied” message will be displayed. This will prevent you from uploading more files, but not from connecting to your inbox.

For submissions larger than 10TB, please perform uploads in 10TB batches: upload the first batch, register all the metadata, and then finalise the submission. Then upload the next batch of files and repeat the same metadata registration and finalisation process until you have completed the file upload. Further information can be found in the Submitter Portal (SP) documentation.

The INBOX is only compatible with files encrypted using the Crypt4gh tool.

Before uploading

If you are not a registered EGA user, you will first need an EGA user account. Please note that it may take a few days for your account to be activated, as it needs to be vouched for by the EGA Helpdesk. Once your account is validated, you will be able to request a submitter role.

[Optional] Meanwhile, you can create and add your public key to your EGA account profile. This option is not available for old submission accounts (e.g., ega-box-NNN).

As soon as you have been granted a submitter role, you will be able to connect with your username and password to the EGA inbox using the SFTP protocol. If you have also registered a public key in your profile, you can connect using this key as well.
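The file-naming rule and the 10TB batching described above can be checked locally before anything is uploaded. The following is a minimal sketch, not an EGA tool: the function names are hypothetical, the character set is copied from the list above, and reading "10TB" as 10 * 10**12 bytes is an assumption about how the limit is counted.

```python
from pathlib import PurePosixPath

# Characters listed above as unsafe in file names.
FORBIDDEN_CHARS = set('#?()[]/\\=+<>:;"\',*^|&')

# Assumption: "10TB" means 10 * 10**12 bytes; the service may count differently.
ACCOUNT_LIMIT_BYTES = 10 * 10**12


def forbidden_in_name(path):
    """Return the sorted forbidden characters found in the file name.

    Only the last path component is checked, so directory separators
    in the local path are not reported.
    """
    name = PurePosixPath(path).name
    return sorted(set(name) & FORBIDDEN_CHARS)


def plan_batches(sizes, limit=ACCOUNT_LIMIT_BYTES):
    """Group files, in the given order, into batches that fit within `limit`.

    `sizes` maps file name -> size in bytes. A single file larger than
    the limit cannot be placed in any batch and raises ValueError.
    """
    batches, current, used = [], [], 0
    for name, size in sizes.items():
        if size > limit:
            raise ValueError(f"{name} alone exceeds the limit")
        if current and used + size > limit:
            # Close the current batch and start a new one.
            batches.append(current)
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        batches.append(current)
    return batches
```

Each batch returned by `plan_batches` would then be uploaded, registered, and finalised before the next one is started, as described above.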
To upload files to your account, you can use the graphical user interface (GUI) or the command line.

Graphical User Interface (GUI)

We recommend using FileZilla, a free, open-source FTP client. However, you can use any other GUI that allows connecting over the SFTP protocol. For FileZilla as your GUI, follow these steps to upload files:

Create a new connection in Site Manager (File > Site Manager) and select the following options (Figure 1):

- Protocol: SFTP - SSH File Transfer Protocol
- Host: __EGA_INBOX_DOMAIN__
- Logon Type: Key file
- User: your EGA username
- Key file: Path/to/your/private_key

Figure 1: Process of establishing a new connection to __EGA_INBOX_DOMAIN__ using a key file as the logon method in FileZilla. The figure shows FileZilla version 3.52.2 running on macOS 11.2.3.

Click Connect, and you will log in remotely to your home directory. You can think of this folder as a storage "in the EGA cloud" in which you will add your files for the EGA. The uploading area has three folders:

- to-encrypt: Files uploaded in this folder will be encrypted automatically on the fly.
- encrypted: Files uploaded in this folder must already be encrypted with Crypt4gh. Upload your files here if your connection is unstable or you have problems completing the upload in to-encrypt.
- etc: This folder contains two files that allow the server to show you your username and group instead of some internal numbers. Please do not upload files here; otherwise, you will obtain a permission denied error.

Find the files you want to upload by browsing your local storage (left side of your screen in FileZilla). Select all the files you want to upload, then right-click on them and select Upload (Figure 2).

Figure 2: Step-by-step process of manually uploading files to __EGA_INBOX_DOMAIN__ using FileZilla, with FileZilla version 3.52.2 running on macOS 11.2.3.
The figure demonstrates how users can transfer data from their local storage to the "EGA cloud" by following the depicted steps.

Please note that regardless of which folder you upload your files in, both folders (to-encrypt, encrypted) will point to the same path (/) (Figure 3). Therefore, you will see your files in both folders.

Figure 3: Both folders, to-encrypt and encrypted, point to the same path (/).

If your connection is unstable, please encrypt your files first using Crypt4gh. Then upload them to the ‘encrypted’ folder.

The example above shows how to connect to __EGA_INBOX_DOMAIN__ using the private key. However, if you prefer to log in using your credentials, you can do so. Please go to the Frequently Asked Questions (FAQs) for more information.

SFTP command line

To upload files securely to your private area of the EGA, you can use SFTP (Secure File Transfer Protocol) with your favorite SFTP client. Here's what you need to know to get started:

- Connect to the target host __EGA_INBOX_DOMAIN__. This is the new hostname for the EGA SFTP service.
- Log in with your EGA username and key files (or password).
- Upload files to your private EGA inbox to ensure that only you can access the files.

By following these steps, you can securely upload your files to the EGA for safe storage and sharing.

Using the SFTP command line client in Linux/Unix

- Open a terminal and type sftp username@hostname
- Enter your EGA password
- To see a list of available SFTP commands, type help
  sftp> put – Upload file
  sftp> get – Download file
  sftp> cd path – Change remote directory to ‘path’
  sftp> pwd – Display remote working directory
  sftp> lcd path – Change the local directory to ‘path’
  sftp> lpwd – Display local working directory
  sftp> ls – Display the contents of the remote working directory
  sftp> lls – Display the contents of the local working directory
- Type the "put" command to upload files. For example: put *.bam
- Use the bye command to close the connection (SFTP session).
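The interactive session above can also be scripted. The sketch below only generates the text of a batch file made of the same cd, put and bye commands; the helper name and the example file names are hypothetical. The OpenSSH client can read such a file with its -b option, which is intended for non-interactive logins, so this assumes you connect with a key rather than a typed password.

```python
def build_sftp_batch(local_files, remote_dir="encrypted"):
    """Return the text of an sftp batch file uploading `local_files` to `remote_dir`."""
    lines = [f"cd {remote_dir}"]
    for path in local_files:
        # Double quotes keep a path containing spaces as a single argument.
        lines.append(f'put "{path}"')
    lines.append("bye")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical file names, used only for illustration.
    print(build_sftp_batch(["sample1.bam.c4gh", "sample2.bam.c4gh"]), end="")
    # To use it, save the text to a file such as upload.batch and run:
    #   sftp -b upload.batch username@hostname
```

Generating the batch file separately from running it makes it easy to review exactly which files will be sent before the connection is opened.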
After uploading

- Once you have uploaded files to the inbox, please bear in mind that the checksum needs to be calculated, which can take up to two days. You will only be able to link your files to a run/analysis once the encrypted checksum has been calculated.
- When linking your files to the 'Run' or 'Analysis', ensure that the file name matches the file path '/name' in the INBOX folder.
- Please delete the files from your SFTP INBOX after all the runs/analyses have been registered and files are ingested (SP > Files > Files ingested). This will clear your inbox space and allow you to upload more files. This will also prevent the files from reappearing in your Submitter Portal inbox.

Frequently Asked Questions

Specific to the inbox

What username should I use to log in to my inbox?

The authentication process for logging in to the EGA website, as well as accessing your inbox and outbox, requires the use of your username. If you have forgotten your registered username, please contact our Helpdesk team for assistance.

How are checksums calculated in your inbox?

If you encrypt the file beforehand and upload it to the "encrypted" folder, the unencrypted checksum will not be calculated until the file is ingested (i.e., until it is used in a run/analysis). If the file is uploaded to the "to-encrypt" folder, then both checksums are calculated. Please bear in mind that after files have been uploaded to the inbox, the checksum must be calculated, which can take from a few hours to two days.

Specific to using keys to authenticate

Can I access one EGA account from different devices?

Yes, you can access your account from different devices by linking several public keys to your EGA account. Each device can generate a unique public-private key pair, and the corresponding public keys can be linked to the same account. This way, you can use different public keys on different devices and still have access to the same account and data.
I have several keys and I don't remember which one is which

When generating SSH keys, it's a good practice to add a comment using the -C flag. This will allow you to add a descriptive tag to your key, making it easier to identify later on. Here's an example command that generates an SSH key with a comment:

ssh-keygen -t ed25519 -C work-pass

In this example, we're generating an ed25519 SSH key with the comment work-pass. Once you have multiple keys with different comments, you can use the comments to easily identify each key. To view the comments for your existing SSH keys, you can use the following command:

ssh-keygen -l -f /path/to/key

This will display the key fingerprint and the associated comment. By checking the comments, you should be able to identify which key is which.

What if I can't find my SSH keys for uploading files with a key file, and how can I use new keys?

If you can't find your SSH keys, don't worry: you can make new ones. To do this, open your terminal or command prompt and run ssh-keygen, as in the example in the previous answer. You can pick a name for the key file and choose a passphrase to keep it safe. After making the key, add it to the account or server where you want to upload files using the key file. This usually involves copying and pasting the key's "public" part (e.g. file.pub) to the right place. If you lose track of the key again, just make a new one and add it again. Keep in mind that SSH keys belong to you and your computer, so if you switch computers or accounts, you'll need to make new keys.

I don't want to type the passphrase every time I use the key. What can I do?

You can use an ssh-agent to avoid typing the passphrase every time you use the key. An ssh-agent is a program that stores your private keys in memory and provides them to ssh when needed.
You can add your key to the ssh-agent using the command ssh-add followed by the path to your key file. Here's an example of the steps to follow:

- Open a terminal window.
- Start the ssh-agent by typing the command eval $(ssh-agent).
- Add your key to the ssh-agent by typing the command ssh-add [key filepath]. For instance, if your key file is located in the home directory with the name mykey, the command will look like this: ssh-add ~/mykey

After adding your key to the ssh-agent, you should be able to use ssh without having to enter your passphrase every time.

Can I use my password for authentication (without my private key)?

If you prefer to use your username and password for authentication instead of your private key, you can still do so. When using a Graphical User Interface (GUI) such as FileZilla, you can select Ask for password as your Logon Type (Figure 3). This option will prompt you to enter your password when you click Connect, instead of using your private key.

Figure 3: Process of establishing a new connection to __EGA_INBOX_DOMAIN__ using your password as the logon method in FileZilla. The figure shows FileZilla version 3.52.2 running on macOS 11.2.3.

It's worth noting that using a password for authentication can be less secure than using an SSH key, as passwords can be more easily compromised through various means. However, if you choose to use your password for authentication, selecting "Ask for password" as your Logon Type is a good way to do so securely via a GUI.

Why is it better to use my key and not my password?

Using SSH keys for authentication is generally considered to be more secure and convenient than using passwords.
SSH keys are more difficult to crack than passwords, and they can be restricted to specific users and machines, giving you more control over access. Once you set up your SSH keys, you can use them to authenticate quickly and easily, without having to enter a password every time. This makes automation of tasks, such as uploading encrypted files, much simpler. Additionally, SSH keys provide better logging, allowing you to keep track of who is accessing your systems and when. All in all, using SSH keys is a good practice for improving security and convenience in your authentication process.
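For the earlier question about telling several keys apart, the comment can also be read directly from the public key files. An OpenSSH public key file is normally a single line holding the key type, the base64 key data, and an optional comment. The sketch below relies on that layout; the function name is hypothetical, and looking in ~/.ssh for *.pub files is an assumption about where your keys are stored.

```python
from pathlib import Path


def public_key_comment(pub_line):
    """Return the comment of a one-line OpenSSH public key, or "" if absent.

    The expected layout is "<key-type> <base64-data> [comment]".
    """
    parts = pub_line.strip().split(None, 2)
    if len(parts) < 2:
        raise ValueError("not an OpenSSH public key line")
    return parts[2] if len(parts) == 3 else ""


if __name__ == "__main__":
    # Assumption: public keys are stored as ~/.ssh/*.pub.
    for pub in sorted(Path.home().joinpath(".ssh").glob("*.pub")):
        print(pub.name, "->", public_key_comment(pub.read_text()))
```

This only reads public key files, so it never touches a private key or asks for a passphrase.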
Buccal samples and paired esophageal epithelium were obtained using the three sizes of swabs and endoscopic biopsy, respectively. Forty samples from 10 subjects were analyzed via duplex sequencing. This dataset contains bam files that were mapped to the GRCh37 reference genome.
Whole Genome Sequencing
Targeted DNA sequencing of high-grade serous ovarian cancer (HGSC) tumour and normal samples from 26 patients. Following target hybrid capture of 63 genes involved in DNA repair and response to treatment with an Agilent SureSelect XT panel, sequencing libraries were generated using the SureSelect XT Low Input Target Enrichment System (Agilent) as per the manufacturer's protocol. Libraries were sequenced on an Illumina NextSeq 500 at the Peter MacCallum Cancer Centre (Melbourne, Australia).