Submission account terms

Each EGA submission account has a dedicated submission box. To ensure continued service to all submitters, EGA submission boxes should not exceed 8Tb in size and cannot exceed 12Tb. Boxes that are near to or exceeding these limits will be monitored weekly by the helpdesk team, and the submitter will be contacted with guidance on completing their metadata registration. You can register the metadata associated with the files in your submission account via the EGA Submitter Portal, Webin or the EGA REST API.

EGA also requests that metadata is submitted within 60 days of uploading data to a submission box. When we detect files that are over 60 days old in a submission box, we will notify the submitter and request that the metadata be submitted or the files removed from the submission box. Note that data is not archived at the EGA until the metadata describing the files has been associated and the archival process has completed: the archival process is only fully triggered once the associated metadata has been submitted. EGA removes files from the submission box once they have been successfully archived, so submitters do not need to remove successfully archived files themselves. Files over 90 days old will be deleted from the submission box unless the EGA Helpdesk has been contacted and has confirmed an exemption with the submitter.
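The 60-day and 90-day rules above can be expressed as a small check. The following is a minimal sketch and not EGA tooling: the function name, the status labels and the `exempt` flag are hypothetical, and it assumes that "over N days old" means strictly more than N whole days since upload.

```python
from datetime import datetime

NOTIFY_AFTER_DAYS = 60   # from the terms above: metadata reminder threshold
DELETE_AFTER_DAYS = 90   # from the terms above: deletion threshold


def submission_box_status(uploaded: datetime, now: datetime, exempt: bool = False) -> str:
    """Classify an unarchived file by how long it has been in the submission box."""
    age_days = (now - uploaded).days
    # Deletion applies only when no exemption has been agreed with the Helpdesk.
    if age_days > DELETE_AFTER_DAYS and not exempt:
        return "eligible_for_deletion"
    if age_days > NOTIFY_AFTER_DAYS:
        return "metadata_reminder"
    return "ok"


if __name__ == "__main__":
    now = datetime(2024, 6, 1)
    for uploaded in (datetime(2024, 5, 1), datetime(2024, 3, 15), datetime(2024, 1, 1)):
        print(uploaded.date(), submission_box_status(uploaded, now))
```

A submitter could run a check like this against local upload records to see which files still need metadata before the deadlines pass.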
The GoDARTS T2D-GENES Exome sequencing study is a subset of a larger Type 2 Diabetes Exome Sequencing project. This effort is a collaboration of six consortia with various funding mechanisms that have joined together to investigate genetic variants for type 2 diabetes (T2D) using the largest T2D case/control sample set compiled to date. It includes samples from T2D-GENES, GoT2D, ESP, SIGMA T2D, LuCAMP, and ProDIGY. The data generated from the Genetics of Diabetes and Audit Research Tayside Study (GoDARTS) cohort were part of the T2D-GENES (Type 2 Diabetes Genetic Exploration by Next-generation sequencing in multi-Ethnic Samples) consortium, an NIDDK-funded international research consortium that seeks to identify genetic variants for T2D through multiethnic sequencing studies. Sequencing for the GoDARTS study was performed at the Broad Institute using Illumina Rapid Capture on Illumina HiSeq machines.
Tandem duplication of chromosomal segments is common in ovarian and breast cancer genomes
The purpose of this study is to investigate ancestry admixture among Chileans with and without gallbladder cancer.
This project is correlating the molecular profiling of renal tumours with multiparametric and 13C-MRI including by 13C-MRSI.
File preparation

Due to the processes used at the EGA for file archival, non-alphanumeric characters in a filename will cause issues in archival. By convention, whitespace in filenames is to be avoided and should be replaced with the underscore character (_). Before encrypting your files, please make sure that any files that will be uploaded to EGA do not use special characters such as # ? ( ) [ ] / \ = + < > : ; " ' , * ^ | &

Files encrypted with EGACryptor must be uploaded via FTP.

EGACryptor

The EGACryptor v.2.0.0 is a Java-based application which enables submitters to produce EGA-compliant encrypted files, along with files containing the MD5 checksums of the encrypted and unencrypted versions of each file to be submitted. The application generates an output folder that by default mirrors the directory structure containing the original files. This output folder can subsequently be uploaded to the EGA FTP staging area via an FTP or Aspera client.

Download EGACryptor

The required jar files can be obtained by downloading the EGACryptor jar file. After the file has been downloaded, the zipped archive must be extracted. The EGACryptor has been built to work with Java Runtime Environments from version 6 onwards and with the OpenJDK Environment. Please refer to the relevant resources for installation guidance. Installing the latest version of the OpenJDK will include the JCE files. If your installed Java JRE is older than 1.8.0_151, you will need to install the JCE Policy Files manually.
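The 1.8.0_151 threshold can be checked programmatically once the version string is known. The sketch below is illustrative only: the function names are hypothetical, and it assumes the version string follows either the legacy `1.<major>.0_<update>` scheme or the newer `<major>.<minor>.<patch>` scheme.

```python
import re

# Threshold taken from the text: JREs older than 1.8.0_151 need manual JCE installation.
MANUAL_JCE_BELOW = (8, 151)


def extract_version(java_version_output: str) -> str:
    """Pull the quoted version string out of the text printed by `java -version`."""
    match = re.search(r'version "([^"]+)"', java_version_output)
    if match is None:
        raise ValueError("no version string found")
    return match.group(1)


def parse_version(version: str) -> tuple:
    """Return (major, update) for strings such as '1.8.0_151' or '11.0.2'."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.\d+)*(?:_(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version: {version!r}")
    first, second, update = match.groups()
    # Legacy scheme: the real major version is the second component (1.8 -> 8).
    major = int(second or 0) if first == "1" else int(first)
    return major, int(update or 0)


def needs_manual_jce(version: str) -> bool:
    """True when the JRE is older than the 1.8.0_151 threshold stated above."""
    return parse_version(version) < MANUAL_JCE_BELOW
```

Note that `java -version` writes its report to standard error rather than standard output, so a caller capturing it from a subprocess would need to read the error stream before passing the text to `extract_version`.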
You can verify the version of the Java SE Runtime Environment (JRE) installed by using the command:

    $ java -version

If you need to install the JCE, please follow the instructions below.

Installing the JCE policy files (due to licensing terms and conditions, the required policy files must be downloaded directly from the Oracle website):

1. Download the unlimited strength JCE policy files (JRE 6 / JRE 7 / JRE 8).
2. Uncompress and extract the downloaded file. This will create a subdirectory called JCE, which contains the following files: README.txt, COPYRIGHT.html, local_policy.jar and US_export_policy.jar.
3. Install the two policy JAR files by replacing the existing ones in your Java home directory. The standard place for JCE jurisdiction policy JAR files is:

    <java-home>/lib/security [Unix]
    <java-home>\lib\security [Win32]

Note: <java-home> refers to the directory where the Java SE Runtime Environment (JRE) was installed.

Additional performance enhancements that have been included in the EGACryptor v2.0.0:

Datasets can be processed in parallel using the resources available on a system. On multicore systems the user can specify up to n-1 cores for an n-core system. Using this feature on clusters may speed up the processing of datasets that contain large numbers of files, but consult your local cluster guide to ensure that you are not monopolising resources needed by other system users. The default for this process remains single threaded. Three levels of system usage can be specified: full usage, within the limits detailed above; a limited mode that ensures 50% of the system resources remain available for other tasks; and a maximum mode limited to 75% of system resources, which prioritises encryption while keeping the system usable for light alternate tasks. Finally, there is a throttling mode that allows you to specify the exact number of computational threads to be used.
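To make the usage levels concrete, the sketch below estimates how many threads each mode would allow on a given machine. It is an illustration, not the EGACryptor implementation: the mode names, the function name and the rounding (fractions are rounded down, with a minimum of one thread) are assumptions.

```python
import os
from typing import Optional


def worker_threads(mode: str, cores: Optional[int] = None, requested: Optional[int] = None) -> int:
    """Estimate the thread count for a usage mode on a machine with `cores` cores."""
    cores = cores or os.cpu_count() or 1
    ceiling = max(1, cores - 1)        # at most n-1 cores on an n-core system
    if mode == "single":               # the default: single threaded
        wanted = 1
    elif mode == "full":               # full usage within the n-1 limit
        wanted = ceiling
    elif mode == "limited":            # leave 50% of resources for other tasks
        wanted = cores // 2
    elif mode == "maximum":            # use up to 75% of resources
        wanted = (cores * 3) // 4
    elif mode == "throttle":           # exact number chosen by the user
        if requested is None:
            raise ValueError("throttle mode needs a requested thread count")
        wanted = requested
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # Never drop below one thread or exceed the n-1 ceiling.
    return max(1, min(wanted, ceiling))


if __name__ == "__main__":
    for mode in ("single", "limited", "maximum", "full"):
        print(mode, worker_threads(mode, cores=8))
```

On a shared cluster node, a calculation like this can help decide whether a limited or throttled run is more appropriate than full usage.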
The EGACryptor is able to ingest a structured directory and will output a directory with the same structure, containing the encrypted files along with the MD5 checksums for the plain and encrypted files. The entire output directory can then be uploaded to the EGA for archival. As with the input path, it is now possible to specify the output path. The options have been updated in line with the upgraded functionality.

The tool can only be used via the command line. The EGACryptor is designed to perform a single task, encrypting your data; for upload of these files, please refer to our uploading guide.

Using the EgaCryptor

The EGACryptor tool can be used in the following three ways.

Encrypting a single file:

    java -jar ../EGA-Cryptor_2_0_0.jar -i example1.bam

Encrypting multiple files:

    java -jar ../EGA-Cryptor_2_0_0.jar -i "example1.bam,example2.bam"

Encrypting all the files within a folder:

    java -jar ../EGA-Cryptor_2_0_0.jar -i path/to/target/directory

By default the EGACryptor v2 will create a new output directory, containing all encrypted files and the relevant checksums, within the target directory. If a specific output directory is desired, it can be specified with the -o flag, as in the following example:

    java -jar ../EGA-Cryptor_2_0_0.jar -i /path/to/target/directory -o /path/to/output/directory

The tool will output three files per input file:

    file.gpg       (encrypted file)
    file.md5       (MD5 checksum of the file)
    file.gpg.md5   (MD5 checksum of the encrypted file)

All output files must then be uploaded to your submission account using Aspera or FTP. Further documentation on how to upload files: FTP and Aspera.

Points to note

Remember to provide the path to EgaCryptor.jar and run the command from within the directory where the file/s are located. EGACryptor writes files to the source directory of your local file system; as a result, you must have write-access permissions for the source directory.
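Before uploading, the checksum files listed above can be sanity-checked against the data they describe. The sketch below is not part of EGACryptor: the helper names are hypothetical, and it assumes each .md5 file stores the hexadecimal digest as its first whitespace-separated token.

```python
import hashlib
from pathlib import Path


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(data_file: str, md5_file: str) -> bool:
    """Compare a file's MD5 with the digest recorded in its checksum file."""
    # Assumption: the digest is the first token in the checksum file.
    recorded = Path(md5_file).read_text().split()[0].lower()
    return md5_of(data_file) == recorded
```

Under these assumptions, the encrypted file would be checked against its .gpg.md5 companion and the original input against its .md5 companion; the exact output filenames depend on the input name and are not spelled out here.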
Troubleshooting

If in doubt about the function of the EGACryptor, it is recommended to first consult the built-in documentation. This can be accessed by using the -h flag, as stated in the following table.

Built-in Commands

Table: list of the command line options built into EGA-Cryptor v2.0.0.

    Command line option   Action
    -------------------   ------
    -f                    Set this option to allow application to create maximum threads to utilise full capacity of cores/processors available on machine
    -h                    Use this option to view the built-in help menu
    * -i                  File(s) to encrypt. Provide file/folder path or comma separated file path if multiple files in double quotes
    -l                    Set this option to allow application to create maximum threads equals to 50% capacity of cores/processors available on machine
    -m                    Set this option to allow application to create maximum threads equals to 75% capacity of cores/processors available on machine
    -o                    Path of the output file. This is optional. If not provided then output files will be generated in the same path as that of source file (default: output-files)
    -t                    Set this option if user wants to control application to create maximum threads as specified. Application will calculate no. of cores/processors available on machine & will create threads accordingly

Encryption Errors

UnixFileSystem.createFileExclusively (Native Method)

This error is thrown by UNIX ("UnixFileSystem.createFileExclusively(Native Method)"). It indicates that the user does not have write access to the file system where the file to be uploaded is located. EGACryptor always writes MD5 checksums into files before they are uploaded to the server, and these files are created in the same location as the file itself. Solution: address your directory permission issue and re-run the command.
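The permission problem behind this error can be detected before running the tool. The following is a minimal sketch with a hypothetical function name; it assumes the output files are written alongside the input, as described above, and it only reports what the operating system says about the current user's access.

```python
import os


def output_dir_is_writable(input_path: str) -> bool:
    """True if the directory that would receive the output files is writable."""
    absolute = os.path.abspath(input_path)
    # For a file input, outputs land in its parent directory.
    directory = absolute if os.path.isdir(absolute) else os.path.dirname(absolute)
    # W_OK: files may be created there; X_OK: the directory may be entered.
    return os.access(directory, os.W_OK | os.X_OK)
```

If this returns False for the intended input path, fixing the directory permissions first should avoid the createFileExclusively error.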
Install JCE Unlimited Strength Jurisdiction Policy files

The JCE unlimited strength jurisdiction policy files should be installed according to your current Java version.

If you are still facing difficulty with the EGACryptor v.2 after having consulted the documentation, please contact the EGA Helpdesk.
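As a pre-flight check for the filename rules given under File preparation, the sketch below flags the listed special characters and suggests underscore replacements for whitespace. The function names are hypothetical, and the character set is limited to the examples listed in that section, which may not be exhaustive.

```python
import re

# The special characters listed under File preparation.
SPECIAL_CHARACTERS = set('#?()[]/\\=+<>:;"\',*^|&')


def special_characters_in(name: str) -> list:
    """Return the listed special characters that occur in a filename, sorted."""
    return sorted({ch for ch in name if ch in SPECIAL_CHARACTERS})


def replace_whitespace(name: str) -> str:
    """Replace each whitespace character with an underscore, per the convention."""
    return re.sub(r"\s", "_", name)


if __name__ == "__main__":
    for name in ("example1.bam", "sample 1 (final).bam"):
        print(repr(name), special_characters_in(name), replace_whitespace(name))
```

Only whitespace is rewritten automatically here; the special characters are reported rather than removed, since the choice of replacement is left to the submitter.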
SNP Genotyping for Lassa Fever cases and population controls from Nigeria and Sierra Leone using Illumina H3Africa array version 1
RNAseq data of 30 muscle cell samples (proliferating myoblasts to differentiating myotubes) of immortalized control cells, and myotonic dystrophy type 1 (DM1) patient-derived cells and their edited versions.
WES for Patient 1 to 8 of NIBIT-M4 clinical trial
Study Overview

The Environmental Determinants of Diabetes in the Young (TEDDY) Study is a longitudinal study that investigates genetic and genetic-environmental interactions, including gestational events, childhood infections, dietary exposures, and other environmental factors after birth, in relation to the development of islet autoimmunity and type 1 diabetes (T1D). A consortium of six clinical centers assembled to participate in the development and implementation of the study to identify environmental triggers for the development of islet autoimmunity and T1D in genetically susceptible individuals. Beginning in 2004, the TEDDY study screened over 400,000 newborns for high-risk HLA-DR, DQ genotypes from both the general population and families already affected by T1D. The TEDDY study enrolled around 8,676 participants across six clinical centers worldwide (Finland, Germany, Sweden and three in the United States) in the 15-year prospective follow-up. Participants are followed every three months for islet autoantibody (IA) measurements with blood sampling until four years of age and then at least every six months until the age of 15. After the age of four, autoantibody positive participants continue to be followed at three-month intervals and autoantibody negative participants are followed at six-month intervals. In addition to the analysis of autoantibodies, additional data and sample collection are performed at each visit. Parents collect monthly stool samples in early childhood. The parents also fill out questionnaires at regular intervals in connection with study visits and record information about diet and health status in the child's TEDDY Book between visits. Continued long-term follow-up of the currently active TEDDY participants will provide important scientific information on early childhood diet, reported and measured infections, vaccinations, and psychosocial stressors that may contribute to the development of type 1 diabetes and islet autoimmunity.
Additional information on the TEDDY study is available in the following articles: Rewers et al., 2008, PMID: 19120261 and Hagopian et al., 2006, PMID: 17130573. Details of the TEDDY protocol can be found in Hagopian et al., 2011, PMID: 21564455. TEDDY data currently available in dbGaP include: gene expression, SNPs, exome, microbiome (gut, nasal, and plasma), RNA sequencing, and whole genome sequencing. For more information on TEDDY Study version history please refer to TEDDY Study dbGaP README File.

ImmunoChip SNP

DNA from whole blood samples on study participants and their family members (mothers, fathers, and siblings) was obtained and used for SNP genotyping. Genotyping was performed by the Center for Public Health Genomics at the University of Virginia using the Illumina ImmunoChip SNP array, which contains around 196,000 SNPs from 186 regions associated with 12 autoimmune diseases (Hadley et al., 2015, PMID: 26010309). Data cleaning and validation included the removal of subjects with a low call rate (< 5% SNPs missing) and differences in reported sex and prior genotyping at the TEDDY HLA laboratory. Additionally, SNPs with a low call rate or Hardy-Weinberg equilibrium P value < 10^-6, except for chromosome 6 due to HLA eligibility requirements, were removed from the final dataset (Törn et al., 2015, PMID: 25422107).

TEDDY-T1DExome Array

DNA from whole blood samples on study participants and their family members (mothers, fathers, and siblings) was obtained and used for genotyping. Genotyping was performed by the University of Virginia using the Illumina TEDDY-T1DExome array.
The TEDDY-T1DExome array is a custom chip that contains 550,601 markers from the Infinium CoreExome-24 v1.1 BeadChip and an additional 90,214 tagSNPs specifically selected by the TEDDY investigators based on their associations with nutrients, vitamins, type 2 diabetes, autoimmune diseases, body-mass index, or other exposures and phenotypes measured by the TEDDY study. The Illumina GenTrain2 algorithm was used for genotype calling. Sample quality control metrics included sample call rate, heterozygosity rate and concordance of gender between the information reported and genotyped.

Gene Expression

The TEDDY study collected peripheral blood for the extraction of total RNA from enrolled children starting at 3 months of age, and then at 3-month intervals up to 48 months and then biannually. Total RNA was extracted using a high-throughput (96-well format) extraction protocol using magnetic (MagMax) beads technology at the TEDDY RNA Laboratory, Jinfiniti Biosciences in Augusta, GA. Purified RNA (200 ng) was further used for cRNA amplification and labeling with biotin using Target Amp cDNA synthesis kit (Epicenter catalog no. TAB1R6924). Labeled cRNA was hybridized to the Illumina HumanHT-12 Expression BeadChips based on the manufacturer's instructions. The HumanHT-12 Expression BeadChip provides coverage for more than 47,000 transcripts and known splice variants across the human transcriptome.

Microbiome

The TEDDY microbiome study aimed to characterize the longitudinal development of the microbiome, including bacteria, viruses and other microorganisms in the gut, plasma, and nasal cavity of prediabetic and diabetic subjects compared to autoantibody negative non-diabetic subjects. Stool samples were collected monthly from 3 to 48 months, after which stool samples were collected every 3 months. Nasal swab samples were collected every 3 months starting at 9 months of age until 48 months, after which nasal swabs were collected every 6 months.
Plasma samples were collected every 3 months starting at 3 months of age until 48 months, after which plasma samples were collected every 6 months. If the subject was autoantibody positive at 48 months then they remained on the 3-month collection interval for nasal swab and plasma samples. Samples underwent 16S rRNA gene sequencing, DNA and viral RNA metagenomics shotgun sequencing, and sequencing of the internal transcribed spacer (ITS) regions. Additional information on the TEDDY microbiome data is available in the following articles: Vatanen et al., 2018, PMID: 30356183, Stewart et al., 2018, PMID: 30356187, and Vehik et al., 2020, PMID: 31792456.

RNA Sequencing

The TEDDY study aimed to characterize the transcriptome in subjects with islet autoimmunity and type 1 diabetes compared to matched control subjects. Peripheral blood was collected to extract total RNA from enrolled children starting at 3 months of age, and then at 3-month intervals up to 48 months and then biannually. Total RNA was extracted using a high-throughput (96-well format) extraction protocol using magnetic (MagMax) beads technology at the TEDDY RNA Laboratory, Jinfiniti Biosciences in Augusta, GA. Purified RNA was then sent to the Broad Institute for the generation of the TEDDY RNA sequencing (RNA-Seq) data. The RNA samples were prepped using Superscript III reverse transcriptase and Illumina's TruSeq Stranded mRNA Sample Prep Kit. The TruSeq libraries were run on the Illumina HiSeq2500 platform.

Whole Genome Sequencing

The TEDDY study aimed to conduct deep whole genome sequencing and examine the genomic variations in subjects with islet autoimmunity and type 1 diabetes compared to matched autoantibody negative and non-diabetic children. DNA from whole blood was obtained from TEDDY children for whole genome sequencing. The WGS data were generated on the Illumina HiSeq X Ten system.