Immune escape is recognised as one of the hallmarks of cancer, and overcoming this immunomodulation by tumour cells has become a major therapeutic target. Here we utilise organoid technology to study immune-cancer interactions and to assess immunomodulation by colorectal cancer (CRC). Transcriptional profiling and flow cytometry revealed that organoids maintain the differential expression of immunomodulatory molecules present in primary tumours. Finally, we established a method to model antigen-specific epithelial cell killing and cancer immunomodulation in vitro using CRC organoids co-cultured with cytotoxic T cells. Our method may serve as a first step towards rebuilding the tumour microenvironment in vitro.
Metagenomic data of stool samples from 111 patients with either bipolar disorder or schizophrenia spectrum disorder. The data comprise fq.gz files, mostly one forward and one reverse read file per sample. Samples analysed in multiple lanes have 4 (paired) fastq files, which share the same patient ID (for example: 048_1 and 048_2, or 088_L1 and 088_L2). DNA was extracted using the QIAamp Fast DNA Stool Mini Kit (Qiagen) following the manufacturer's instructions. Shotgun metagenomic sequencing was carried out on an Illumina NovaSeq 6000.
The CHARM (Cancer Health Assessment Reaching Many) study, part of the Clinical Sequencing Evidence-Generating Research (CSER2) effort, aimed to assess the utility of clinical exome sequencing and how it affects care in diverse populations. The study population included adults at risk for hereditary cancer syndromes. The primary objective was the implementation of a hereditary cancer risk assessment program in healthy 18-49 year-olds in primary care settings within a vertically integrated health delivery system (Kaiser Permanente) and a federally qualified health center (Denver Health). The investigators aimed to assess clinical exome sequencing implementation and interpretation, as well as tailored interactions for low health literacy, including a contextualized consent process and a modified approach to results disclosure and genetic counseling. The investigators also assessed the clinical utility (healthcare utilization and adherence to recommended care) and personal utility of primary and additional results from clinical exome sequencing, and evaluated the ethical and policy implications of considering the personal utility of genomic information in decisions about health care coverage. Sequencing and analysis for CHARM was conducted at the Northwest Clinical Genomics Laboratory, University of Washington, Seattle (https://nwgc.gs.washington.edu/).
Trans-differentiation from adenocarcinoma to small cell neuroendocrine (SCN) cancer is an adverse consequence of treatment escape in various cancers, including prostate, lung, and bladder cancers (Balanis and Sheu et al., PMID 31287989). Expression of dominant negative p53 (TP53DN), myrAkt1, RB1-shRNA, c-Myc, and Bcl2 (PARCB forward transformation) in human naïve prostate basal epithelial cells recapitulated both the transcriptional and histological characteristics of small cell neuroendocrine prostate cancer (NEPC) (Park et al., PMID 30287662). To study the temporal transcriptional landscape during this trans-differentiation process, we conducted a time course study using the PARCB model, integrating multi-omics sequencing: bulk RNA sequencing and ATAC sequencing on samples taken at different time points, as well as single-cell RNA sequencing on serial xenograft tumors. We found a common SCN pathway that resulted in two distinct end states defined by mutually exclusive expression of ASCL1 and ASCL2. Further investigation using CUT&RUN sequencing identified TFAP4 as a potential epigenetic regulator of both proteins. Our study reveals the temporal and transcriptional changes during trans-differentiation from prostate adenocarcinoma to NEPC.
A comprehensive gene expression analysis of the process leading up to the onset of Alzheimer's disease (AD) would be helpful for understanding its mechanism. We performed an RNA sequencing analysis on a cohort of 1227 Japanese blood samples, representing 424 AD patients, 543 individuals with mild cognitive impairment (MCI), and 260 cognitively normal (CN) individuals. A total of 883 and 1169 statistically significant differentially expressed genes (DEGs) were identified between CN and MCI (CN-MCI) and between MCI and AD (MCI-AD), respectively. Pathway analyses using these DEGs, followed by protein-protein interaction network analysis, revealed key roles of ribosomal genes (RPL7, RPL11, and RPL14) and phagosome genes (CDC42, PTPRC, PLCG1, and ACTR2) in MCI progression, whereas immune-related genes were involved in AD progression. Given the known effectiveness of delaying MCI progression in preventing AD, the genes related to ribosomal function and phagocytosis might emerge as biomarkers for early diagnosis.
The EUCAIM consortium and the European Commission have announced the release of Cancer Image Europe, a platform to fuel collaborative innovation and data sharing across Europe. This constitutes a major milestone in EUCAIM's development and an exciting step towards achieving the project's vision and goals. The platform aims to accelerate the pace of AI development and other data-intensive cancer research activities, constituting the basis for a fully-fledged infrastructure for sharing, reusing, and exploiting cancer imaging data, especially using AI techniques.

A platform paving the way for the future of cancer diagnosis and treatment

Cancer Image Europe brings benefits to researchers, clinicians, and AI innovators across Europe, as it addresses the fragmentation of existing cancer image repositories. The platform gathers a public catalogue of cancer imaging datasets from the repositories of the EU-funded AI for Health Imaging projects. A distributed Atlas of Cancer Imaging, with over 60 million anonymised cancer images from over 100,000 patients, will be established as part of future updates. The first version of Cancer Image Europe also includes a public catalogue of cancer imaging datasets, a federated search tool for understanding the information available from the federated providers, and full integration with Life Science Login. Moreover, it reuses and adds value to key components of EU-funded research projects and infrastructure in the field of cancer.

The Public Catalogue

When we talk about data re-use, data discovery is key. The EUCAIM public catalogue is the result of a systematic and collaborative process to which the EGA-CRG has actively contributed. This browsing interface gathers the metadata of the available cancer imaging datasets. This descriptive information will help researchers in their data retrieval.
Our contribution to the Public Catalogue comes from the experience gained in the deployment of our federated network, the Federated EGA (FEGA), and as partners in projects such as EuCanImage (AI4HI initiative), in which we worked on the definition of common data models, ontologies, and data standards.

The EUCAIM Project

EUCAIM is a cornerstone of the European Commission-initiated European Cancer Imaging Initiative, a flagship of Europe's Beating Cancer Plan, which aims to foster innovation and the deployment of digital technologies in cancer treatment and care, in order to achieve more precise and faster clinical decision-making, diagnostics, treatments, and predictive medicine for cancer patients. For more information about the first release of the Cancer Image Europe platform and the EUCAIM project, please visit cancerimage.eu.
Metadata Submission

Welcome to the EGA Metadata Submission landing page. Here, we provide you with essential resources to facilitate a seamless submission process. Before starting your submission, please make sure you check the following pages:

- Submission FAQ
- Submission Quickguide
- Submission account terms
- EGA metadata schema

If you want to submit sequencing and/or phenotypic data, check these pages:

- Submitter Portal
- Submitter Portal API

If you want to submit array data, check these pages:

- Array documentation
- Webin
- Webin API
Gene editing in induced pluripotent stem (iPS) cells has been hailed for enabling new cell therapies for various monogenetic diseases, including dystrophic epidermolysis bullosa (DEB). However, manufacturing, efficacy, and safety roadblocks have limited the development of genetically corrected, autologous iPS cell-based therapies. Dystrophic Epidermolysis Bullosa Cell Therapy (DEBCT) is a new-generation, Good Manufacturing Practice-compatible (cGMP), reproducible, and scalable platform to produce autologous clinical-grade iPS cell-derived organotypic induced skin composite (iSC) grafts to treat incurable wounds of patients lacking type VII collagen (C7). DEBCT uses a single-step, combined, high-efficiency reprogramming and CRISPR-based genetic correction to generate genome scar-free, COL7A1-corrected clonal iPS cells from primary patient fibroblasts. Validated iPS cells are converted into epidermal, dermal, and melanocyte progenitors with a novel 2D organoid differentiation protocol, followed by CD49f enrichment and expansion to minimize maturation heterogeneity. iSC product characterization by single-cell transcriptomics was followed by mouse xenografting for disease-correcting activity at 1 month and toxicology analysis at 1-6 months. Culture-acquired mutations, potential CRISPR off-target effects, and cancer-driver variants were evaluated by targeted and whole-genome sequencing. iPS cell-derived iSC grafts were reproducibly generated from four recessive DEB patients with different pathogenic mutations. Organotypic iSC grafts onto immune-compromised mice developed into stable stratified skin with functional C7 restoration. Single-cell transcriptomic characterization of iSCs revealed prominent holoclone stem cell signatures in keratinocytes and the recently described Gibbin-dependent signature in dermal fibroblasts. The latter correlated with enhanced graftability.
Multiple orthogonal sequencing and subsequent computational approaches identified random and non-oncogenic mutations introduced by the manufacturing process. Toxicology revealed no detectable tumors after 3-6 months in DEBCT-treated mice. In conclusion, DEBCT successfully overcomes previous roadblocks and establishes a robust, scalable, and safe cGMP manufacturing platform for the production of a CRISPR-corrected autologous organotypic skin graft to heal DEB patient wounds.
Focal segmental glomerulosclerosis (FSGS) is a frequent cause of end-stage renal disease. The pathogenesis of FSGS has not been precisely defined and there are no consistently effective treatments. Recent studies identifying causal genes in rare, inherited FSGS, including our own study, have associated mutations in at least six genes with familial FSGS, and each discovery has clarified molecular mechanisms of glomerular injury. To build on this productive line of inquiry, we have ascertained and carefully characterized 118 families with familial FSGS. We have screened the remainder of our families for mutations in genes known to cause FSGS and identified the causal mutations in an additional 6 kindreds; the genetic basis of disease in the remaining 111 families is unknown. The objective of this proposal is to use this valuable and unique family resource to systematically identify causal genes for familial FSGS. Limitations of current conventional linkage and positional cloning approaches include their requirement for large, multiplex families. In addition, narrowing candidate regions in traditional linkage analysis can be difficult because large regions that lack recombination events require cumbersome and lengthy screening for causative mutations. Powerful new genetic tools can facilitate this screening process and improve variant discovery in smaller families. In particular, efficient whole-exome sequencing, the targeted capture of protein-coding gene sequences, should be particularly useful in our studies, since most Mendelian disorders are caused by mutations affecting the coding exons of the target gene. Thus, by combining genome-wide linkage analysis (GWLS) and whole-exome sequencing, we can maximize the impact of our family data and accelerate the identification of novel mutations in FSGS.
In preliminary studies, we have used this combination to identify a novel variant in the WT1 (Wilms' Tumor-1) gene in one FSGS family, and we have evidence suggesting it is the causal mutation. This success provides proof of concept and a roadmap for how genes will be identified and evaluated in the proposed studies. Our hypothesis is that the causes of inherited FSGS in our cohort of families will be sequence variants in the coding regions of genes not previously associated with familial FSGS. We aim to: 1) use GWLS and whole-exome sequencing to identify genetic variants associated with familial FSGS; 2) characterize the functional consequences of candidate causative mutations; and 3) determine the prevalence in the Duke FSGS dataset of the new causative mutations identified in Aim 2. Any genes found to harbor causative mutations will be sequenced in the remaining families, taking full advantage of our family resource. By combining genome-wide linkage analysis (GWLS), whole-exome sequencing, and characterization of variants' functional consequences, we will significantly improve understanding of normal glomerular biology and of the pathogenesis of FSGS and related glomerular diseases. Moreover, our discoveries are likely to reveal new opportunities to improve therapy for a disease that currently has few effective treatments.
Programmatic submissions (XML based)

For further information please check our Submission FAQs, submission quickguide, and submission terms.

Introduction

Besides the Submitter Portal tool, EGA supports programmatic submission of sequencing and clinical data metadata. If you are not sure what this means, you may want to explore our brief metadata introduction. Programmatic submissions are recommended for array-based submissions. Moreover, they may help if your submission is recurrent or is difficult to manage manually due to its sheer size. Otherwise, we highly recommend using the Submitter Portal to perform submissions. On this page we will guide you through the steps required to programmatically submit data to the EGA.

Programmatic submissions require your metadata to be structured for easy and straightforward validation and archival. This consists of formatting your metadata as Extensible Markup Language (XML) files and submitting them to the EGA using WEBIN. Before submitting metadata to the EGA, it is important to ensure that the information in your XML files is compliant with our standards. You can see further details on how these standards are maintained at EGA on our EGA Schemas documentation page. Using WEBIN, you can validate your XML files against EGA's schemas to ensure that your metadata is compliant before submission.

WEBIN services

- WEBIN production service
- WEBIN test service

We advise you to submit your metadata to the test service before submitting to the production service for the first time. The test service is identical to the production service, except that all submissions will be discarded within the following 24 hours. This allows you to learn about the submission process without having to worry about data actually being archived.

Authentication

Authentication is required each time a submission is made. The submission service uses the HTTPS protocol for metadata encryption and identification to provide a secure submission environment.
Data file upload

Files referenced by both Runs and Analyses (e.g. FASTQ files) need to be uploaded to the EGA before these metadata objects are submitted. In other words, if you submit a Run that references a file that we cannot find associated with your account, the metadata submission will fail. See further details on how to upload your files in our File Upload documentation.

Metadata model of the EGA

Our metadata model is formed by multiple metadata objects. Check further details in our documentation at our EGA Schemas documentation page.

Working with EGA XML files

Now that the basic concepts of the EGA metadata have been described, you can start preparing your programmatic submission through XML. Here you will find guidance on how to prepare the XML files.

Programmatic Submission Tutorial Video

Take a look at the Programmatic Submission Tutorial Video, which explains the workflow of a programmatic submission and goes over an example metadata submission.

When building your XML files, we recommend using text editors (e.g. Sublime Text or Visual Studio) that allow you to visualise the structure of the XML with ease and constantly check the consistency of the XML structure. Alternatively, if the submission consists of a large number of objects (especially analyses), you may find the tool star2xml handy. This tool allows for a direct conversion of metadata in tabular format (e.g. a spreadsheet) into XMLs.

Identifying objects: aliases and center names

Every EGA object must be uniquely identified within the submission account using its alias attribute. Aliases can be used in submissions to make references between EGA objects. Let us dig into EGA's use of aliases and center names:

alias: every object should have a name that is unique within your submission account. Once submitted successfully, every alias will be assigned a unique and permanent accession (EGA ID).
refname: when an object references another by its alias, the alias of the referenced object goes into the "refname" attribute of the referencing object. For example, if a sample has the alias "sample1" and an experiment uses this sample, then the experiment's "EXPERIMENT/SAMPLE/refname" attribute should be "sample1".

center_name: the "center_name" attribute is required within the submission XML and, if not provided when the object is submitted, it will be automatically filled using your default EGA account center_name. This element is the "controlled vocabulary acronym or abbreviation that is provided to the account holder when the account is first generated". If the submitter is brokering a submission for another institute, the submitter should use their special broker account name in broker_name while the data centre acronym remains in center_name. Log-in details should have been provided when you requested a submission account. Please contact our Helpdesk team if you have any questions.

run_center: many submitting centers contract out the actual sample sequencing to another center. In these cases, the sequencing center should be acknowledged in the run_center attribute. Again, this is a controlled vocabulary and the acronym should be sought from the EGA Helpdesk before submitting.

Prepare your XMLs

The goal of this section is to provide sufficient information to be able to create the metadata XML documents required for programmatic submissions. Please note that the EGA utilises the XML schemas maintained at the European Nucleotide Archive (ENA). This matters because, with the two archives sharing a system, parts of the ENA's programmatic submission documentation can also help you with your programmatic submission to the EGA. For example, you can submit programmatically without using a Submission XML by following the steps at Submission actions without submission XML.
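As an illustration of the alias/refname mechanism described above, consider the following sketch. The object names are hypothetical, the fragments are abbreviated rather than complete valid objects, and the exact element carrying the sample reference may differ (see the descriptive experiment XML example for the full schema):

```xml
<!-- A sample registered under the alias "sample1" -->
<SAMPLE alias="sample1" center_name="MY_CENTER">
  <!-- ...sample attributes... -->
</SAMPLE>

<!-- An experiment referencing that sample by its alias -->
<EXPERIMENT alias="experiment1" center_name="MY_CENTER">
  <SAMPLE_DESCRIPTOR refname="sample1"/>
</EXPERIMENT>
```

Once these objects are archived, the aliases resolve to permanent EGA accessions, and later submissions may reference the objects by accession instead.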
A submission does not have to contain all the different types of XMLs. For example, it is possible to submit only a few samples, or a study that is later to be referenced. You can submit each object one by one, or submit all in a batch: choose whichever method of submission works best for you. We do recommend, nevertheless, that you submit the objects to be referenced (e.g. samples or studies) first, and the objects that reference them (e.g. experiments or datasets) afterwards. You can see a graphical view of these objects and their relationships on our EGA Schemas page.

Regardless of the submission scenario, you will always require a Dataset XML. The dataset entity is what is used to control access to the given data, in the form of runs or analyses. In other words, when a requester is granted access, it is through the dataset: access to the objects (e.g. runs or analyses) that the dataset contains is granted in one go. Given the nature of the EGA, a dataset XML will always be required for data access.

First, we will differentiate between submissions of "raw" and "processed" data: Runs and Analyses, respectively.

Run data submissions

Raw data derives from instruments "as is". For example, a plain sequence file (e.g. FASTQ or unaligned BAM files) would be considered raw data. A typical raw (unaligned) sequence read submission consists of 8 XMLs:

- Submission
- Study
- Sample
- Experiment
- Run
- DAC
- Policy
- Dataset

When technical reads (e.g. barcodes, adaptors or linkers) are included in the submitted raw sequences, a spot descriptor must be submitted to describe the position of the technical reads so that they can be removed. The following data files can be submitted without providing spot descriptor information in the experiment/run XML:

- BAM files (single reads)
- SFF files (single reads without barcodes)
- FastQ files (single reads without any technical reads)
- Complete Genomics files

Analysis data submissions

Processed data is, in some way, refined raw data.
This includes raw data that has been processed by some form of analysis method (e.g. alignment, noise reduction, etc.). For example, an aligned sequence file (e.g. a BAM file) created from raw FASTQ files would be a processed file. This category includes most types of data: sequence alignment files (e.g. BAM or CRAM), clinical data (e.g. phenopackets), sequence variation files (e.g. VCF), sequence annotation, etc. A typical EGA analysis data submission consists of 7 EGA XMLs:

- Submission
- Study
- Sample
- Analysis
- DAC
- Policy
- Dataset

We accept three different types of analysis data submissions:

- BAM files (for multiple read alignments)
- VCF files (for sequence variations)
- Phenotype files (in any format)

In any case, keep in mind that samples must be created in order to be referenced in the analyses. In other words, the provenance of the information within the BAM, VCF and phenotype files must be traceable to registered EGA samples.

Example XMLs

Below you can find a non-exhaustive list of example XMLs with descriptive fields (i.e. explaining what to provide in each field). Furthermore, you can also find real examples (i.e. the true values of the provided fields) in our GitHub repository.

Submission XML

The submission XML is used to validate, submit or update any number of other objects; it refers to the other XMLs. New submissions use the ADD action to submit new objects, object updates are done using the MODIFY action, and objects can be validated using the VALIDATE action.

Descriptive submission XML example
True values submission XML example

Study XML

The study XML is used to describe the study, containing a title, a study type and an abstract as they would appear in a publication.
Descriptive study XML example
True values study XML example

Please use the following notation within the "STUDY_LINKS" property when including PubMed citations in the study XML:

<STUDY_LINKS>
  <STUDY_LINK>
    <XREF_LINK>
      <DB>PUBMED</DB>
      <ID>18987735</ID>
    </XREF_LINK>
  </STUDY_LINK>
</STUDY_LINKS>

Sample XML

The sample XML is used to describe the samples used to obtain the data, whether they were sequenced, measured in any other way, or have an associated phenotype. The mandatory fields include information about the taxonomy of the sample, sex, subject ID and phenotype. For example, the mandatory attribute fields for each sample would look like the following, within the array of "SAMPLE_ATTRIBUTES":

<SAMPLE_ATTRIBUTES>
  <SAMPLE_ATTRIBUTE>
    <TAG>subject_id</TAG>
    <VALUE>free text!</VALUE>
  </SAMPLE_ATTRIBUTE>
  <SAMPLE_ATTRIBUTE>
    <TAG>sex</TAG>
    <VALUE>female/male/unknown</VALUE>
  </SAMPLE_ATTRIBUTE>
  <SAMPLE_ATTRIBUTE>
    <TAG>phenotype</TAG>
    <VALUE>Free text, EFO terms (e.g. EFO:0000574) are recommended</VALUE>
  </SAMPLE_ATTRIBUTE>
</SAMPLE_ATTRIBUTES>

Since the sample is one of the most important objects to be described biologically, it is highly recommended that "TAG-VALUE" pairs are generated as SAMPLE_ATTRIBUTES to describe the sample in as much detail as possible. For example, were we to give the population ancestry of the sample, we could add a new attribute to the array indicating that the sample derives from an individual of "Mende in Sierra Leone" (MSL), of African ancestry:

<SAMPLE_ATTRIBUTE>
  <TAG>Population</TAG>
  <VALUE>MSL</VALUE>
</SAMPLE_ATTRIBUTE>

Given that TAG and VALUE are free text, the combinations are limitless, giving you full flexibility over the information you want to provide. We recommend you use the Experimental Factor Ontology (EFO) to describe the phenotypes of your samples. You can provide more than one phenotype by adding more items to the array of SAMPLE_ATTRIBUTES.
Phenotypes considered essential for understanding the data submission should be provided. Each phenotype should be listed as a separate sample attribute (<SAMPLE_ATTRIBUTE> </SAMPLE_ATTRIBUTE>); there is no limit to the number of phenotypes that can be submitted. If a suitable EFO accession cannot be found for your phenotype attribute, please consider using another controlled ontology database (e.g. HPO, MONDO, etc.) before resorting to free text.

Descriptive sample XML example
True values sample XML example

Experiment XML

The experiment XML is used to describe the experimental setup, including instrument platform and model details, library preparation details, and any additional information required to correctly interpret the submitted data. Where any of these values differ between runs, a new experiment object must exist, since runs are grouped by experiments. Each experiment references a study and a sample by alias or, if previously submitted, by accession. Pooled data must be demultiplexed by barcode for submission.

Descriptive experiment (Illumina paired read) XML example
True values experiment (Illumina paired read) XML example

Run XML

The run XML is used to associate data files with experiments and typically comprises a single data file (e.g. a FASTQ file). Please note that pooled samples should be de-multiplexed prior to submission and submitted as different runs.

Descriptive run XML example
True values run XML example

Analysis XML

Given that an analysis can be used to submit any type of processed data to the EGA, below we list an example of each of the three most common types of analysis XMLs submitted to the EGA: sequence alignments (e.g. BAM files); sequence variation (e.g. VCF files); and clinical metadata or phenotypes (e.g. phenopackets).
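Returning briefly to the run XML described above, the structure can be sketched as follows. All aliases, the file name, and the checksum value are hypothetical placeholders, and the fragment is abbreviated; see the descriptive run XML example for the full set of fields:

```xml
<RUN_SET>
  <RUN alias="run1" center_name="MY_CENTER">
    <!-- Reference to a previously submitted experiment, by alias -->
    <EXPERIMENT_REF refname="experiment1"/>
    <DATA_BLOCK>
      <FILES>
        <!-- The MD5 checksum of the uploaded file must be supplied;
             the value below is a placeholder -->
        <FILE filename="sample1_R1.fastq.gz" filetype="fastq"
              checksum_method="MD5"
              checksum="00000000000000000000000000000000"/>
      </FILES>
    </DATA_BLOCK>
  </RUN>
</RUN_SET>
```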
Regardless of the type of processed data submitted in the analysis, the analysis must be associated with a Study and can reference multiple types of other objects, from samples to experiments, if they are available at the EGA. Just like with Runs, whenever a file is submitted to the EGA through an analysis object, the file's MD5 checksum must be present in order for the EGA to validate file integrity upon transfer. This also includes index files when applicable (e.g. .bai.md5 files). Ideally, any analysis that uses a reference sequence for some kind of alignment (e.g. BAM, CRAM or VCF files) should contain metadata about the alignment, such as INSDC reference assemblies and sequences, referenced either by accession (e.g. CM000663.1) or by common label (e.g. GRCh37).

Read alignment (BAM) Analysis XML

The Analysis can be used to submit BAM alignments to the EGA. Only one BAM file can be submitted in each analysis, and the samples used within the BAM read groups must be associated with EGA Samples.

Descriptive bam alignments XML example
True values bam alignments XML example

Sequence variation (VCF) Analysis XML

The Analysis can be used to submit VCF files to the EGA. Only one VCF file can be submitted in each analysis, and the samples used within the VCF files must be associated with EGA Samples.

Download analysis XML (VCF)

Phenotype files

The Analysis XML can be used to submit phenotype files to the EGA. Only one phenotype file can be submitted in each analysis, and the samples used within the phenotype files must be associated with EGA Samples.

Download analysis XML (Phenotype)

DAC XML

The DAC XML describes the Data Access Committee (DAC) affiliated with the data submission. The DAC may consist of a group or a single individual and is responsible for the data access decisions based on the application procedure described in the Policy XML. As with any other object, if it was already submitted to the EGA, there is no need to submit it again: you can reference an existing object within the EGA.
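The MD5 requirement described above can be satisfied with standard command-line tools before upload. A minimal sketch (the file names below are hypothetical placeholders, and dummy files are created so the commands are self-contained):

```shell
# Create placeholder files standing in for a real BAM and its index.
echo "dummy BAM content" > example.bam
echo "dummy index content" > example.bam.bai

# Write just the 32-character hex digest to a sidecar .md5 file.
md5sum example.bam | awk '{print $1}' > example.bam.md5

# Index files need their own checksum file too (e.g. .bai.md5).
md5sum example.bam.bai | awk '{print $1}' > example.bam.bai.md5
```

The digests stored this way can then be copied into the checksum fields of the run or analysis XML for the corresponding files.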
A DAC XML therefore does not need to be provided if your submission is affiliated with an existing EGA DAC. Further information on DACs can be found here, and you can always contact our Helpdesk team if you have further inquiries.

Descriptive dac XML example
True values dac XML example

Policy XML

The Policy XML describes the Data Access Agreement (DAA) affiliated with the named Data Access Committee.

Descriptive policy XML example
True values policy XML example

Dataset XML

The dataset XML describes the data files, defined by the Run and Analysis XMLs, that make up the dataset, and links the collection of data files to a specified Policy. The dataset XML is commonly the last metadata object to be submitted, since it references multiple other entities. Please consider the number of datasets that your submission consists of. For example, a case-control study is likely to consist of at least two datasets. In addition, we suggest that separate datasets be described for studies using the same samples but different sequencing technologies.

Descriptive dataset XML example
True values dataset XML example

Validating and submitting to the EGA

Validating EGA's XMLs through Webin

After you have ensured that the XMLs are properly formatted and contain all the required information, you can proceed to validate and submit your data. Once you have prepared your XML file and asserted that you have access to Webin, you can validate your XML file programmatically against EGA's schemas using the curl command. There are multiple ways in which you can validate your XMLs. This variety stems from two facts: (1) there are two instances of Webin (test and production); and (2) validation is a default step during submission. In other words, any time that you submit your data through Webin, it will be validated automatically before being accepted.
This allows for four possible routes of validation, all having the same validation result: validating or submitting to either the production service or the test service of Webin. For example, directly validating a "study" object XML in the test service (wwwdev…) would look like the following:

curl -u <USERNAME>:<PASSWORD> -F "ACTION=VALIDATE" "https://wwwdev.ebi.ac.uk/ena/submit/drop-box/submit/" -F "STUDY=@study.xml"

In this command, you would need to replace <USERNAME> and <PASSWORD> with your EGA account username and password, respectively. You would also replace study.xml with the path to your XML file. A mock example would look like the following:

curl -u ega-test-data@ebi.ac.uk:egarocks -F "ACTION=VALIDATE" "https://wwwdev.ebi.ac.uk/ena/submit/drop-box/submit/" -F "STUDY=@study.xml"

The validation attempt can have different results depending on the given arguments. If your XML file is valid according to EGA's schemas, you will see a message indicating that your XML file is compliant. For example, see below the receipt for our mock example, where "success" was "true" (i.e. no validation errors were found). Nevertheless, notice how the "<STUDY accession=" attribute is empty: because we were simply validating, the study did not get an accession or ID.
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="receipt.xsl"?>
<RECEIPT receiptDate="2023-04-11T15:19:28.850+01:00" submissionFile="submission-EBI-TEST_1681222768850.xml" success="true">
  <STUDY accession="" alias="Mock example" status="PRIVATE"/>
  <SUBMISSION accession="" alias="SUBMISSION-11-04-2023-15:19:28:840"/>
  <MESSAGES>
    <INFO>VALIDATE action has been specified.</INFO>
    <INFO>Submission has been rolled back.</INFO>
    <INFO>This submission is a TEST submission and will be discarded within 24 hours</INFO>
  </MESSAGES>
  <ACTIONS>VALIDATE</ACTIONS>
  <ACTIONS>PROTECT</ACTIONS>
</RECEIPT>

If there are any errors or warnings, the service will display them, allowing you to correct them before submitting your data to EGA. For example, the following response indicates that the object we were trying to submit already exists, and therefore "success" is "false":

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="receipt.xsl"?>
<RECEIPT receiptDate="2023-04-11T15:12:35.609+01:00" submissionFile="submission-EBI-TEST_1681222355609.xml" success="false">
  <STUDY alias="Example!_Human Microbiome Project SP56J" status="PRIVATE" holdUntilDate="2023-03-11Z"/>
  <SUBMISSION alias="SUBMISSION-11-04-2023-15:12:35:576"/>
  <MESSAGES>
    <ERROR>In study, alias: "Example!_Human Microbiome Project SP56J". The object being added already exists in the submission account with accession: "ERP127584".</ERROR>
    <INFO>VALIDATE action has been specified.</INFO>
    <INFO>Submission has been rolled back.</INFO>
    <INFO>This submission is a TEST submission and will be discarded within 24 hours</INFO>
  </MESSAGES>
  <ACTIONS>VALIDATE</ACTIONS>
  <ACTIONS>PROTECT</ACTIONS>
</RECEIPT>

If the curl command returns no response at all, please double-check that your username and password are correctly provided. Also notice the "ACTION=..." argument passed to the curl command.
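When scripting submissions, the receipt can also be inspected programmatically. The sketch below checks the "success" attribute and surfaces any <ERROR> messages; the receipt strings are abridged versions of the examples above, and `grep -o` is assumed available.

```shell
# check_receipt: report whether a Webin receipt signals success, and print
# any <ERROR> messages. In practice the receipt would come from the curl
# call, e.g. RECEIPT=$(curl -u "$USERNAME:$PASSWORD" -F "ACTION=VALIDATE" ...)
check_receipt() {
  case "$1" in
    *'success="true"'*) echo "Validation passed" ;;
    *)
      echo "Validation failed:"
      # print each <ERROR> element body on its own line
      echo "$1" | grep -o '<ERROR>[^<]*</ERROR>' | sed 's/<ERROR>//; s/<\/ERROR>//'
      ;;
  esac
}

# Abridged receipts from the examples above:
OK='<RECEIPT success="true"><MESSAGES><INFO>VALIDATE action has been specified.</INFO></MESSAGES></RECEIPT>'
BAD='<RECEIPT success="false"><MESSAGES><ERROR>The object being added already exists in the submission account with accession: "ERP127584".</ERROR></MESSAGES></RECEIPT>'

check_receipt "$OK"    # prints: Validation passed
check_receipt "$BAD"   # prints the "already exists" error message
```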
This specifies the action to take during the call to Webin, so we do not need a "Submission" XML just for a validation attempt. See more at submission actions without submission XML. Furthermore, multiple files or objects (e.g. sample, experiment, study…) can be validated in a single command by adding more arguments (i.e. '-F'). For example:

curl -u <USERNAME>:<PASSWORD> -F "ACTION=VALIDATE" "https://wwwdev.ebi.ac.uk/ena/submit/drop-box/submit/" -F "STUDY=@study.xml" -F "SAMPLE=@sample.xml" -F "DATASET=@dataset.xml"

As mentioned above, besides the "validate" action on the test server, you can also validate your metadata by three other methods:

"Validate" on the production server. From our example above, simply remove the "dev" from the URL.

curl -u <USERNAME>:<PASSWORD> -F "ACTION=VALIDATE" "https://www.ebi.ac.uk/ena/submit/drop-box/submit/" -F "STUDY=@study.xml"

"Add" on the test server. From our example above, simply replace the action "VALIDATE" with "ADD". Whatever is submitted to this service is discarded within 24 hours, so whether something gets submitted or not does not matter in the long run.

curl -u <USERNAME>:<PASSWORD> -F "ACTION=ADD" "https://wwwdev.ebi.ac.uk/ena/submit/drop-box/submit/" -F "STUDY=@study.xml"

"Add" on the production server. A combination of the previous two methods, which turns the attempt into an actual submission. This route should only be taken when you are sure your metadata is compliant and is what you want to submit.

curl -u <USERNAME>:<PASSWORD> -F "ACTION=ADD" "https://www.ebi.ac.uk/ena/submit/drop-box/submit/" -F "STUDY=@study.xml"

What happens after the submission of a dataset XML?

Once you have completed the registration of your dataset(s), please contact our Helpdesk team to provide a release date for your study.
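The four routes above differ only in the endpoint and the ACTION value, so they are easy to wrap in a small script. The sketch below is an assumption-laden illustration, not an official tool: `webin_url` and `submit` are hypothetical helper names, and the curl line is guarded (only echoed) so the sketch does not fire real requests without credentials.

```shell
# webin_url: pick the Webin endpoint for the chosen instance.
webin_url() {
  case "$1" in
    test)       echo "https://wwwdev.ebi.ac.uk/ena/submit/drop-box/submit/" ;;
    production) echo "https://www.ebi.ac.uk/ena/submit/drop-box/submit/" ;;
  esac
}

# submit: usage: submit <VALIDATE|ADD> <test|production> <file.xml>
# Echoes the curl command it would run; uncomment the real call when ready.
submit() {
  url=$(webin_url "$2")
  echo "curl -u \$USERNAME:\$PASSWORD -F \"ACTION=$1\" \"$url\" -F \"STUDY=@$3\""
  # curl -u "$USERNAME:$PASSWORD" -F "ACTION=$1" "$url" -F "STUDY=@$3"
}

submit VALIDATE test study.xml        # 1. validate on the test server
submit VALIDATE production study.xml  # 2. validate on the production server
submit ADD production study.xml       # 3. actual submission once compliant
```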
Please note that all datasets affiliated to unreleased studies are automatically placed on hold until the authorised submitter or DAC contact contacts the EGA Helpdesk for the study to be released. We strongly advise you not to delete your data until the EGA Helpdesk confirms that your data has been successfully archived.