Cancer in children is uncommon, and the overall prognosis for most pediatric cancers is good. However, although combined survival rates have improved over recent decades, certain childhood malignancies, such as high-grade gliomas or metastatic sarcomas, remain incurable in most patients. New strategies for targeting these devastating diseases are imperative, and patient genomic data can become a key asset in the process [1].

The Pediatric Cancer Genome Project’s datasets

At EGA, we store over 500 datasets with sequencing data from pediatric cancer patients. A notable example is the set of datasets belonging to the Pediatric Cancer Genome Project (PCGP) from St. Jude Children’s Research Hospital–Washington University (Table 1 - EGAS list). PCGP is an ambitious effort to identify the mutations that drive childhood cancer and to find new cures. Its datasets include 600 patients with complete tumor and normal genomes from 15 different tumor types. PCGP datasets have been an important resource for studies that pooled genomic data from public and in-house datasets to perform extensive genomic characterizations and publish comprehensive pan-cancer pediatric studies [2],[3]. Other interesting pediatric cancer datasets come from the Pediatric Brain Tumor Consortium, SickKids, and the ICGC PedBrain project.

How data reuse impacts drug development for pediatric cancers

In October 2024, Nature Communications published a study led by the University of Michigan in which the authors used data from EGA (PCGP; see the table below) as part of their strategy to identify new tumor vulnerabilities that could become novel therapeutic targets in diffuse midline glioma. Diffuse midline gliomas (DMG) are treatment-resistant and uniformly fatal pediatric brain tumors. The prognosis of this brainstem tumor is dismal, with a median overall survival of 9–12 months from diagnosis.
In this study, the authors first developed a laboratory model of the disease and reanalyzed data from the EGA to identify genes potentially linked to tumor severity. This approach revealed the involvement of specific metabolic pathways, suggesting new possibilities for therapeutic intervention. In mice, the study shows that statins can improve survival.

As a brief overview of the underlying science, diffuse midline gliomas present intratumor heterogeneity, with subpopulations of less-differentiated oligodendrocyte precursors and more-differentiated astrocytes. The authors established in vitro models to recapitulate both phenotypes and identify the metabolic programs of each subpopulation. To determine clinical relevance, they reused gene expression data from 76 DMG patients and identified a gene signature that predicts decreased overall survival. After extensive metabolic characterization of the subpopulations, the authors defined strategies to target specific metabolic vulnerabilities. Preclinical experiments in mice using OXPHOS inhibitors and statins showed a reduction in tumor burden and increased overall survival [4].

In this scenario, reusing high-quality patient sequencing data proved an advantageous way to support the clinical relevance of in vitro and preclinical findings. More broadly, in rare conditions such as childhood cancer, the availability of public datasets and the possibility of pooling data from different sources can be the only way to obtain impactful results that lead to improvements for patients.

References

[1] Davidoff, A. M. Pediatric Oncology. Seminars in Pediatric Surgery 19, 225–233 (2010).
[2] Gröbner, S. N. et al. The landscape of genomic alterations across childhood cancers. Nature 555, 321–327 (2018).
[3] Thatikonda, V. et al. Comprehensive analysis of mutational signatures reveals distinct patterns and molecular processes across 27 pediatric cancers. Nature Cancer 4, 276–289 (2023).
[4] Mbah, N. E. et al. Therapeutic targeting of differentiation-state dependent metabolic vulnerabilities in diffuse midline glioma. Nature Communications 15, (2024).

Datasets and Studies

| ID | Title | Access Policy | Year |
|---|---|---|---|
| EGAD00001000134 | Driver mutations in histone H3.3 and chromatin remodelling genes in paediatric glioblastoma | McGill-DKFZ Pediatric Brain Tumour Consortium | 2016 |
| EGAS0000100192 | Somatic Histone H3 Mutations in Diffuse Intrinsic Pontine Gliomas and Non-Brainstem Paediatric Glioblastomas | St. Jude Children's Research Hospital - Washington University Pediatric Cancer Genome Project | 2017 |
| EGAS00001000575 | Whole genome sequencing and whole exome sequencing of DIPG tumors and matched normal tissue | The Hospital for Sick Children ("SickKids") | 2016 |
| EGAD00001000792 | Exome sequencing reads of paediatric glioblastoma | McGill-DKFZ Pediatric Brain Tumour Consortium | 2016 |
| EGAD00001002006 | Whole genome sequencing of paediatric glioblastoma in the ICGC PedBrain project | ICGC PedBrain project | 2016 |
Metadata Distribution

Welcome to Metadata Distribution within the EGA ecosystem. Our Metadata REST API lets you retrieve metadata from across EGA. Using this API, you can access publicly available metadata for studies, samples, experiments, runs, analyses, policies, DACs, and datasets. The API also supports cross-referencing of objects, so you can gather, for example, all the datasets associated with a specific DAC.

In addition, we have added the ability to query private data using the metadata API. If you have the necessary permissions, you can access private, behind-the-login data for a specified list of datasets.

Metadata Distribution Index

- Identifiers
- Dataset Mappings
- Website Download
- Metadata API - Private

Identifiers

At the core of EGA's organisational structure are unique accessions that identify our different objects. Here is a quick overview of the accession prefixes and their corresponding object types:

| EGA Accession ID | EGA Object description |
|---|---|
| EGAS | EGA Study Accession ID |
| EGAC | EGA DAC Accession ID |
| EGAP | EGA Policy Accession ID |
| EGAN | EGA Sample Accession ID |
| EGAR | EGA Run Accession ID |
| EGAX | EGA Experiment ID |
| EGAZ | EGA Analysis Accession ID |
| EGAD | EGA Dataset Accession ID |
| EGAB | EGA Submission ID |
| EGAF | EGA File Unique Accession ID |

For further information, check our metadata schema documentation.

Dataset Mappings

For authorised datasets, the following mapping files describe how the objects in a dataset are linked:

- Sample_file: the linkage between samples and files available in the dataset.
- Study_experiment_run_sample: the linkage between studies, experiments, runs, and samples within the dataset.
- Study_analysis_sample: the linkage between studies, analyses, and samples within the dataset.
- Run_sample: the linkage between runs and samples within the dataset.
- Analysis_sample: the linkage between analyses and samples within the dataset.

An empty file indicates the absence of corresponding information.

Website Download

Our website is your gateway to downloading metadata. Navigate to the dataset page, where you will find a blue Metadata button. Once authenticated, you can click this button for authorised datasets. If you lack permissions for a particular dataset, request access by clicking the 'Request Access' button. For authorised datasets, choose your preferred metadata format: CSV, TSV, or JSON.

Metadata API - Private

You can also download metadata programmatically. Start by authenticating with your credentials to obtain an access token, then use this token to query private information. Queries mirror the structure of the Public Metadata API. Behind the login, however, you can also retrieve the mapping information described under Dataset Mappings, alongside object-level queries.

Authentication

An active session is required to work with the API. Each time you log in with your credentials, a new session is started, which is identified by an access_token. Below is an example of how to obtain one using curl:

    curl https://idp.ega-archive.org/realms/EGA/protocol/openid-connect/token \
      -d 'client_id=metadata-api' \
      -d 'username=...' \
      --data-urlencode 'password=...' \
      -d 'grant_type=password'

All responses from the API are in JSON format. A successful response should include a new token to be used for the session:

    {"access_token":"eyJhbGciOiJSUzI1NiIsInR5cCIgOiA...TNw",
     "expires_in":300,
     "refresh_expires_in":1800,
     "refresh_token":"eyJhbGciOiJIUzI1NiIsInR5cCIgOiA...pTX10",
     "token_type":"Bearer",
     ...
    }

Save the access_token value and include it in the API call headers.
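As a rough Python counterpart to the curl call above, the sketch below builds the same token request with the standard library and reads the access_token field from a JSON response. The helper names `build_token_request` and `extract_access_token` are hypothetical and introduced only for illustration; they are not part of any EGA client. The endpoint, client_id, and grant_type values are copied from the curl example. The request is only constructed here, not sent.

```python
import json
import urllib.parse
import urllib.request

# Endpoint and client_id are copied from the curl example in this document.
TOKEN_URL = "https://idp.ega-archive.org/realms/EGA/protocol/openid-connect/token"
CLIENT_ID = "metadata-api"


def build_token_request(username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that asks for a new token.

    Hypothetical helper: it mirrors the form fields of the curl example.
    """
    form = urllib.parse.urlencode(
        {
            "client_id": CLIENT_ID,
            "username": username,
            "password": password,  # urlencode escapes special characters
            "grant_type": "password",
        }
    ).encode("ascii")
    # Supplying a body makes this a form-encoded POST, as curl does with -d.
    return urllib.request.Request(TOKEN_URL, data=form, method="POST")


def extract_access_token(response_body: str) -> str:
    """Return the access_token field from a JSON token response."""
    payload = json.loads(response_body)
    if "access_token" not in payload:
        raise ValueError("response does not contain an access_token")
    return payload["access_token"]
```

To actually obtain a token, the request could be passed to `urllib.request.urlopen` and the decoded body handed to `extract_access_token`. Since the sample response reports an expires_in of 300, a long-running script would probably need to request a fresh token from time to time.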
Example query usage

Below are some examples of queries available behind authentication and authorisation. In each command, replace {datasetID} with the dataset accession and access_token with the token value obtained during authentication.

Querying study-experiment-run-sample mappings:

    curl https://metadata.ega-archive.org/datasets/{datasetID}/mappings/study_experiment_run_sample \
      -H 'Authorization: Bearer access_token'

Querying run-sample mappings:

    curl https://metadata.ega-archive.org/datasets/{datasetID}/mappings/run_sample \
      -H 'Authorization: Bearer access_token'

Querying study-analysis-sample mappings:

    curl https://metadata.ega-archive.org/datasets/{datasetID}/mappings/study_analysis_sample \
      -H 'Authorization: Bearer access_token'

Querying analysis-sample mappings:

    curl https://metadata.ega-archive.org/datasets/{datasetID}/mappings/analysis_sample \
      -H 'Authorization: Bearer access_token'

Querying sample-file mappings:

    curl https://metadata.ega-archive.org/datasets/{datasetID}/mappings/sample_file \
      -H 'Authorization: Bearer access_token'

You can get a different output format by adding one of these options to the curl command:

    -H 'Accept: text/tsv'
    -H 'Accept: application/json'
    -H 'Accept: text/csv'

For more detailed information, refer to the Metadata API Specification.
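The mapping queries above all follow one URL pattern, so they can be sketched as a single Python helper. This is a minimal illustration under stated assumptions, not an official client: `build_mapping_request` is a hypothetical name, the EGAD prefix check relies on the accession table in this document, and defaulting to JSON output is an assumption made for the example. The request is only built, not sent.

```python
import urllib.request

# Base URL, mapping names, and Accept values are taken from the curl examples.
BASE_URL = "https://metadata.ega-archive.org"
MAPPINGS = {
    "sample_file",
    "study_experiment_run_sample",
    "study_analysis_sample",
    "run_sample",
    "analysis_sample",
}
ACCEPT_TYPES = {"json": "application/json", "csv": "text/csv", "tsv": "text/tsv"}


def build_mapping_request(dataset_id, mapping, access_token, fmt="json"):
    """Build (but do not send) a GET request for one dataset mapping.

    Hypothetical helper; the default fmt="json" is an assumption.
    """
    # Dataset accessions use the EGAD prefix according to the accession table.
    if not dataset_id.startswith("EGAD"):
        raise ValueError(f"not a dataset accession: {dataset_id!r}")
    if mapping not in MAPPINGS:
        raise ValueError(f"unknown mapping: {mapping!r}")
    if fmt not in ACCEPT_TYPES:
        raise ValueError(f"unknown output format: {fmt!r}")
    url = f"{BASE_URL}/datasets/{dataset_id}/mappings/{mapping}"
    headers = {
        "Authorization": f"Bearer {access_token}",  # token from authentication
        "Accept": ACCEPT_TYPES[fmt],
    }
    return urllib.request.Request(url, headers=headers)
```

Rejecting unknown mapping names and non-dataset accessions before any network call gives an immediate local error instead of a failed HTTP request; whether the server applies the same checks is not stated in this document.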
Illumina Infinium MethylationEPIC Array profiling of 93 pheochromocytoma and paraganglioma tumours with and a germline SDHB mutation
A discovery cohort of 856 adult survivors of pediatric ALL
Cambridge control samples using a 1.2M genotyping chip from Illumina
Freezing replicates
Technical replicates
A OB56_N_PreA_WGBS paired end data for Preadipocytes(fat)