Federated EGA Vision Statement
The Federated EGA is the primary global resource for discovery of and access to sensitive human omics and associated data consented for secondary use, through a network of national human data repositories to accelerate disease research and improve human health.
Over the last 10 years, most individual-level human omics data have been generated in the context of research consortia and shared via global repositories such as the European Genome-phenome Archive (EGA). Many countries now have emerging personalized medicine programmes which are generating data from national or regional initiatives. Thus, human genomics is undergoing a step change from being a research-driven activity to one funded through healthcare initiatives.
Genetic data generated in a healthcare context is subject to more stringent information governance than research data and often must comply with national legislation. To address this need, the Federated EGA provides a network of connected resources to enable transnational discovery of and access to human data for research while also respecting jurisdictional data protection regulations. By providing a solution to emerging challenges around secure and efficient management of human omics and associated data, the Federated EGA fosters data reuse, enables reproducibility, and accelerates biomedical research.
The EGA project is currently a collaboration between EMBL-EBI and the CRG, regulated by agreements between the two institutions. The Federated European Genome-phenome Archive (EGA) will be a distributed network of repositories for sharing human omics data and phenotypes. A node would typically be an organization or project that hosts human genetic data, so that the data can remain within its jurisdiction. The Federated EGA gathers metadata about the omics data collections stored in national or regional archives and makes those collections discoverable across the EGA network.
EGA is contributing the Federated EGA model, requirements, and experience to several communities and projects, such as GA4GH, the ELIXIR Federated Human Data Implementation Study, and the ELIXIR Federated Human Data community.
|Structure and Organization|
|EGA Federation: Structure and Organization||1.1||The structure of an EGA federated network and service expectations. We organise the EGA into three types of nodes: Central EGA, Federated EGA nodes and EGA Community nodes; we outline the goals of such an organization, and summarize the commitments and services provided by the nodes.|
|EGA Federation Strategic Committee||1.0||The EGA Federation Strategic Committee terms of reference describe the purpose and objectives of the committee, which is to provide direction and strategic planning for the Federated EGA project. The committee receives input from the EGA Strategic Committee and provides feedback for the EGA strategic roadmap.|
|EGA Federation Operations Committee||1.0||The EGA Federation Operations Committee terms of reference describe the purpose and objectives of the operations committee, which is to review operational performance and coordinate the technical implementation roadmaps of EGA Federated and Community nodes. The committee receives advice from the EGA Federation Strategic Committee and provides operational reporting to it.|
|Node Operations guidelines||1.0||The EGA Federated Node Operations guidelines give an overview of the operational areas that require resources in order to create a Federated EGA node. The document is based on more than 10 years' experience of establishing and operating the EBI and CRG Central EGA nodes. It breaks down the operational areas of responsibility into Helpdesk Services, Technical Operations, Software Development, and IT Infrastructure.|
LocalEGA is federated storage software for sensitive data.
|Main LocalEGA software Repository|
|Main LocalEGA software Documentation|
Local EGA Software
A portable toolkit to securely deposit and share human sensitive data - Local EGA, Mini-Symposium Federated Human Data, Elixir All Hands Meeting, 2020
Federated EGA APIs
Below is a list of the GA4GH standards and APIs implemented by the Federated EGA. Visit EGA-GA4GH for the full list of standards that are currently available or planned for implementation at EGA.
|API Name||API Purpose||Status||Supported|
|htsget||A protocol for secure, efficient and reliable access to sequencing read and variation data||Production||TBD|
|DUO||Semantically tag genomic datasets with usage restrictions, allowing them to become automatically discoverable based on a health, clinical, or biomedical researcher's authorization level or intended use.||Production||TBD|
|Researcher IDs||These specify the collection of researchers that may access the dataset at any given time, and the credentials they must supply||Production||TBD|
|Refget||Refget enables access to reference sequences using an identifier derived from the sequence itself.||Production||V1.0.0|
|Submission API||An API to submit metadata following the INSDC object schemas. Used for EGA submissions||Production||TBD||swagger endpoint|
|Crypt4GH||A file container specification enabling direct byte-level compatible random access to encrypted genetic data stored in community standards such as SAM/BAM/CRAM/VCF/BCF.||Planned||TBD|
|Beacon||Discover genomic variants and individuals||Development||TBD|
|Permissions API||Get/set of permissions to EGA objects||Development||TBD|
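The Refget entry in the table above describes identifiers derived from the sequence itself. As a minimal sketch of that idea, assuming the two checksum schemes described in Refget v1.0.0 (MD5 and TRUNC512, the latter being the first 24 bytes of the SHA-512 digest, hex encoded), the Python below computes both identifiers for a sequence string. The function names `normalize`, `md5_id`, and `trunc512_id` are hypothetical helpers for illustration and are not part of any EGA or GA4GH library; the normalization step (strip whitespace, upper-case) is also an assumption about how the input is prepared.

```python
import hashlib


def normalize(sequence: str) -> bytes:
    """Remove all whitespace and upper-case the sequence before hashing.

    Assumption: identifiers are computed on the bare, upper-case sequence,
    so FASTA line breaks and letter case must not change the result.
    """
    return "".join(sequence.split()).upper().encode("ascii")


def md5_id(sequence: str) -> str:
    """Return the MD5 hex digest of the normalized sequence (32 hex characters)."""
    return hashlib.md5(normalize(sequence)).hexdigest()


def trunc512_id(sequence: str) -> str:
    """Return the first 24 bytes of the SHA-512 digest, hex encoded (48 hex characters)."""
    digest = hashlib.sha512(normalize(sequence)).digest()
    return digest[:24].hex()


if __name__ == "__main__":
    seq = "acgt\nACGT\n"  # mixed case with line breaks, as it might appear in a FASTA file
    print("md5:     ", md5_id(seq))
    print("trunc512:", trunc512_id(seq))
```

Because the identifier depends only on the sequence content, two nodes holding the same reference sequence would compute the same identifier without coordinating names, which is the property that makes this kind of lookup suitable for a federated network.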