Enabling discovery and access to sensitive data across national boundaries is vital for improving human health. It enables more powerful and efficient research by increasing the volume and diversity of data available for analysis. It allows us to better understand the causes of diseases – cancer, rare diseases, infectious diseases like COVID-19 – and develop new medicines and treatments. Sensitive human omics data are typically generated by research initiatives and shared using specialist repositories which provide services for data submission, discovery, and access. The EGA is one such repository. Established in 2008 at EMBL’s European Bioinformatics Institute (EMBL-EBI) in the UK, since 2012 the EGA (“Central EGA”) has been jointly managed by EMBL-EBI and the Centre for Genomic Regulation (CRG) in Spain. Many countries have emerging personalised medicine programmes which generate data from national initiatives. These programmes are driving a transition in human genomics from being research-driven to receiving funding through healthcare. Data generated in a clinical context are subject to stricter governance than research data and must follow national data protection legislation. To solve these challenges, the Federated EGA provides a network of connected resources to enable transnational discovery of and access to human omics data for research, while also respecting jurisdictional data protection regulations. In this way, the Federated EGA infrastructure supports the goals of European initiatives such as the 1+ Million Genomes initiative (1+MG), the European Health Data Space, and a number of EU-funded 1+MG implementation projects including Beyond 1 Million Genomes. The Federated EGA is made up of “Nodes” – typically nationally funded and operated – which store and manage data locally while allowing global discovery within the Federated EGA network. 
Since 2016, multiple parallel efforts, supported by ELIXIR and other transnational and national initiatives, have created technical and legal frameworks for establishing the Federated EGA. In 2022, the first five Nodes – Finland, Germany, Norway, Spain, and Sweden – officially joined the Federated EGA by signing Federated EGA Collaboration Agreements with Central EGA. The Finnish FEGA Node is operated by CSC - IT Center for Science and provides data management services according to national laws and the requirements of the EU General Data Protection Regulation (GDPR). These services provide tools and support for the whole life-cycle of sensitive research data from collating to analysis, publication, and authorised re-use. The development of the services has been a joint effort with the other Nordic nodes within NeIC's Tryggve and Heilsa projects, and funded by Finnish Ministry of Education and Culture and projects coordinated by ELIXIR Finland. Read more about Finnish FEGA signing. The German Human Genome-Phenome Archive (GHGA) strives to provide a national infrastructure as well as an ethical and legal framework that balances FAIR omics data usage and data protection needs for Germany. As a Germany-wide consortium funded by the German Research Foundation under the umbrella of the NFDI association, GHGA combines the expertise of 21 universities and research institutions to form a federated national infrastructure. Read more about GHGA signing.
In Norway, a key component of the infrastructure is the services of The Sensitive Data Service (TSD) offered by USIT at University of Oslo. The Federated EGA Norway Node is developed by ELIXIR Norway and operated by the University of Oslo as the responsible legal entity. Core software modules are developed jointly with the other Nordic nodes in the NeIC Tryggve and Heilsa projects. Read more about FEGA Norway signing. The Spanish FEGA (es-FEGA) is a national service for storing sensitive biomedical data in Spain. Supported by the Spanish Institute of Bioinformatics (INB) in collaboration with Central EGA, sensitive research datasets are primarily hosted at the Barcelona Supercomputing Centre facilities. The Swedish Sensitive Data Archive is a secure data archive and sharing platform for sensitive datasets. It was developed by the National Bioinformatics Infrastructure Sweden (NBIS) in collaboration with other Nordic ELIXIR Nodes through the Tryggve and Heilsa projects funded by NeIC and coordinated with Central EGA through ELIXIR. Read more about the Swedish Node signing. By providing a solution for secure and efficient management of human omics data, the Federated EGA aims to foster data reuse, enable reproducibility, accelerate biomedical research, and improve human health.
Find out more
Interested in setting up your own Federated EGA Node? Check out the FEGA Onboarding Knowledge Base for more information. The ELIXIR Federated Human Data Community is a great entry point for anyone interested in learning more about the Federated EGA.
You can join the ELIXIR Federated Human Data Community mailing list (select “Human Data”) and attend the ELIXIR Federated Human Data Community calls.
The Cell Plasticity and Regeneration Group at the Bellvitge Biomedical Research Institute-IDIBELL focuses on the recruitment of macrophages that takes place in the small intestine during injury and healing. They recently published a paper titled “Mucosal Macrophages Govern Intestinal Regeneration in Response to Injury” in the journal Gastroenterology. As part of the research, some experiments were conducted using human intestinal organoid lines. These cells were processed for RNA sequencing, and the sequencing data were deposited at the EGA to be made available to the scientific community. When dealing with human genomic information, repositories must keep datasets available while ensuring data protection. In this context, the European Genome-phenome Archive stands as a service for secure archiving and sharing of genetic, phenotypic and clinical data resulting from biomedical research. Following the recent publication of their paper, we took the opportunity to talk to Ilias Moraitis, first author, and Jordi Guiu, group leader, to find out about their experience with data sharing.
Could you explain the focus of your research?
In the lab, we study intestinal regeneration and how immune cells participate in this process. We use several techniques: engineered mouse models, image tracing, as well as mouse and human intestinal organoids.
What challenges do you face regarding data management?
We didn’t have a lot of problems. The informatics part, the data analysis, can always cause problems, but everything went smoothly. We think it’s very important to deposit the data in repositories. For the mouse data it is very straightforward, but this was our first time depositing human data, which is sensitive because it comes from patients, and this is legally regulated. That’s why we thought about the EGA.
Why do you think it is important to submit data to repositories such as EGA?
Because we think data is important for science. Nowadays, a lot of sequencing is being done worldwide, and sharing these resources with the scientific community is not only good for science, it also saves money and reduces costs. There is so much data in there that can be used for other projects and for other questions. And it saves time!
How was the process of submitting the data to the EGA?
The communication worked very well, but it was slower than we would have liked. I think this is something we learn along the way. People usually wait until the last minute before publication to do this, but if you know about it in advance, you can start the process earlier.
If you had to repeat the process, would you do anything differently?
It’s like everything, isn’t it? The first time you don’t know, you aren’t sure, but the second time you know the steps, you know what to do and it’s faster. Of course, we would do it earlier. Also, there were some things related to controlled access that we didn’t know how to manage, but that was more of an internal matter for us. Who is going to receive the communications, and when? Is our ethics committee going to evaluate this? These were all technicalities that we didn’t know about before starting the process, so we had to think them through. Next time it will be faster on our side.
Do you have any suggestions for us to improve the submission process?
Maybe we’ll need to ask our bioinformatician who did it! He was in charge of this part. Besides that, we think if the process were faster, it would be better for everyone. Also, in science you often have to do a lot of things at the very last minute, because of the nature of the experiments or the need to get a paper submitted. Being faster in the process would be a plus. On your side, we went through the process and were able to find some information, but there were aspects we weren't aware of and weren't sure how to handle.
I'm not sure if other institutes have different protocols for managing data requests or how they handle these internally. We believe the main issue lies in how the forms are filled out.
Do you have any recommendations for other submitters?
The main advice is to allow enough time for this. Submit the data when you have the sequences, and don’t wait until the last minute. The data can then be under embargo, so you don’t need to make it public at that moment. As soon as you sequence, upload it, and then it will be there whenever you need to publish.
How do you think we could encourage other researchers to submit their data?
There are different layers here. One is that it is mandatory. You have to do this. The other is that we all belong to the same research community, and sharing makes that community stronger and more efficient. And then, it also gives you visibility. The data is there, other people can analyse your data, and this will also bring citations to your papers. It has many advantages.