Human placenta samples from 52 placentas: 5 first trimester, 7 second trimester, and 40 term. Data are uploaded as BAM files.
A deeper understanding of the pathological mechanisms of SARS-CoV-2 infection is required to combat COVID-19. In this dataset, we analyze postmortem lung cells from SARS-CoV-2-infected and uninfected patients using snRNA-seq.
TCRab sequencing was performed on viably frozen cells from 11 T-LGLL samples from 9 T-LGLL patients and 6 age-matched healthy samples. The raw data is available as fastq files.
This dataset contains methylation sequencing data from 60 non-cancer and 70 colorectal cancer cfDNA samples. The methylation libraries were constructed using the NEBNext Enzymatic Methyl-seq Kit.
Single-cell RNA sequencing of 18 peripheral blood samples from six melanoma patients. The raw data is available as fastq files.
This dataset is derived from whole-transcriptome sequencing (RNA-seq) of RNA from 57 BCR-ABL1 lymphoblastic leukemias (53 diagnostic, 4 relapse).
The objective of the colonoscopy study is to carry out 16S sequencing of colon biopsies and faecal samples provided in the ExHiBITT study in order to compare potential fluctuations in the microbiota of different sites.
Projects

Jointly managed by the European Bioinformatics Institute (EMBL-EBI) in Cambridge (UK) and the Centre for Genomic Regulation (CRG) in Barcelona, the EGA provides an invaluable service to the worldwide biomedical research community. The teams leading the EGA are involved in several international partnerships and consortia in numerous scientific fields, where they contribute to ambitious projects. In addition to the projects listed below, the EGA is in a long-standing partnership with the Global Alliance for Genomics and Health (GA4GH), as described on the dedicated page.

On-going projects

Project | Description | Duration | Domain | Funder | Tags
CANDLE | The CANDLE project aims to conceptualise and advance the development of National Cancer Data Nodes (NCDNs) in European countries. These NCDNs will boost the reuse of cancer data for research, innovation and policy making, in order to improve diagnostics and treatment for cancer patients, as well as prevention and early detection. | 2025-2028 | Cancer | Horizon Europe | DOCUMENTATION
EASIGEN-DS | The EASIGEN-DS project aims to conduct a design study to establish a new European Research Infrastructure on Advanced Genomics Technologies, EASIGEN. To develop an excellent scientific, technological and operational design, we will conduct landscape studies, stakeholder consultations, and community surveying. | 2025-2028 | Genomic and health data | Horizon Europe | DATA MANAGEMENT, DOCUMENTATION, INFRASTRUCTURE
Go-IMPaCT | Go-IMPaCT will contribute sequenced genomes and provide infrastructure as part of IMPaCT-Cohort, one of the three fundamental pillars of the Precision Medicine Infrastructure associated with Science and Technology (IMPaCT) program in Spain. Along with the Genome of Europe (GoE) project, around 18,000 people will have their genomes sequenced, also contributing to Spain's commitments in 1+MG. Go-IMPaCT will fund the development of an EGA node to manage and share this genomic and phenoclinic data, laying the foundations for regional and ethnic genomic variability in Spain to be available for research purposes. The IMPaCT cohort is created with the spirit of being an open research tool, compatible with the rest of the health research ecosystem, and other international initiatives. | 2025-2027 | Large-scale genomics and health data; personalised medicine | Instituto de Salud Carlos III | ACCESS, DISCOVERY, INFRASTRUCTURE, METADATA STANDARDS
FAIR-FEGA | This project seeks to accelerate data depositions into FEGA, significantly increasing the data flow in and from FEGA nodes. It will build capacity within the FEGA nodes and increase awareness in a wide range of stakeholders, thus altogether achieving the ultimate goal of enhancing data reuse. The project will be carried out by a strategic consortium comprising seven ELIXIR nodes and two ELIXIR communities. | 2025-2026 | Not applicable | ELIXIR | ACCESS, DISCOVERY, DOCUMENTATION, INFRASTRUCTURE, METADATA STANDARDS
FEGA-Connect | A consortium of six ELIXIR nodes plus the Polish FEGA node (in-kind contribution) joining forces to build a solid base to develop solutions for effective multi-omic sensitive data integration between FEGA nodes and other infrastructures and specialised data repositories. We aim to promote a more coherent data deposition, discoverability and retrieval of multi-omics datasets, providing FAIRer data and consequently accelerating research. | 2025-2026 | Multi-omics data | ELIXIR | ACCESS, DATA MANAGEMENT, DISCOVERY, INFRASTRUCTURE, METADATA STANDARDS
IMPaCT-Data 2 | IMPaCT-Data 2 will develop a digital platform for the integration and modelling of biomedical data associated with IMPaCT (Precision Medicine Infrastructure associated with Science and Technology) projects in Spain. It will deploy a sustainable infrastructure that facilitates the integration, standardisation, interoperability and analysis of clinical, genomic, molecular and medical imaging data. This platform will be aligned with European projects such as Genome of Europe (GoE), the first project to make use of the European Genomic Data Infrastructure (GDI), and EUCAIM. IMPaCT-Data 2 will benefit from advanced Artificial Intelligence and High Computing Capacity Systems capabilities, offering robust and accessible tools for researchers from the National Health System in Spain. | 2025-2026 | Large-scale genomics and health data; personalised medicine | Instituto de Salud Carlos III | ACCESS, DISCOVERY, INFRASTRUCTURE, METADATA STANDARDS
SenSec | This project aims to establish a mechanism for orchestrating secure access to sensitive data hosted by the EGA, whether in Central EGA or any Federated Node, from Galaxy, a popular open-source, community-driven VRE (Virtual Research Environment) for bioinformatics analysis. Building on a previous prototype that enabled Galaxy users within Trusted Research Environments (TREs) to decrypt sensitive data for workflow execution without sharing private encryption keys, SenSec will expand this prototype into a comprehensive solution for secure data analysis in Galaxy, facilitating encrypted data access and transfer from FEGA/EGA repositories to designated TREs. | 2025-2026 | Genomic and health data; trusted research environment | ELIXIR | ACCESS, DATA ANALYSIS
ERDERA | The European Rare Disease Research Alliance (ERDERA) takes over from EJPRD to deliver concrete health benefits to rare disease patients in the next decade by advancing prevention, diagnosis and treatment research. To leave no one behind, over 170 organisations championed by the European Union and member states are working hand in hand to make Europe a world leader in rare diseases research and innovation. | 2024-2034 | Rare diseases | Horizon Europe; "La Caixa" Foundation cofunds CRG's contribution | ACCESS, DATA ANALYSIS, DISCOVERY, INFRASTRUCTURE
SYNTHIA | The aim of SYNTHIA is to deliver validated, reliable tools and methods for synthetic data generation (SDG). The tools will cover multiple data types including lab results, clinical notes, genomics, imaging and m-health data. SYNTHIA also hopes to make possible the generation of longitudinal data. | 2024-2029 | Genomic and health data; multi-omics; AI solutions | Innovative Health Initiative (IHI) | DATA ANALYSIS, DATA MANAGEMENT, INFRASTRUCTURE
GoE | The Genome of Europe initiative aims to build a European network of national genomic reference cohorts of at least 500,000 citizens. These reference cohorts will be selected to be representative of the European population. | 2024-2028 | Large-scale genomic and health data | Horizon Europe | ACCESS, DISCOVERY, INFRASTRUCTURE, METADATA STANDARDS
HEREDITARY | HEREDITARY aims to transform the way we approach disease detection, prepare treatment response, and explore medical knowledge by building a robust, interoperable, trustworthy, and secure framework that integrates multimodal health data (including genetic data) while ensuring compliance with cross-national privacy-preserving policies. | 2024-2027 | Neurodegenerative disorders, gut-brain interplay | Horizon Europe | DATA MANAGEMENT, DATA ANALYSIS
EOSC-ENTRUST | The mission of EOSC-ENTRUST is to create a European network of trusted research environments for sensitive data and to drive European interoperability by joint development of a common blueprint for federated data access and analysis. | 2024-2026 | Trusted Research Environment | Horizon Europe | INFRASTRUCTURE
EBV-MS | "Targeting Epstein-Barr Virus Infection for Treatment and Prevention of Multiple Sclerosis". The ambitious goals of the project are to answer the question of why only a few EBV-infected persons develop MS, to define the underlying mechanism of this process, and to clarify whether targeting the EBV infection can prevent MS or improve the disease course. | 2023-2028 | Viral-host genetics; immune response; disease modelling; disease prevention; AI/ML solutions | Horizon Europe | DATA MANAGEMENT, DATA ANALYSIS
WISDOM | WELL-BEING IMPROVEMENT THROUGH THE INTEGRATION OF HEALTHCARE AND RESEARCH DATA AND MODELS WITHOUT BORDER FOR CHRONIC IMMUNE-MEDIATED DISEASES aims to deploy novel approaches for data processing, harmonisation, management, and secure data sharing and federated access for diseases like multiple sclerosis. Using an end-user guided approach, it will facilitate responsible and critical assessment of the use of AI in healthcare. | 2023-2028 | Chronic immune-mediated diseases | Horizon Europe | DATA MANAGEMENT, INFRASTRUCTURE
EUCAIM | EUropean Federation for CAncer IMages is a project that will build a highly secure, federated and large-scale European cancer imaging platform, with capabilities that will greatly enhance the potential of Artificial Intelligence in oncology. | 2023-2027 | Cancer | Digital Europe Programme (DIGITAL) | DISCOVERY
CONTAGIO | CONTAGIO (COhorts Network To be Activated Globally In Outbreaks) aims to create coordination mechanisms to rapidly react to infectious disease (re-)emergence in low- and middle-income countries (LMICs). | 2023-2026 | Infectious Diseases | European Commission - Horizon Europe | ACCESS, DATA MANAGEMENT, DISCOVERY
Youth-GEMs | Youth-GEMS (Gene Environment Interactions in Mental Health TrajectorieS of Youth) will conduct research into the genetic and environmental factors of mental health in young European people. | 2022-2027 | Mental health | European Commission - Horizon Europe | DATA MANAGEMENT, DISCOVERY
GDI | The European Genomics Data Infrastructure project is enabling access to genomic and related phenotypic and clinical data across Europe. It is doing this by establishing a federated, sustainable and secure infrastructure to access the data. | 2022-2026 | Genomic and health data | European Commission - Horizon Europe; "La Caixa" Foundation cofunds CRG's contribution | DISCOVERY, DOCUMENTATION, INFRASTRUCTURE
IMPaCT-T2D | The IMPaCT-T2D project aims at studying the complete genomes of a large cohort of patients with Type 2 Diabetes mellitus (T2D), using modern sequencing technologies and artificial intelligence (AI) in order to improve the stratification and pharmacological treatment in the context of precision medicine. | 2022-2025 | Cardiovascular and Complex Diseases | Spanish Ministry of Science and Innovation; Instituto de Salud Carlos III | ACCESS, DATA MANAGEMENT, DISCOVERY, INFRASTRUCTURE

Completed projects

Project | Description | Duration | Domain | Funder | Tags
EOSC4Cancer | EOSC4Cancer builds on existing projects, research outcomes and established community solutions to create the federated FAIR data, analysis and services infrastructure needed for European Cancer research programmes. | 2022-2025 | Cancer | European Commission - Horizon Europe | DISCOVERY
EuCanImage | A European Cancer Image Platform Linked to Biological and Health Data for Next-Generation Artificial Intelligence and Precision Medicine in Oncology. | 2020-2025 | AI Solutions in Oncology | European Commission - H2020 Programme; "La Caixa" Foundation cofunds CRG's contribution | DATA MANAGEMENT, METADATA STANDARDS
GenoMed4ALL | A consortium built to empower personalised medicine in the field of haematological diseases through the use of AI and the pooling of genomic and clinical data. | 2020-2025 | Hematological diseases | European Commission - H2020 Programme | DISCOVERY, METADATA STANDARDS
BY-COVID | The BeYond-COVID project aims to make COVID-19 data accessible to scientists in laboratories but also to anyone who can use it, such as medical staff in hospitals or government officials. Going beyond SARS-CoV-2 data, the project will provide a framework for making data from other infectious diseases open and accessible to everyone. | 2021-2024 | Infectious diseases | European Commission - H2020 Programme | ACCESS, DATA MANAGEMENT, DISCOVERY, INFRASTRUCTURE
IMPaCT-Data | IMPaCT-Data aims to create the infrastructure for secondary use of data from Spanish healthcare systems (electronic health records, medical imaging and genomic repositories) and to contribute the knowledge and methodology produced to the healthcare system. | 2021-2024 | Large-scale genomics and health data | Spanish Ministry of Science and Innovation; Instituto de Salud Carlos III | ACCESS, DATA MANAGEMENT, DISCOVERY, INFRASTRUCTURE
LaMarato | A project aimed at creating and developing a Catalan inter-hospital network to interrogate genetic variants from thousands of genetic tests carried out in patients with rare diseases from the main Catalan hospitals. | 2021-2024 | Genomic and health data | Fundacio La Marato de TV3 (Catalan foundation) | DISCOVERY
HealthyCloud | This consortium will contribute a Strategic Agenda towards the European Health Research and Innovation Cloud. The project will work in collaboration with a broad range of stakeholders to ensure that all voices are included and that the results are technically and ethically sound. | 2021-2023 | Not applicable | European Commission - H2020 Programme | DOCUMENTATION
B1MG | Beyond 1 Million Genomes aims to create a network of genetic and clinical data across Europe. The project provides coordination and support to the 1+ Million Genomes Initiative (1+MG). This initiative is a commitment of 24 EU countries, the UK and Norway to give cross-border access to one million sequenced genomes by 2022. | 2020-2023 | Not applicable | European Commission - Horizon Europe | DATA MANAGEMENT, INFRASTRUCTURE, METADATA STANDARDS
ELIXIR-CONVERGE | An alliance with the goal of Connecting and aligning ELIXIR Nodes to deliver sustainable FAIR life-science data management services. | 2020-2023 | Data Management and Infectious Diseases | European Commission - H2020 Programme | DATA MANAGEMENT, INFRASTRUCTURE, METADATA STANDARDS
IHCC | The International HundredK+ Cohorts Consortium aims to create a global platform for translational research, informing the biological and genetic basis for disease and improving clinical care and population health. | 2020-2022 | Translational research | NIH; The Wellcome Trust; CZI | INFRASTRUCTURE, METADATA STANDARDS
PPCG | The Pan Prostate Cancer Group aims to harmonise and interrogate Whole Genome DNA Sequence data generated around the world from over 2000 men with prostate cancer, with associated transcriptome and methylome data, to include men from different clinical categories and ethnicities. This project is about providing breakthrough advances through analysis of a very large series of Whole Genome DNA data from prostate cancer contributed by many of the leading scientists and clinicians working in prostate cancer genomics. | 2019-2024 | Cancer | Cancer Research UK | DATA MANAGEMENT
CINECA | Consortium providing a Federated solution enabling population-scale genomic and biomolecular data accessible across international borders accelerating research and improving the health of individuals resident across continents. | 2019-2023 | Large-scale Genomics and Health Data | European Commission - H2020 Programme | ACCESS, DATA MANAGEMENT, DISCOVERY, INFRASTRUCTURE
EASI-Genomics | A project designed to provide easy access to cutting-edge DNA sequencing technologies to researchers from academia and industry, within a framework that ensures compliance with ethical and legal requirements, as well as FAIR and secure data management. | 2019-2023 | Next Generation Sequencing | European Commission - H2020 Programme | ACCESS
EJP-RD | A European consortium built to create a comprehensive, sustainable ecosystem allowing a virtuous circle between research, care, and medical innovation. | 2019-2023 | Rare diseases | European Commission - H2020 Programme | ACCESS, DATA MANAGEMENT, DOCUMENTATION, METADATA STANDARDS
EOSC-Life | EOSC-Life brings together the 13 Life Science research infrastructures (LS RIs) to create an open, digital and collaborative space for biological and medical research. The project will publish 'FAIR' data and a catalogue of services provided by participating RIs for the management, storage and reuse of data in the European Open Science Cloud (EOSC). | 2019-2023 | Not applicable | European Commission - H2020 Programme | DOCUMENTATION
EUCANCan | A federated network aiming at implementing a cultural, technological and legal integrated framework across Europe and Canada, to enable and facilitate the efficient sharing of cancer genomic data. | 2019-2023 | Cancer | European Commission - H2020 Programme | DATA MANAGEMENT, METADATA STANDARDS
The Federated EGA framework: supporting sensitive data management across the ELIXIR Nodes | This project is a direct continuation of the FHD IS with the goal to position the FEGA framework as the core infrastructure driver to support human data sharing for research. | 2019-2023 | Human genomic data | ELIXIR | INFRASTRUCTURE
UK Biobank | UK Biobank is a large-scale biomedical database and research resource, containing in-depth genetic and health information from half a million UK participants. This project is to archive whole genome sequencing and other genetic data for UK Biobank participants. | 2019-2023 | Large-scale Genomics and Health Data | The Wellcome Trust; UKRI; Amgen; AstraZeneca; GSK; Johnson & Johnson | DATA MANAGEMENT, INFRASTRUCTURE
VEIS | The core mission of VEIS is to create an open ecosystem of technologies that will address and adapt to the requirements of the systems used to analyse and interpret -omics and clinical data in research and application environments in biomedicine. The aim of the project is to leverage the value of the EGA for both industry and society. | 2019-2022 | Oncology and Rare diseases | Generalitat de Catalunya and European Regional Development Fund (ERDF) | ACCESS, DISCOVERY
ELIXIR BEACON IS | This study follows on from a number of earlier activities that have established the ELIXIR Beacon Project. The main aim is to extend the Beacon protocol, developed at EGA, to become the reference ELIXIR Data Discovery product. | 2019-2021 | Not applicable | ELIXIR | DISCOVERY
ELIXIR FHD IS | This project coordinates the delivery of FAIR compliant metadata standards, interfaces, and reference implementation to support the federated ELIXIR network of human data resources. | 2019-2021 | Human genomic data | ELIXIR | INFRASTRUCTURE
ELIXIR Rare Disease | The Rare Disease Community extends and generalises the system of access authorisation and high volume secure data transfer developed within the EGA. The goal of the Community is to create a federated infrastructure that will enable researchers to discover, access and analyse different rare disease repositories across Europe. It is doing this in partnership with other European infrastructure projects, namely RD-CONNECT, BBMRI-ERIC and E-Rare. | 2019-2021 | Rare diseases | ELIXIR | INFRASTRUCTURE
Solve-RD | Solve-RD - solving the unsolved rare diseases - is a research project funded by the European Commission. It echoes the ambitious goals set out by the International Rare Diseases Research Consortium (IRDiRC) to deliver diagnostic tests for most rare diseases by 2020. The current diagnostic and subsequent therapeutic management of rare diseases is still highly unsatisfactory for a large proportion of rare disease patients - the unsolved RD cases. For these unsolved rare diseases, we are unable to explain the etiology responsible for the disease phenotype, predict the individual disease risk and/or rate of disease progression, and/or quantitate the risk of relatives to develop the same disorder. | 2018-2024 | Rare diseases | European Commission - H2020 Programme | ACCESS, DATA MANAGEMENT, METADATA STANDARDS
EuCanShare | An EU-Canada joint infrastructure for next-generation multi-Study Heart research. | 2018-2022 | Cardiovascular Diseases | European Commission - H2020 Programme | ACCESS, METADATA STANDARDS
The Type 2 Diabetes (T2D) Genetic Exploration by Next-generation sequencing in Ethnic Samples (T2D-GENES) Consortium is a collaborative international effort to identify genes influencing susceptibility to type 2 diabetes in multiple ethnic groups using next generation sequencing. To fulfill this objective, T2D-GENES Consortium undertook two large sequencing studies, called T2D-GENES Projects 1 and 2. Project 1 has carried out whole exome sequencing of 12,940 individuals, 6,504 with T2D and 6,436 non-diabetic controls, equally divided among five continental ancestry groups: Europeans, East Asians, South Asians, Hispanic Americans, and African Americans. The goal of Project 1 is to identify all genetic variants in the complete coding regions of the genomes (i.e., whole exome) by sequencing, including rare variants. Project 2 (i.e., SAMAFS substudy 2) is a pedigree-based study designed to identify low frequency or rare variants influencing susceptibility to T2D, using whole genome sequence information from approximately 600 individuals in 20 Mexican American T2D-enriched pedigrees from San Antonio, Texas, augmented with family-based imputation into approximately 440 additional family members. The major objectives of Project 2 are to identify low frequency or rare variants in and around known common variant signals for T2D, as well as to find novel low frequency or rare variants influencing susceptibility to T2D. Both T2D-GENES Projects 1 and 2 involve the San Antonio Mexican American Family Studies (SAMAFS), which are composed of four San Antonio, Texas-based family studies: the San Antonio Family Heart Study (SAFHS), San Antonio Family Diabetes/Gallbladder Study (SAFDGS), Veterans Administration Genetic Epidemiology Study (VAGES), and Family Investigation of Nephropathy and Diabetes - San Antonio (FIND-SA) Component and its extension called the Extended FIND [E-FIND]. 
The SAFHS and SAFDGS began in 1991 and have followed participants with extensive clinical phenotyping related to T2D for over 20 years. The VAGES was initiated in 1994 and a large battery of T2D-related phenotypic data has been obtained from its participants. The FIND-SA began in 2000, a part of the multicenter FIND study which was designed to identify genetic determinants of diabetic kidney disease; data from its participants related to T2D were used for this project. Non-overlapping subsets of SAMAFS participants are part of the T2D-GENES Projects 1 and 2, henceforth referred to as SAMAFS substudies 1 and 2, respectively. The SAMAFS substudies 1 and 2 are part of one of the five awards funded by NIDDK under a cooperative agreement award mechanism, which is governed by the Steering Committee of the T2D-GENES Consortium. Since Project 1 relies on population based subsets of cases and controls, 491 unrelated participants are drawn from the four SAMAFS as part of the T2D-GENES Project 1 Mexican American sample (i.e., SAMAFS substudy 1). The whole exome sequencing was performed at the Broad Institute. For Project 2, 1048 individuals are drawn from two SAMAFS (SAFHS and SAFDGS), representing 20 large families for substudy 2. The substudy 2 strategy is to sequence approximately 600 individuals at an average of 50x coverage across the entire genome, then impute genome wide genotypes for about 440 additional family members. The 600 sequenced individuals are specifically chosen for their value in imputing sequence information into other family members. By studying large pedigrees, we expect to find multiple individuals carrying each genetic variant, even if this variant is very rare in the population at large. Thus, a pedigree-based approach provides an excellent opportunity for identifying rare novel variants influencing risk of T2D and quantitative variation in T2D-related phenotypes. The whole genome sequencing has been done commercially by Complete Genomics, Inc. 
(CGI). The available sample of 1,048 includes 5 sequenced individuals who do not belong to any of the 20 large pedigrees. The final family data of 1,043 individuals includes whole genome sequence data for 607 individuals. After quality control, 590 sequenced individuals provide data for family based imputation using Merlin linkage analysis software into approximately 440 additional family members for whom chip based genotypes are available to indicate which parental haplotype is transmitted. The complete SAMAFS data including phenotype, genotype, sequence, other T2D-related trait data utilized for Projects 1 and 2 are available. These data can readily be viewed by clicking on the substudy title shown below or in the box: "Substudies", located on the right hand side of this parent or top study page phs000847.v1.p1, titled T2D-GENES Consortium: San Antonio Mexican American Family Studies (SAMAFS).
phs000849 T2D-GENES Project 1: San Antonio Mexican American Family Studies (SAMAFS), Substudy 1: Whole Exome Sequencing
phs000462 T2D-GENES Project 2: San Antonio Mexican American Family Studies (SAMAFS), Substudy 2: Whole Genome Sequencing in Pedigrees
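The sample counts quoted in the T2D-GENES description above can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the function and key names are hypothetical, the numbers are copied from the text, and reading 607 - 590 as individuals dropped at quality control is an interpretation the text implies but does not state.

```python
# Sample counts copied from the T2D-GENES / SAMAFS description.
# Function and key names are illustrative only, not part of any study tooling.

def samafs_counts():
    project1_cases = 6_504        # Project 1 participants with T2D
    project1_controls = 6_436     # Project 1 non-diabetic controls
    available_substudy2 = 1_048   # individuals available for substudy 2
    outside_pedigrees = 5         # sequenced, but in none of the 20 pedigrees
    sequenced = 607               # family members with whole genome sequence
    sequenced_after_qc = 590      # sequenced individuals retained after QC

    family_data = available_substudy2 - outside_pedigrees
    return {
        "project1_total": project1_cases + project1_controls,
        "family_data": family_data,
        # Family members without sequence data, i.e. imputation targets.
        "not_sequenced": family_data - sequenced,
        # Assumed interpretation: sequenced individuals not passing QC.
        "dropped_in_qc": sequenced - sequenced_after_qc,
    }


if __name__ == "__main__":
    for name, value in samafs_counts().items():
        print(f"{name}: {value}")
```

The totals agree with the stated 12,940 exomes and 1,043 family members, and the 436 family members without sequence data match the "approximately 440" imputation targets.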
RNA sequencing (RNA-seq) of peripheral blood lymphocytes was used to develop a means to assess immune function in a way that can be used in discovery science and applied to patients individually in clinical settings. The premise is that profiles of RNA present in immune cells reflect the combined influence of genetic and environmental variation on the immune potential of individuals, and that this potential can be tapped to understand human immunity in a variety of biological contexts. CD4+ cells were isolated from fresh whole blood via positive magnetic bead selection, and cell lysates were prepared using Qiazol (QIAGEN) and stored at -80°C for 3 to 8 months. RNA was extracted in batches for cDNA library preparation and RNA-seq. For this study, we developed standard operating procedures for handling human blood samples and determined: a) the best way to enrich for CD4+ T cells from whole blood and yield high quality RNA, b) the sensitivity of this RNA profiling strategy, and c) the reproducibility of generated immune profiles from healthy subjects. We then developed bioinformatics processes to establish immune response signatures and immune response phenotypes within cohorts of individuals.