Genome and transcriptome sequence data from a patient with a cancer of unknown type (likely pancreatobiliary / intrahepatic cholangiocarcinoma in the liver), generated as part of the BC Cancer Agency's Personalized OncoGenomics (POG) study.
The whole study comprises two patient cohorts. Screening cohort: 40 patients from Germany; validation cohort: 40 patients from Asia. In addition, bile duct and CCA cell lines were analyzed. This dataset contains whole exome sequencing data of 37 tumor/normal pairs from the screening cohort plus an additional relapse tumor from one of those 37 patients. Data was generated on an Illumina HiSeq 2000 device in paired-end mode and is stored in BAM file format.
The whole study comprises two patient cohorts. Screening cohort: 40 patients from Germany; validation cohort: 40 patients from Asia. In addition, bile duct and CCA cell lines were analyzed. This dataset contains 44 samples derived from RNA-sequencing data of bile duct and CCA cell lines. Data was generated on an Illumina NovaSeq 6000 device in paired-end mode and is stored in compressed FASTQ file format.
scRNAseq and scTCRseq of serial peripheral blood mononuclear cell (PBMC) samples (n=72) taken at various timepoints before and during treatment (Week 0 (W0), Week 3 (W3), Week 6 (W6)). PBMC samples were pooled into 37 pools, loading two or three samples per lane of the 10X Genomics chip in equal proportions, according to a pre-designed pooling matrix.
We report a significant advance in Moyamoya disease (MMD) research through our unique access to a large MMD patient population combined with high-depth exome sequencing and bioinformatics. We confirm a reported RNF213 founder mutation (FM), provide evidence for its ethnic specificity and identify novel variants in other genes that associate with Caucasian and non-RNF213 FM MMD. This work has a broader impact in vascular research by highlighting novel genetic aspects of cerebrovascular disease. MMD is a rare cerebrovascular disorder characterized by unilateral or bilateral progressive stenosis or occlusion of the internal carotid artery, with frequent involvement of the anterior cerebral artery and the middle cerebral artery (MCA). Fragile collateral vessels subsequently develop and are particularly prone to hemorrhage. MMD patients are usually diagnosed angiographically following presentation of developmental delays, seizures, migraines, ischemia, and/or hemorrhage. The most common approach to re-establish normal blood flow is bypass surgery. While some progress has been made in identifying disease-associated genes, the underlying disease biology is not well understood. The 125 ethnically diverse, unrelated MMD patients were matched based on sex and broad ethnic category to 125 controls. Control DNA from the 1000 Genomes Project was used by Personalis to design a control library.
A total of 125 controls were selected from this Personalis Control DNA Library for our study (The 1000 Genomes Project Consortium 2012). Genomic DNA was extracted using the Gentra Puregene kit (Qiagen, Valencia, CA). Libraries were prepared from approximately 3 µg of high-quality genomic DNA (50-200 ng/µl) using Illumina TruSeq Genomic DNA High Throughput Sample Prep Kits (Illumina, San Diego, CA), and exome enrichment (targeting 62 Mb) was accomplished using the TruSeq Exome Target Enrichment kit (Illumina, San Diego, CA), all according to the manufacturer's protocols. Target enrichment was validated by determining the concentration of the library by PicoGreen-based quantitation. Library yields ranged from 100 to 1000 ng of DNA, a portion of which was run on the Bioanalyzer HS DNA chip (Agilent, Santa Clara, CA), with an average size of 300-550 nt for DNA fragments. Exome Sequencing. Sequencing was performed using Illumina HiSeq 2000 or HiSeq 2500 sequencers with single-lane, paired-end 2×100 bp reads. DNA fragments were generated and amplified using Clonal Single Molecule Array technology (Illumina, San Diego, CA). The sequences were determined using the Clonal Single Molecule Array and Sequencing-by-Synthesis using Illumina's instrumentation and Reversible Terminator Chemistry. Each sequencing lane interrogated the DNA sequences of a pool of 3 individual sample libraries, each carrying a unique index. Sequencing reads of at least 2×100 bp in length, for a total of approximately 8 Gb of sequence data, were generated for each sample.
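As a rough sanity check on the sequencing figures above, the per-sample yield can be converted into a read-pair count and an upper bound on mean target depth. The sketch below assumes reads of exactly 2×100 bp and that every sequenced base falls on the 62 Mb target. Neither holds in practice (reads may be longer, and off-target or duplicate reads reduce usable depth), so the depth is an upper bound and not a value reported by the study. The function and variable names are illustrative only.

```python
# Back-of-the-envelope sequencing arithmetic for one exome sample.
# Input values come from the text; the "all bases on target" assumption is ours.

def read_pairs(total_bases: float, read_length: int = 100) -> float:
    """Number of paired-end fragments needed to yield total_bases."""
    return total_bases / (2 * read_length)

def max_mean_depth(total_bases: float, target_bases: float) -> float:
    """Upper bound on mean depth: every base is assumed to land on target."""
    return total_bases / target_bases

YIELD_PER_SAMPLE = 8e9   # ~8 Gb per sample (from the text)
TARGET_SIZE = 62e6       # 62 Mb exome target (from the text)
SAMPLES_PER_LANE = 3     # indexed libraries pooled per lane (from the text)

pairs = read_pairs(YIELD_PER_SAMPLE)
depth = max_mean_depth(YIELD_PER_SAMPLE, TARGET_SIZE)
lane_yield = YIELD_PER_SAMPLE * SAMPLES_PER_LANE

print(f"read pairs per sample : {pairs / 1e6:.0f} million")
print(f"max mean target depth : {depth:.0f}x")
print(f"implied yield per lane: {lane_yield / 1e9:.0f} Gb")
```

Under these assumptions, each sample corresponds to about 40 million read pairs and at most roughly 129x mean depth over the target, and a lane carrying three indexed libraries would need to deliver about 24 Gb.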
The Federated EGA (FEGA) is a global resource for discovery of and access to sensitive human omics and associated data consented for secondary use, through a network of human data repositories to accelerate biomedical research and improve human health. The Federated EGA network was launched in September 2022 with five inaugural nodes, and since 2023 seven operational nodes can share data across national borders in adherence to European and national laws.

A few weeks after FEGA's official launch, in November 2022, the European Genomic Data Infrastructure (GDI) project was kicked off. This European Commission co-funded project, coordinated by ELIXIR, aims to deliver a federated, sustainable and secure data infrastructure to access genomic and related phenotypic and clinical data across Europe. The project supports the aim of the 1+MG initiative (25 EU countries, Norway and the UK) to enable personalised medicine and health through a shared framework and infrastructure for securely accessing and integrating high-quality genomic data and other health data across borders. 1+MG will be an integral component of the European Health Data Space (EHDS) for secondary use (HealthData@EU) as an authorised participant.

How are Federated EGA and GDI similar?

Given their shared visions, the Federated EGA and European GDI networks have a lot in common. To begin, they share the same overall goal of establishing networks of “nodes” that host sensitive human data within a jurisdiction and connecting these nodes in a global network to support data discovery and promote human genome data access for research and healthcare. FEGA and GDI are both initially focused on enabling access to human genomic data, while FEGA's ambition is to expand its scope to clinical research data and other omics data in the future. FEGA and GDI are both built on open and interoperable software solutions, a subset of which are based on the LocalEGA components.
FEGA and GDI implementation solutions are based on international community standards, for example those developed by the Global Alliance for Genomics and Health, which contributes to making them interoperable. Both FEGA and GDI allow nodes to use any solution that is fully compatible. As illustrated in Figure 1, a final key point is the significant overlap of institutions involved in envisioning and operating FEGA and GDI nodes, with the clear wish to keep them interoperable in the future.

What distinguishes Federated EGA from GDI?

Despite being largely similar, the FEGA and GDI networks differ in some respects, which we aim to clarify in this post.

The first difference is the governance model under which the nodes operate to comply with national laws and the GDPR. In the FEGA network, nodes have taken inspiration from the EGA data access model, where the infrastructure is a data processor of the hosted data and data controllership remains with the originating Data Access Committees (i.e. the data controllers) for each dataset. In the GDI network, on the other hand, the controllership of datasets will be transferred to a 1+MG European Digital Infrastructure Consortium (EDIC) legal entity created by the Member States (MS), who will make data access decisions, with data holders having veto powers for their datasets. Importantly, FEGA nodes have the flexibility to choose another model to fit their data protection framework, including becoming data controllers for their datasets.

The second difference is the inclusion criteria for data. While FEGA nodes are designed to accept almost any type of omics data in need of controlled access (e.g. genomics, transcriptomics, genotyping, single-cell sequencing, patient-tracked metagenomics), GDI nodes are initially, but not only, focused on accepting whole genome and exome sequencing data and affiliated data from sources such as (i) the Genome of Europe use case of the 1+MG initiative, which specifically aims to fulfil the mission of building “a European network of national genomic reference cohorts of at least 500,000 citizens”; (ii) data collections of other types of genomic data identified by countries through the 1+MG dashboard; and (iii) genomic data coming from data holders, which would need to fulfil EHDS requirements.

The third difference is the maturity of the software stacks. The FEGA network provides a set of software for data and metadata submission, storage, permissions management, and file distribution (the LocalEGA package). GDI is building a complete set of open-source reference implementations for the five functionalities covering the full data life cycle (a few more than LocalEGA covers), including federated processing (analytics, AI/ML), which is still under active development. Notably, one of the GDI functionalities, storage and interfaces, can be satisfied by using the LocalEGA storage solution, highlighting the ability of FEGA and GDI to be interoperable. In both FEGA and GDI, nodes are allowed to use alternative solutions as long as they are fully interoperable with the networks. Figure 1 provides a simple overview of the commonalities and differences described above.

Figure 1: Schematic overview of the main commonalities and differences between a FEGA and a GDI node.

Can the same institute run a FEGA and a GDI node at the same time?

The answer is yes! We believe this is entirely possible and encouraged, as long as 1+MG requirements are continuously fulfilled in the implementation. The same trained personnel could operate the infrastructure and leverage the same national funding to run a FEGA and a GDI node.
Several European nodes, especially GDI vanguard nodes like Norway, Sweden, Finland and Spain, are following this model. The idea is that a lot of the work can be reused, given the overlapping scope and the interoperable technology. Thus, the hosted datasets can be discovered via the two catalogues and may be accessible under the same or different governance models. From a FEGA node perspective, the GDI datasets could simply be a subset of the datasets hosted in the FEGA node, each with its own specific governance like any other dataset.

What's next?

The amazing teams of people building the Federated EGA network and the European GDI network are working together to create infrastructures able to provide secure access to human genomic and associated data around Europe, because we all know how much human genomics can improve healthcare and precision medicine. And we all want to collaborate to make it happen.

This article has been reviewed collectively by members of the EGA and the GDI coordination team.