Background

Massively parallel sequencing technology has transformed cancer genomics. It is now feasible, in a clinically relevant time frame and at a clinically manageable cost, to screen DNA from patient tumours for mutations essentially genome-wide. The challenge for personalised medicine will be to increase the sample size to thousands or tens of thousands of well-characterised cases in order to attain sufficient statistical power to stratify patients accurately across the complexity and genomic heterogeneity expected for most of the common tumour types. Currently, whole genome sequencing on this scale is not feasible, and targeted sequencing of relevant portions of the genome will be required.

Pilot data

We have developed protocols for large-scale, multiplexed sequencing of 100-200 genes in thousands of samples. Using robotic technology, genomic DNA from the cancer specimen is processed into sequencing libraries with unique DNA barcodes, thereby allowing sequencing reads to be attributed to the sample from which they derive. Currently, these sequencing libraries can be generated in a 96-well format using fully automated protocols, and we are exploring methods to expand this to a 384-well format. The sequencing libraries are pooled and hybridized to custom sets of RNA baits representing the genomic regions of interest. The pulled-down libraries are sequenced in pools of 48-96 samples per lane of an Illumina HiSeq. This protocol is already implemented at the Sanger Institute. We have published proof that somatic mutations in novel cancer genes can be identified from exome-wide sequencing. In unpublished pilot data, we have established the feasibility of robotic library production, custom pull-down, and multiplexed sequencing of barcoded libraries for 100 known myeloid cancer genes across 760 myelodysplasia samples.
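The barcode-based attribution of reads to samples described above can be illustrated with a minimal sketch. This is not the pipeline used at the Sanger Institute; the barcode sequences, the sample names, the `(barcode, sequence)` read representation and the one-mismatch tolerance are all assumptions made for illustration.

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(read_barcode, barcode_to_sample, max_mismatches=1):
    """Return the sample with the uniquely closest barcode, or None.

    A read is left unassigned when no known barcode lies within
    max_mismatches, or when two barcodes are equally close (a tie).
    """
    distances = sorted(
        (hamming(read_barcode, barcode), sample)
        for barcode, sample in barcode_to_sample.items()
        if len(barcode) == len(read_barcode)
    )
    if not distances or distances[0][0] > max_mismatches:
        return None
    if len(distances) > 1 and distances[1][0] == distances[0][0]:
        return None
    return distances[0][1]


def demultiplex(reads, barcode_to_sample, max_mismatches=1):
    """Group (barcode, sequence) reads by sample; unassigned reads go under None."""
    by_sample = defaultdict(list)
    for barcode, sequence in reads:
        sample = assign_sample(barcode, barcode_to_sample, max_mismatches)
        by_sample[sample].append(sequence)
    return dict(by_sample)


# Hypothetical barcodes and reads, invented for this example.
barcode_to_sample = {"ACGTAC": "sample_A", "TGCATG": "sample_B"}
reads = [
    ("ACGTAC", "GATTACA"),  # exact match to sample_A
    ("ACGTAA", "CCGGTTA"),  # one mismatch from sample_A's barcode
    ("TGCATG", "TTAGGCA"),  # exact match to sample_B
    ("GGGGGG", "AAAAAAA"),  # no barcode within tolerance
]
for sample, sequences in demultiplex(reads, barcode_to_sample).items():
    print(sample, len(sequences))
```

Tolerating a single mismatch recovers reads with a sequencing error in the barcode, while refusing ties avoids attributing a read to the wrong sample when two barcodes are equally plausible.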
Highlights of the data analysed thus far are that coverage is remarkably even between samples; that when 96 samples are run, average coverage per lane of sequencing is ~250, with 90-95% of targeted exons covered by >25 reads; that known mutations can be discovered in the data set; and that the protocol is amenable to whole genome amplified DNA. The bioinformatic algorithms for identification of substitutions and indels in pull-down data are well-established; we have pilot data showing that copy number changes, LOH and genomic rearrangements in specific regions of interest can also be identified by tiling of baits across the relevant loci.

Proposal

We propose to apply this methodology to 10000 samples from patients with AML enrolled in clinical trials over the last 10-20 years. Oncogenic point mutations and potentially genomic rearrangements will be identified and linked to clinical outcome data, with a view to undertaking the following sorts of analyses:

• Identification of co-occurrence, mutual exclusivity and clusters of driver mutations.
• Correlation of prognosis with driver mutations and potentially gene-gene interactions.
• Exploration of genomic markers of drug response.

Ultimately, we would like to be in a position to release the mutation data together with matched clinical outcome data to bona fide medical researchers via a controlled access approach, possibly within the COSMIC framework (www.sanger.ac.uk/genetics/CGP/cosmic/). The vision here is to generate a portal whereby a clinician faced with an AML patient and his or her mutational profile can obtain a ‘personalised’ prediction of outcome, together with a fair assessment of the uncertainty of the estimate. With a sufficient sample size, there would also be the potential to develop decision support algorithms for therapeutic choices based on such data.
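The first analysis listed in the proposal, testing a pair of driver mutations for co-occurrence or mutual exclusivity, could be approached with a two-sided Fisher's exact test on a 2x2 table of mutation status. The sketch below is one common choice, not necessarily the method the authors intend; the gene names `GENE_A` and `GENE_B` and the toy mutation calls are invented for illustration.

```python
from math import comb


def contingency(mutations_by_sample, gene1, gene2):
    """Count samples with both genes mutated, gene1 only, gene2 only, neither."""
    both = only1 = only2 = neither = 0
    for genes in mutations_by_sample.values():
        has1, has2 = gene1 in genes, gene2 in genes
        if has1 and has2:
            both += 1
        elif has1:
            only1 += 1
        elif has2:
            only2 += 1
        else:
            neither += 1
    return both, only1, only2, neither


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x double-mutant samples, margins fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    lowest, highest = max(0, row1 + col1 - n), min(row1, col1)
    p_observed = prob(a)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    total = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )
    return min(1.0, total)


def direction(a, b, c, d):
    """Compare observed double mutants with the count expected under independence."""
    expected = (a + b) * (a + c) / (a + b + c + d)
    if a > expected:
        return "co-occurrence"
    if a < expected:
        return "mutual exclusivity"
    return "none"


# Toy mutation calls (sample -> set of mutated genes), invented for this example.
calls = {
    "s1": {"GENE_A"}, "s2": {"GENE_A"}, "s3": {"GENE_A"}, "s4": {"GENE_A"},
    "s5": {"GENE_B"}, "s6": {"GENE_B"}, "s7": {"GENE_B"}, "s8": {"GENE_B"},
    "s9": {"GENE_A", "GENE_B"}, "s10": set(),
}
table = contingency(calls, "GENE_A", "GENE_B")
print(table, direction(*table), fisher_exact_two_sided(*table))
```

With a panel of around 100 genes, several thousand gene pairs would be tested, so the resulting p-values would need a multiple-testing correction before interpretation.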
Data Protection

1 About the EGA

The European Genome-phenome Archive (EGA) was formally launched in 2008 at the European Bioinformatics Institute (EMBL-EBI), an outstation of the European Molecular Biology Laboratory (EMBL), to address an identified need for archiving and sharing the results of genome-wide association studies from the Wellcome Trust Case Control Consortium. In late 2012, with the signing of a memorandum of understanding (and subsequent formal agreement in 2016) between EMBL-EBI and the Centre for Genomic Regulation (CRG), the EGA formally became a joint project of the two institutes. The two institutes work together to support the EGA services, including submissions support, the web site, strategic leadership, and data infrastructure developments.

2 EMBL-EBI & GDPR

The EGA is co-managed by EMBL-EBI and CRG. EMBL-EBI is an international organisation established by treaty. It has certain privileges and immunities (e.g. exemptions from the application of national law) and may also self-regulate its activities (e.g. establish its own institutional legal framework) within the framework of its founding act of 1973. The General Data Protection Regulation (GDPR) is a European Union (EU) regulation that governs how organisations can share and process the personal data of individuals in the EU. EMBL places great value on maintaining collaboration with researchers who are subject to GDPR. For that reason, it is of utmost importance for EMBL to handle data received from those collaborators in a secure and responsible manner. Mindful of its public mandate and the sensitivity of the data it handles, EMBL has always ensured a high level of data protection in its activities.
Since the introduction of GDPR in May 2018, EMBL has established its internal policy on General Data Protection (IP 68), exercising its right to self-regulate its operations. IP 68 establishes a robust personal data protection framework that provides for data protection principles, enforceable data subject rights, and oversight and redress mechanisms offering a level of protection comparable with GDPR.

3 CRG & GDPR

The Centre for Genomic Regulation (CRG) is an international biomedical research institute of excellence, created in July 2000, with the Catalan Government as its main participant. It is a non-profit foundation and its mission is to discover and advance knowledge for the benefit of society, public health and economic prosperity. The CRG is a CERCA centre. CERCA is the collective organisation for all research centres of excellence in Catalonia. CERCA ensures these centres develop successfully by promoting synergies and strategic cooperation, improving their visibility and the impact of their research, and promoting dialogue between public and private stakeholders. As a legal entity based in Spain and operating within the EU, the CRG ensures compliance with the GDPR and with the legal regulations on personal data protection applicable at the national level, as well as any other legislation that may replace, modify or supplement them.

4 EGA & GDPR

[Figure: EGA GDPR Schema]

4.1 Genetic and phenotypic data

Within GDPR, there are two main actors: data controllers and data processors. Data controllers are persons or entities that determine the purposes for which, and the means by which, personal data are processed, e.g. companies, researchers, or universities. For EGA, the data controller is ultimately the data producer and the submitter(s) who submit the data to EGA. The data controller also creates a Data Access Committee (DAC), which decides on data access permissions at EGA.
Data processors are the persons or entities that process the data on behalf of a data controller. With regard to GDPR, EGA is a data processor, as it processes data as instructed by the data controller. GDPR applies to any organisation that accesses personal data from an individual within the EU. Under GDPR, personal data is defined as any information relating to an identified or identifiable individual, including names and email addresses as well as health-related and genetic data. EGA does not accept personally identifiable data except genetic and phenotypic data, so all other data submitted to EGA, such as names and addresses, must be pseudonymised. GDPR requires that data controllers implement data protection principles, such as data minimisation, to minimise the risk of data leakage and to protect the rights of the data subjects. As a data processor, EGA follows a set of security policies to minimise the risk of unauthorised data access or data loss. In its role as a data processor, EGA requires all submitters to sign a Data Processing Agreement (DPA) when the submission account is first created. This agreement needs to be signed only once per submitter, and remains valid for future submissions to EGA.

4.2 Other personal data

The EGA also collects personal data as part of its interactions with submitters, data access committees, and researchers accessing data distributed by EGA. The privacy notices below explain what personal data is collected by the specific service you are requesting, for what purposes, how it is processed, and how it is kept secure.
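As an illustration of the pseudonymisation requirement in section 4.1, a submitter might replace direct identifiers with a keyed hash before preparing a submission. The sketch below is an assumption-laden example and not a procedure prescribed by EGA: the field names, the `SUBJ-` prefix and the choice of HMAC-SHA256 are hypothetical, and the secret key would have to be generated and stored securely by the data controller.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Map an identifier to a stable pseudonym using HMAC-SHA256.

    The same identifier and key always give the same pseudonym, so records
    for one individual stay linked. The digest is truncated to 16 hex
    characters for readability (an illustrative choice).
    """
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return "SUBJ-" + digest.hexdigest()[:16]


def pseudonymise_record(record, secret_key, id_field="email",
                        direct_identifiers=("name", "email", "address")):
    """Return a copy of the record without direct identifiers, plus a pseudonym."""
    cleaned = {k: v for k, v in record.items() if k not in direct_identifiers}
    cleaned["subject_id"] = pseudonymise(record[id_field], secret_key)
    return cleaned


# Hypothetical record and key, invented for this example.
key = b"replace-with-a-securely-stored-secret"
record = {
    "name": "Jane Doe",
    "email": "jane.doe@example.org",
    "address": "1 Example Street",
    "diagnosis": "AML",
}
print(pseudonymise_record(record, key))
```

A keyed hash is used rather than a plain hash so that someone without the key cannot recompute pseudonyms from a list of candidate names or email addresses.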
Privacy Notices for EGA

| | Title | Version | Last Updated |
| EGA Data Access Committee Account | Privacy Notice for EGA Data Access Committee Account | 1.0 | February 6, 2019 |
| EGA User Account | Privacy Notice for EGA User Account | 1.0 | February 6, 2019 |
| EGA Helpdesk Service | Privacy Notice for EGA Helpdesk Service | 1.0 | February 6, 2019 |
| EGA Website Service | Privacy Notice for EGA Website Service | 1.0 | February 6, 2019 |

Documentation

| | Title | Version | Description |
| EGA Security Overview | Security Document | 1.1 | The EGA Security Document provides an overview of EGA’s practices in ensuring the security of data stored at EGA. |
| EGA Data Processing Agreement | Data Processing Agreement | 1.5 | The Data Processing Agreement must be completed and returned as part of the submission process. Please note that this document is non-negotiable. |
| Authorised Submitters | Authorised Submitters Formulary | 1.0 | The Authorised Submitters Form must be completed and returned as part of the submission process. All those who should have access to the submission account in order to submit to the EGA should be listed here. |

Dispute Resolution

Any controversy or claim arising out of, or relating to, the DPA (including the enforceability or breach thereof, or any question regarding its existence, validity or termination) or relating to the EGA Service shall be resolved using the internal dispute resolution mechanisms of EGA, including those related to Data Protection. The EGA’s internal dispute resolution mechanism has the following procedure:

• EGA OPERATIONAL PHASE: Meetings between EGA staff and the Data Controller.
• LEGAL MANAGEMENT PHASE: Meetings between legal teams of EMBL, CRG and the Data Controller.
• DIRECTION MANAGEMENT PHASE: Negotiation between the legal representatives of EMBL, the CRG and the Data Controller.

If the internal dispute resolution mechanism does not resolve the controversy or claim, the next phase is:

• ARBITRATION PHASE: Resolution by arbitration under the WIPO Expedited Arbitration Rules (“Rules”).