Computational approach to discriminate human and mouse sequences in patient-derived tumour xenografts

Whole-exome sequencing was performed on a dilution series containing known proportions of human and mouse DNA: three replicates at 100% human / 0% mouse, two at 90/10, three at 50/50, two at 25/75 and three at 0/100. A set of breast cancer clinical samples, matched normal tissue and matched patient-derived tumour xenografts (PDTXs) (total number = 14) was also analysed. Paired-end 75 bp reads for the dilution series and paired-end 125 bp reads for the clinical samples were obtained on an Illumina HiSeq 2500; FASTQ files are provided. The transcriptomes of the Universal Human RNA Reference and Universal Mouse RNA Reference samples were also analysed in triplicate by RNA-seq; paired-end 150 bp FASTQ files obtained on an Illumina HiSeq 4000 are provided.
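The dilution series described above can be written down as a small sample sheet. The following is a minimal sketch only: the class and function names are hypothetical and are not part of the dataset, and the mapping from samples to EGA file accessions is not given on this page, so it is not attempted here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DilutionSample:
    """One library in the human/mouse dilution series (hypothetical record type)."""
    human_pct: int
    mouse_pct: int
    replicate: int


# (human %, mouse %, number of replicates), as listed in the dataset description.
DESIGN = [(100, 0, 3), (90, 10, 2), (50, 50, 3), (25, 75, 2), (0, 100, 3)]


def build_series(design=DESIGN):
    """Expand the design into one record per replicate."""
    samples = []
    for human, mouse, n_reps in design:
        if human + mouse != 100:
            raise ValueError(f"percentages must sum to 100, got {human}/{mouse}")
        for rep in range(1, n_reps + 1):
            samples.append(DilutionSample(human, mouse, rep))
    return samples


if __name__ == "__main__":
    series = build_series()
    for s in series:
        print(f"{s.human_pct:>3}% human / {s.mouse_pct:>3}% mouse, replicate {s.replicate}")
    print(f"{len(series)} libraries in total")
```

Expanding the design this way gives 13 libraries (3 + 2 + 3 + 2 + 3), each with a known expected human fraction against which a species-assignment method could be compared.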

Policy regulating access to EGAS00001002445

DATA ACCESS AGREEMENT

These terms and conditions govern access to the managed access datasets (details of which are set out in Appendix I) to which the User Institution has requested access. The User Institution agrees to be bound by these terms and conditions.

Definitions

Authorised Personnel: The individuals at the User Institution to whom Caldas Lab grants access to the Data. This includes the User, the individuals listed in Appendix II and any other individuals for whom the User Institution subsequently requests access to the Data. Details of the initial Authorised Personnel are set out in Appendix II.

Data: The managed access datasets to which the User Institution has requested access.

Data Producers: Caldas Lab and the collaborators listed in Appendix I responsible for the development, organisation, and oversight of these Data.

External Collaborator: A collaborator of the User, working for an institution other than the User Institution.

Project: The project for which the User Institution has requested access to these Data. A description of the Project is set out in Appendix II.

Publications: Includes, without limitation, articles published in print journals, electronic journals, reviews, books, posters and other written and verbal presentations of research.

Research Participant: An individual whose data form part of these Data.

Research Purposes: Shall mean research that is seeking to advance the understanding of genetics and genomics, including the treatment of disorders, and work on statistical methods that may be applied to such research.

User: The principal investigator for the Project.

User Institution(s): The Institution that has requested access to the Data.

Caldas Lab: Prof. Carlos Caldas, Breast cancer functional genomics laboratory, Cancer Research UK Cambridge Institute, University of Cambridge.

1. The User Institution agrees to only use these Data for the purpose of the Project (described in Appendix II) and only for Research Purposes. The User Institution further agrees that it will only use these Data for Research Purposes which are within the limitations (if any) set out in Appendix I.

2. The User Institution agrees to preserve, at all times, the confidentiality of these Data. In particular, it undertakes not to use, or attempt to use, these Data to compromise or otherwise infringe the confidentiality of information on Research Participants. Without prejudice to the generality of the foregoing, the User Institution agrees to use at least the measures set out in Appendix I to protect these Data.

3. The User Institution agrees to protect the confidentiality of Research Participants in any research papers or publications that they prepare by taking all reasonable care to limit the possibility of identification.

4. The User Institution agrees not to link or combine these Data to other information or archived data available in a way that could re-identify the Research Participants, even if access to that data has been formally granted to the User Institution or is freely available without restriction.

5. The User Institution agrees only to transfer or disclose these Data, in whole or part, or any material derived from these Data, to the Authorised Personnel. Should the User Institution wish to share these Data with an External Collaborator, the External Collaborator must complete a separate application for access to these Data.

6. The User Institution agrees that the Data Producers, and all other parties involved in the creation, funding or protection of these Data:
a) make no warranty or representation, express or implied, as to the accuracy, quality or comprehensiveness of these Data;
b) exclude to the fullest extent permitted by law all liability for actions, claims, proceedings, demands, losses (including but not limited to loss of profit), costs, awards damages and payments made by the Recipient that may arise (whether directly or indirectly) in any way whatsoever from the Recipient's use of these Data or from the unavailability of, or break in access to, these Data for whatever reason; and
c) bear no responsibility for the further analysis or interpretation of these Data.

7. The User Institution agrees to follow the Fort Lauderdale Guidelines and the Toronto Statement. This includes but is not limited to recognising the contribution of the Data Producers and including a proper acknowledgement in all reports or publications resulting from the use of these Data.

8. The User Institution agrees to follow the Publication Policy in Appendix III. This includes respecting the moratorium period for the Data Producers to publish the first peer-reviewed report describing and analysing these Data.

9. The User Institution agrees not to make intellectual property claims on these Data and not to use intellectual property protection in ways that would prevent or block access to, or use of, any element of these Data, or conclusion drawn directly from these Data.

10. The User Institution can elect to perform further research that would add intellectual and resource capital to these data and decide to obtain intellectual property rights on these downstream discoveries. In this case, the User Institution agrees to implement licensing policies that will not obstruct further research and to follow the U.S. National Institutes of Health Best Practices for the Licensing of Genomic Inventions (2005) in conformity with the Organisation for Economic Co-operation and Development Guidelines for the Licensing of the Genetic Inventions (2006).

11. The User Institution agrees to destroy/discard the Data held, once it is no longer used for the Project, unless obliged to retain the data for archival purposes in conformity with audit or legal requirements.

12. The User Institution will notify Caldas Lab within 30 days of any changes or departures of Authorised Personnel.

13. The User Institution will notify Caldas Lab prior to any significant changes to the protocol for the Project.

14. The User Institution will notify Caldas Lab as soon as it becomes aware of a breach of the terms or conditions of this agreement.

15. Caldas Lab may terminate this agreement by written notice to the User Institution. If this agreement terminates for any reason, the User Institution will be required to destroy any Data held, including copies and backup copies. This clause does not prevent the User Institution from retaining these data for archival purposes in conformity with audit or legal requirements.

16. The User Institution accepts that it may be necessary for the Data Producers to alter the terms of this agreement from time to time. As an example, this may include specific provisions relating to the Data required by Data Producers other than Caldas Lab. In the event that changes are required, the Data Producers or their appointed agent will contact the User Institution to inform it of the changes and the User Institution may elect to accept the changes or terminate the agreement.

17. If requested, the User Institution will allow data security and management documentation to be inspected to verify that it is complying with the terms of this agreement.

18. The User Institution agrees to distribute a copy of these terms to the Authorised Personnel. The User Institution will procure that the Authorised Personnel comply with the terms of this agreement.

19. This agreement (and any dispute, controversy, proceedings or claim of whatever nature arising out of this agreement or its formation) shall be construed, interpreted and governed by the laws of England and Wales and shall be subject to the exclusive jurisdiction of the English courts.

Agreed for User Institution
Signature:
Name:
Title:
Date:

Principal Investigator
I confirm that I have read and understood this Agreement.
Signature:
Name:
Title:
Date:

Agreed for Caldas Lab
Signature:
Name:
Title:
Date:

APPENDIX I – DATASET DETAILS
APPENDIX II – PROJECT DETAILS
APPENDIX III – PUBLICATION POLICY

APPENDIX I – DATASET DETAILS

Dataset reference (EGA Study ID and Dataset Details): EGAS00001002445

Name of project that created the dataset: Computational approach to discriminate human and mouse sequences in patient-derived tumour xenografts

Names of other data producers/collaborators: none

Specific limitations on areas of research: none

Minimum protection measures required:
File access: Data can be held in unencrypted files on an institutional compute system behind a secure firewall, with Unix user group read/write access for one or more appropriate groups but not Unix world read/write access. Laptops holding these data should have password-protected logins and screen locks (set to lock after 5 min of inactivity). If held on USB keys or other portable hard drives, the data must be encrypted.

APPENDIX II – PROJECT DETAILS (to be completed by the Requestor)

Details of dataset requested, i.e., EGA Study and Dataset Accession Number

Brief abstract of the Project in which the Data will be used (500 words max)

All individuals whom the User Institution wishes to be named as registered users
Name of Registered User | Email | Job Title | Supervisor*

All individuals that should have an account created at the EGA
Name of Registered User | Email | Job Title

APPENDIX III – PUBLICATION POLICY

Caldas Lab intend to publish the results of their analysis of this dataset and do not consider its deposition into public databases to be the equivalent of such publications. Caldas Lab anticipate that the dataset could be useful to other qualified researchers for a variety of purposes. However, some areas of work are subject to a publication moratorium.

The publication moratorium covers any publications (including oral communications) that describe the use of the dataset. For research papers, submission for publication should not occur until 6 months after these data were first made available on the relevant hosting database, unless Caldas Lab has provided written consent to earlier submission.

In any publications based on these data, please describe how the data can be accessed, including the name of the hosting database (e.g., The European Genome-phenome Archive at the European Bioinformatics Institute) and its accession numbers (e.g., EGAS00000000029), and acknowledge its use in a form agreed by the User Institution with Caldas Lab.
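The file access measure in Appendix I (group read/write allowed, world read/write not allowed) can be checked programmatically on a POSIX system. The following is a minimal sketch using only Python's standard `os` and `stat` modules; the function names are hypothetical, and the check covers Unix permission bits only, not firewalls, encryption or screen locks.

```python
import os
import stat
import sys


def world_accessible(path):
    """Return True if 'other' (world) users have read or write permission on path."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IROTH | stat.S_IWOTH))


def find_world_accessible(root):
    """Yield every file under root that is world-readable or world-writable."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if world_accessible(path):
                yield path


if __name__ == "__main__":
    # Usage: python check_perms.py <directory holding the data>
    if len(sys.argv) != 2:
        sys.exit("usage: check_perms.py DATA_DIR")
    offenders = list(find_world_accessible(sys.argv[1]))
    for path in offenders:
        print(f"world-accessible: {path}")
    sys.exit(1 if offenders else 0)
```

A file with mode `660` (owner and group read/write) passes this check, while `664` or `666` is reported. Directory permissions and access control lists are not inspected by this sketch.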

Studies are experimental investigations of a particular phenomenon, e.g., case-control studies on a particular trait or cancer research projects reporting matched cancer and normal genomes from patients.

Study ID Study Title Study Type
EGAS00001002445 Other
ID File Type Size Located in
EGAF00001747947 fq.gz 65.9 MB
EGAF00001747948 fq.gz 67.1 MB
EGAF00001747949 fq.gz 47.7 MB
EGAF00001747950 fq.gz 40.0 MB
EGAF00001747981 fq.gz 71.4 MB
EGAF00001747982 fq.gz 72.7 MB
EGAF00001747983 fq.gz 575.7 MB
EGAF00001747984 fq.gz 579.0 MB
EGAF00001747985 fq.gz 59.1 MB
EGAF00001747986 fq.gz 60.0 MB
EGAF00001747987 fq.gz 573.0 MB
EGAF00001747988 fq.gz 575.2 MB
EGAF00001747989 fq.gz 577.4 MB
EGAF00001747990 fq.gz 579.0 MB
EGAF00001747991 fq.gz 571.8 MB
EGAF00001747992 fq.gz 573.9 MB
EGAF00001747993 fq.gz 69.4 MB
EGAF00001747994 fq.gz 70.7 MB
EGAF00001747995 fq.gz 571.2 MB
EGAF00001747996 fq.gz 572.8 MB
EGAF00001747997 fq.gz 570.5 MB
EGAF00001747998 fq.gz 572.3 MB
EGAF00001747999 fq.gz 515.7 MB
EGAF00001748000 fq.gz 513.0 MB
EGAF00001748001 fq.gz 544.2 MB
EGAF00001748002 fq.gz 547.7 MB
EGAF00001748003 fq.gz 68.6 MB
EGAF00001748004 fq.gz 57.1 MB
EGAF00001748005 fq.gz 109.3 MB
EGAF00001748006 fq.gz 110.1 MB
EGAF00001748007 fq.gz 99.8 MB
EGAF00001748008 fq.gz 100.8 MB
EGAF00001748009 fq.gz 98.1 MB
EGAF00001748010 fq.gz 99.1 MB
EGAF00001748041 fq.gz 619.2 MB
EGAF00001748042 fq.gz 610.7 MB
EGAF00001748043 fq.gz 92.5 MB
EGAF00001748044 fq.gz 93.4 MB
EGAF00001748045 fq.gz 82.5 MB
EGAF00001748046 fq.gz 86.2 MB
EGAF00001748047 fq.gz 609.3 MB
EGAF00001748048 fq.gz 601.2 MB
EGAF00001748049 fq.gz 430.8 MB
EGAF00001748050 fq.gz 419.3 MB
EGAF00001748051 fq.gz 74.4 MB
EGAF00001748052 fq.gz 75.5 MB
EGAF00001748053 fq.gz 69.5 MB
EGAF00001748054 fq.gz 57.4 MB
EGAF00001748055 fq.gz 586.1 MB
EGAF00001748056 fq.gz 578.7 MB
EGAF00001748057 fq.gz 612.3 MB
EGAF00001748058 fq.gz 603.5 MB
EGAF00001748059 fq.gz 81.8 MB
EGAF00001748060 fq.gz 82.5 MB
EGAF00001748061 fq.gz 609.0 MB
EGAF00001748062 fq.gz 599.8 MB
EGAF00001748063 fq.gz 616.9 MB
EGAF00001748064 fq.gz 606.7 MB
EGAF00001748065 fq.gz 611.7 MB
EGAF00001748066 fq.gz 603.1 MB
EGAF00001748067 fq.gz 50.4 MB
EGAF00001748068 fq.gz 52.0 MB
EGAF00001748069 fq.gz 49.0 MB
EGAF00001748070 fq.gz 50.5 MB
EGAF00001748101 fq.gz 54.7 MB
EGAF00001748102 fq.gz 56.3 MB
EGAF00001748103 fq.gz 35.5 MB
EGAF00001748104 fq.gz 29.3 MB
EGAF00001748105 fq.gz 598.9 MB
EGAF00001748106 fq.gz 584.4 MB
EGAF00001748107 fq.gz 35.3 MB
EGAF00001748108 fq.gz 29.4 MB
EGAF00001748109 fq.gz 338.2 MB
EGAF00001748110 fq.gz 324.4 MB
EGAF00001748111 fq.gz 599.0 MB
EGAF00001748112 fq.gz 585.9 MB
EGAF00001748113 fq.gz 46.1 MB
EGAF00001748114 fq.gz 47.5 MB
EGAF00001748115 fq.gz 37.9 MB
EGAF00001748116 fq.gz 39.1 MB
EGAF00001748117 fq.gz 602.4 MB
EGAF00001748118 fq.gz 588.5 MB
EGAF00001748119 fq.gz 570.4 MB
EGAF00001748120 fq.gz 559.1 MB
EGAF00001748121 fq.gz 599.4 MB
EGAF00001748122 fq.gz 586.5 MB
EGAF00001748123 fq.gz 597.7 MB
EGAF00001748124 fq.gz 581.1 MB
EGAF00001748125 fq.gz 606.0 MB
EGAF00001748126 fq.gz 591.8 MB
EGAF00001748127 fq.gz 43.6 MB
EGAF00001748128 fq.gz 45.6 MB
EGAF00001748129 fq.gz 41.3 MB
EGAF00001748130 fq.gz 42.4 MB
EGAF00001748161 fq.gz 13.2 kB
EGAF00001748162 fq.gz 12.9 kB
EGAF00001748163 fq.gz 19.4 kB
EGAF00001748164 fq.gz 18.4 kB
EGAF00001748165 fq.gz 98.5 kB
EGAF00001748166 fq.gz 80.0 kB
EGAF00001748167 fq.gz 88.8 kB
EGAF00001748168 fq.gz 72.0 kB
EGAF00001748169 fq.gz 16.1 kB
EGAF00001748170 fq.gz 15.8 kB
EGAF00001748171 fq.gz 593.0 MB
EGAF00001748172 fq.gz 602.7 MB
EGAF00001748173 fq.gz 596.0 MB
EGAF00001748174 fq.gz 605.8 MB
EGAF00001748175 fq.gz 17.5 kB
EGAF00001748176 fq.gz 17.0 kB
EGAF00001748177 fq.gz 22.8 kB
EGAF00001748178 fq.gz 21.6 kB
EGAF00001748179 fq.gz 566.4 MB
EGAF00001748180 fq.gz 575.7 MB
EGAF00001748181 fq.gz 590.7 MB
EGAF00001748182 fq.gz 597.7 MB
EGAF00001748183 fq.gz 25.5 kB
EGAF00001748184 fq.gz 24.3 kB
EGAF00001748185 fq.gz 527.5 MB
EGAF00001748186 fq.gz 533.6 MB
EGAF00001748187 fq.gz 36.9 kB
EGAF00001748188 fq.gz 35.6 kB
EGAF00001748189 fq.gz 599.3 MB
EGAF00001748190 fq.gz 609.9 MB
EGAF00001748221 fq.gz 593.9 MB
EGAF00001748222 fq.gz 604.4 MB
EGAF00001748223 fq.gz 593.6 MB
EGAF00001748224 fq.gz 603.7 MB
EGAF00001748225 fq.gz 96.2 MB
EGAF00001748226 fq.gz 98.8 MB
EGAF00001748227 fq.gz 103.3 MB
EGAF00001748228 fq.gz 105.9 MB
EGAF00001748229 fq.gz 93.9 MB
EGAF00001748230 fq.gz 96.4 MB
EGAF00001748231 fq.gz 69.0 MB
EGAF00001748232 fq.gz 56.9 MB
EGAF00001748233 fq.gz 72.9 MB
EGAF00001748234 fq.gz 75.0 MB
EGAF00001748235 fq.gz 328.5 MB
EGAF00001748236 fq.gz 323.7 MB
EGAF00001748237 fq.gz 88.7 MB
EGAF00001748238 fq.gz 91.1 MB
EGAF00001748239 fq.gz 384.7 MB
EGAF00001748240 fq.gz 384.0 MB
EGAF00001748241 fq.gz 380.5 MB
EGAF00001748242 fq.gz 380.4 MB
EGAF00001748243 fq.gz 382.4 MB
EGAF00001748244 fq.gz 381.5 MB
EGAF00001748245 fq.gz 379.9 MB
EGAF00001748246 fq.gz 377.5 MB
EGAF00001748247 fq.gz 79.5 MB
EGAF00001748248 fq.gz 81.5 MB
EGAF00001748249 fq.gz 87.7 MB
EGAF00001748250 fq.gz 91.1 MB
EGAF00001748281 fq.gz 380.3 MB
EGAF00001748282 fq.gz 379.9 MB
EGAF00001748283 fq.gz 380.2 MB
EGAF00001748284 fq.gz 379.4 MB
EGAF00001748285 fq.gz 362.0 MB
EGAF00001748286 fq.gz 362.2 MB
EGAF00001748287 fq.gz 69.8 MB
EGAF00001748288 fq.gz 57.1 MB
EGAF00001748289 fq.gz 1.0 GB
EGAF00001748290 fq.gz 1.1 GB
EGAF00001748465 fq.gz 888.5 MB
EGAF00001748466 fq.gz 1.0 GB
EGAF00001748467 fq.gz 896.3 MB
EGAF00001748468 fq.gz 982.5 MB
EGAF00001748469 fq.gz 932.9 MB
EGAF00001748470 fq.gz 1.0 GB
EGAF00001748471 fq.gz 648.8 MB
EGAF00001748472 fq.gz 743.2 MB
EGAF00001748473 fq.gz 622.9 MB
EGAF00001748474 fq.gz 697.0 MB
EGAF00001748475 fq.gz 639.1 MB
EGAF00001748476 fq.gz 716.1 MB
EGAF00001748477 fq.gz 893.4 MB
EGAF00001748478 fq.gz 1.1 GB
EGAF00001748479 fq.gz 952.1 MB
EGAF00001748480 fq.gz 1.1 GB
EGAF00001748481 fq.gz 994.5 MB
EGAF00001748482 fq.gz 1.1 GB
EGAF00001748483 fq.gz 726.5 MB
EGAF00001748484 fq.gz 870.4 MB
EGAF00001748485 fq.gz 730.2 MB
EGAF00001748486 fq.gz 826.5 MB
EGAF00001748487 fq.gz 747.1 MB
EGAF00001748488 fq.gz 846.4 MB
EGAF00001748489 fq.gz 737.6 MB
EGAF00001748490 fq.gz 853.1 MB
EGAF00001748491 fq.gz 828.4 MB
EGAF00001748492 fq.gz 889.0 MB
EGAF00001748493 fq.gz 868.5 MB
EGAF00001748494 fq.gz 935.1 MB
EGAF00001748495 fq.gz 652.9 MB
EGAF00001748496 fq.gz 781.6 MB
EGAF00001748497 fq.gz 636.1 MB
EGAF00001748498 fq.gz 713.8 MB
EGAF00001748499 fq.gz 652.2 MB
EGAF00001748500 fq.gz 732.5 MB
200 Files (79.1 GB)
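To plan storage before downloading, the rows of the file listing above can be parsed and summed. This is a minimal sketch under stated assumptions: the function names are hypothetical, each row is assumed to have the form `ID type value unit`, and the units are assumed to be decimal (1 MB = 10^6 bytes), which this page does not state. Because the listed sizes are rounded, any total computed this way only approximates the figure reported by the archive.

```python
from decimal import Decimal

# Assumed decimal (SI) multipliers; the page does not say whether units are SI or binary.
UNITS = {"kB": 10**3, "MB": 10**6, "GB": 10**9}


def parse_row(line):
    """Parse one listing row, e.g. 'EGAF00001747947 fq.gz 65.9 MB'."""
    file_id, file_type, value, unit = line.split()
    return file_id, file_type, int(Decimal(value) * UNITS[unit])


def total_bytes(lines):
    """Sum the approximate sizes of all non-empty listing rows."""
    return sum(parse_row(line)[2] for line in lines if line.strip())


if __name__ == "__main__":
    # Three rows copied from the listing above.
    rows = [
        "EGAF00001747947 fq.gz 65.9 MB",
        "EGAF00001748161 fq.gz 13.2 kB",
        "EGAF00001748289 fq.gz 1.0 GB",
    ]
    print(f"{total_bytes(rows) / 10**9:.2f} GB (approximate)")
```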