A standardised framework for robust fragmentomic feature extraction from cell-free DNA sequencing data

Fragmentomic features of cell-free DNA represent promising non-invasive biomarkers for cancer diagnosis. However, a lack of systematic evaluation of biases in feature quantification has hindered the adoption of such applications. We compared features derived from whole-genome sequencing of ten healthy donors using nine library kits and ten data-processing routes, and validated them in 1,182 plasma samples from published studies. Our results clarify the variations resulting from library preparation and feature quantification methods. We designed the Trim Align Pipeline and the cfDNAPro R package as unified interfaces for data pre-processing, feature extraction, and visualisation, aiming to standardise multimodal feature engineering and integration for machine learning.

Policy for access to data included in the following publication: A standardised framework for robust fragmentomic feature extraction from cell-free DNA sequencing data

DATA ACCESS AGREEMENT

These terms and conditions govern access to the managed access datasets (details of which are set out in Appendix I) to which the User Institution has requested access. The User Institution agrees to be bound by these terms and conditions.

Definitions

Authorised Personnel: The individuals at the User Institution to whom Cambridge grants access to the Data. This includes the User, the individuals listed in Appendix II and any other individuals for whom the User Institution subsequently requests access to the Data. Details of the initial Authorised Personnel are set out in Appendix II.

Data: The managed access datasets to which the User Institution has requested access.

Data Provider: THE CHANCELLOR, MASTERS AND SCHOLARS OF THE UNIVERSITY OF CAMBRIDGE of The Old Schools, Trinity Lane, Cambridge CB2 1TN, United Kingdom (“Cambridge”); represented by staff at the Cancer Research UK Cambridge Institute, and contactable by email at Rosenfeld.LabAdmin@cruk.cam.ac.uk or future email as updated; or future representatives as indicated in Data Access Committee(s) for the dataset(s) defined in Appendix I.

External Collaborator: A collaborator of the User, working for an institution other than the User Institution.

Project: The project for which the User Institution has requested access to these Data. A description of the Project is set out in Appendix II.

Publications: Includes, without limitation, articles published in print journals, electronic journals, reviews, books, posters and other written and verbal presentations of research.

Research Participant: An individual whose data form part of these Data.

Research Purposes: Shall mean research that is seeking to advance the understanding of genetics and genomics, including the treatment of disorders, and work on statistical methods that may be applied to such research.

User: The principal investigator for the Project.

User Institution(s): The Institution that has requested access to the Data.

1. Cambridge confirms that it is entitled to provide the Data to the User Institution for use as described in this Agreement. Nothing in this Agreement shall affect the ownership of the Data.

2. The User Institution agrees to only use these Data for the purpose of the Project (described in Appendix II) and only for Research Purposes. The User Institution further agrees that it will only use these Data for Research Purposes which are within the limitations (if any) set out in Appendix I.

3. The User Institution agrees to preserve, at all times, the confidentiality of these Data. In particular, it undertakes not to use, or attempt to use these Data to compromise or otherwise infringe the confidentiality of information on Research Participants. Without prejudice to the generality of the foregoing, the User Institution agrees to use at least the measures set out in Appendix I to protect these Data.

4. The User Institution agrees to protect the confidentiality of Research Participants in any research papers or publications that they prepare by taking all reasonable care to limit the possibility of identification.

5. The User Institution agrees not to link or combine these Data to other information or archived data available in a way that could re-identify the Research Participants, even if access to that data has been formally granted to the User Institution or is freely available without restriction.

6. The User Institution agrees only to transfer or disclose these Data, in whole or part, or any material derived from these Data, to the Authorised Personnel. Should the User Institution wish to share these Data with an External Collaborator, the External Collaborator must complete a separate application for access to these Data.

7. The User Institution agrees that the Data Provider, and all other parties involved in the creation, funding or protection of these Data:
a) make no warranty or representation, express or implied as to the accuracy, quality or comprehensiveness of these Data;
b) exclude to the fullest extent permitted by law all liability for actions, claims, proceedings, demands, losses (including but not limited to loss of profit), costs, awards damages and payments made by the User Institution that may arise (whether directly or indirectly) in any way whatsoever from the User’s use of these Data or from the unavailability of, or break in access to, these Data for whatever reason and;
c) bear no responsibility for the further analysis or interpretation of these Data.

8. The User Institution agrees to follow the Fort Lauderdale Guidelines (http://www.wellcome.ac.uk/stellent/groups/corporatesite/@policy_communications/documents/web_document/wtd003207.pdf) and the Toronto Statement (http://www.nature.com/nature/journal/v461/n7261/full/461168a.html). This includes but is not limited to recognising the contribution of the Data Provider and including a proper acknowledgement in all reports or publications resulting from the use of these Data.

9. The User Institution agrees to follow the Publication Policy in Appendix III. This includes respecting the moratorium period for the Data Provider to publish the first peer-reviewed report describing and analysing these Data.

10. The User Institution agrees not to make intellectual property claims on these Data and not to use intellectual property protection in ways that would prevent or block access to, or use of, any element of these Data, or conclusion drawn directly from these Data.

11. The User Institution shall promptly notify Cambridge of any application for patent or other proprietary rights to protect intellectual property contained in the Results. The User Institution shall not use the Results for any commercial purpose without the prior written consent of Cambridge.

12. The User Institution agrees to destroy/discard the Data held, once it is no longer used for the Project, unless obliged to retain the data for archival purposes in conformity with audit or legal requirements.

13. The User Institution will notify Cambridge within 30 days of any changes or departures of Authorised Personnel.

14. The User Institution will notify Cambridge prior to any significant changes to the protocol for the Project.

15. The User Institution will notify Cambridge as soon as it becomes aware of a breach of the terms or conditions of this agreement.

16. Cambridge may terminate this agreement by written notice to the User Institution. If this agreement terminates for any reason, the User Institution will be required to destroy any Data held, including copies and backup copies. This clause does not prevent the User Institution from retaining these data for archival purpose in conformity with audit or legal requirements.

17. The User Institution accepts that it may be necessary for the Data Provider to alter the terms of this agreement from time to time. As an example, this may include specific provisions relating to the Data required by Data Providers other than Cambridge. In the event that changes are required, the Data Provider or their appointed agent will contact the User Institution to inform it of the changes and the User Institution may elect to accept the changes or terminate the agreement.

18. If requested, the User Institution will allow data security and management documentation to be inspected to verify that it is complying with the terms of this agreement.

19. The User Institution agrees to distribute a copy of these terms to the Authorised Personnel. The User Institution will procure that the Authorised Personnel comply with the terms of this agreement.

20. This agreement (and any dispute, controversy, proceedings or claim of whatever nature arising out of this agreement or its formation) shall be construed, interpreted and governed by the laws of England and Wales and shall be subject to the exclusive jurisdiction of the English courts.

Agreed for User Institution
Signature:
Name:
Title:
Date:

Principal Investigator
I confirm that I have read and understood this Agreement.
Signature:
Name:
Title:
Date:

Agreed for Cambridge
Signature:
Name:
Title:
Date:

APPENDIX I – DATASET DETAILS
APPENDIX II – PROJECT DETAILS
APPENDIX III – PUBLICATION POLICY

APPENDIX I – DATASET DETAILS
(to be completed by the Data Provider before passing to applicant)

Dataset reference (EGA Study ID and Dataset Details)
Study ID: EGAS00001008051
Conducted by the Rosenfeld Lab, this study focuses on fragmentomic feature extraction from cell-free DNA sequencing data. The dataset includes shallow whole-genome sequencing data, derived from multiple library preparation kits used on samples from healthy donors.

Name of project that created the dataset
Dataset created as part of the study titled “Advancing fragmentomic feature extraction from cell-free DNA sequencing data: opportunities and challenges”.

Names of other Data Providers/collaborators

Specific limitations on areas of research

Minimum protection measures required
File access: Data can be held in unencrypted files on an institutional compute system, with Unix user group read/write access for one or more appropriate groups but not Unix world read/write access behind a secure firewall. Laptops holding these data should have password protected logins and screenlocks (set to lock after 5 min of inactivity). If held on USB keys or other portable hard drives, the data must be encrypted.
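
As an illustration only, and not as part of the Agreement, the file-access measure above (group read/write allowed, world read/write not allowed) could be spot-checked with a short script. This is a minimal sketch that assumes a POSIX file system. The function names and the directory passed on the command line are hypothetical, and the script does not cover the firewall, screenlock or encryption requirements.

```python
import os
import stat
import sys

# Permission bits that grant read or write access to "other" (world) users.
WORLD_RW = stat.S_IROTH | stat.S_IWOTH


def world_accessible(path):
    """Return True if world users hold a read or write bit on path."""
    mode = os.lstat(path).st_mode
    return bool(mode & WORLD_RW)


def find_world_accessible(root):
    """Walk root and list files and directories that are world readable or writable.

    Symbolic links are skipped because their own mode bits are not meaningful.
    """
    flagged = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                continue
            if world_accessible(full):
                flagged.append(full)
    return sorted(flagged)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_permissions.py /path/to/dataset
    for entry in find_world_accessible(sys.argv[1]):
        print("world read/write bit set:", entry)
```

A file with mode 660 (owner and group read/write) passes this check, whereas a file with mode 664 is reported because the world read bit is set.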
APPENDIX II – PROJECT DETAILS
(to be completed by the Requestor)

Details of dataset requested, i.e., EGA Study and Dataset Accession Number

Brief abstract of the Project in which the Data will be used (500 words max)

All individuals who the User Institution wishes to be named as registered users
Name of Registered User   Email   Job Title   Supervisor*

All individuals that should have an account created at the EGA
Name of Registered User   Email   Job Title

APPENDIX III – PUBLICATION POLICY

Cambridge intend to publish the results of their analysis of this dataset and do not consider its deposition into public databases to be the equivalent of such publications. Cambridge anticipate that the dataset could be useful to other qualified researchers for a variety of purposes. However, some areas of work are subject to a publication moratorium.

The publication moratorium covers any publications (including oral communications) that describe the use of the dataset. For research papers, submission for publication should not occur until 3 months after these data were first made available on the relevant hosting database, unless Cambridge has provided written consent to earlier submission.

In any publications based on these data, please describe how the data can be accessed, including the name of the hosting database (e.g., The European Genome-phenome Archive at the European Bioinformatics Institute) and its accession numbers (e.g., EGAS00000000029), and acknowledge its use in a form agreed by the User Institution with Cambridge.

Studies are experimental investigations of a particular phenomenon, e.g., case-control studies on a particular trait or cancer research projects reporting matching cancer normal genomes from patients.

Study ID Study Title Study Type
EGAS00001008051 Other

This table displays only public information pertaining to the files in the dataset. If you wish to access this dataset, please submit a request. If you already have access to these data files, please consult the download documentation.

ID File Type Size Quality Report Located in
EGAF00008714477 fq.gz 1.6 GB
EGAF00008714478 fq.gz 1.6 GB
EGAF00008714479 fq.gz 3.7 GB
EGAF00008714480 fq.gz 3.8 GB
EGAF00008714481 fq.gz 5.2 GB
EGAF00008714482 fq.gz 5.4 GB
EGAF00008714483 fq.gz 1.4 GB
EGAF00008714484 fq.gz 1.5 GB
EGAF00008714485 fq.gz 3.5 GB
EGAF00008714486 fq.gz 3.6 GB
EGAF00008714487 fq.gz 1.7 GB
EGAF00008714488 fq.gz 1.8 GB
EGAF00008714489 fq.gz 1.6 GB
EGAF00008714490 fq.gz 1.7 GB
EGAF00008714491 fq.gz 2.2 GB
EGAF00008714492 fq.gz 2.4 GB
EGAF00008714493 fq.gz 1.0 GB
EGAF00008714494 fq.gz 1.1 GB
EGAF00008714495 fq.gz 3.2 GB
EGAF00008714496 fq.gz 3.2 GB
EGAF00008714497 fq.gz 5.0 GB
EGAF00008714498 fq.gz 5.2 GB
EGAF00008714499 fq.gz 1.7 GB
EGAF00008714500 fq.gz 1.9 GB
EGAF00008714501 fq.gz 4.7 GB
EGAF00008714502 fq.gz 4.9 GB
EGAF00008714503 fq.gz 1.6 GB
EGAF00008714504 fq.gz 1.7 GB
EGAF00008714505 fq.gz 1.8 GB
EGAF00008714506 fq.gz 1.8 GB
EGAF00008714507 fq.gz 2.4 GB
EGAF00008714508 fq.gz 2.5 GB
EGAF00008714509 fq.gz 4.7 GB
EGAF00008714510 fq.gz 4.8 GB
EGAF00008714511 fq.gz 1.5 GB
EGAF00008714512 fq.gz 1.6 GB
EGAF00008714513 fq.gz 9.1 GB
EGAF00008714514 fq.gz 9.5 GB
EGAF00008714515 fq.gz 1.8 GB
EGAF00008714516 fq.gz 1.9 GB
EGAF00008714517 fq.gz 1.5 GB
EGAF00008714518 fq.gz 1.5 GB
EGAF00008714519 fq.gz 2.5 GB
EGAF00008714520 fq.gz 2.6 GB
EGAF00008714521 fq.gz 1.5 GB
EGAF00008714522 fq.gz 1.5 GB
EGAF00008714523 fq.gz 4.5 GB
EGAF00008714524 fq.gz 4.8 GB
EGAF00008714525 fq.gz 6.6 GB
EGAF00008714526 fq.gz 6.9 GB
EGAF00008714527 fq.gz 1.3 GB
EGAF00008714528 fq.gz 1.3 GB
EGAF00008714529 fq.gz 5.9 GB
EGAF00008714530 fq.gz 6.1 GB
EGAF00008714531 fq.gz 1.9 GB
EGAF00008714532 fq.gz 1.9 GB
EGAF00008714533 fq.gz 1.5 GB
EGAF00008714534 fq.gz 1.6 GB
EGAF00008714535 fq.gz 1.5 GB
EGAF00008714536 fq.gz 1.5 GB
EGAF00008714537 fq.gz 2.6 GB
EGAF00008714538 fq.gz 2.7 GB
EGAF00008714539 fq.gz 1.1 GB
EGAF00008714540 fq.gz 1.2 GB
EGAF00008714541 fq.gz 3.8 GB
EGAF00008714542 fq.gz 3.9 GB
EGAF00008714543 fq.gz 4.5 GB
EGAF00008714544 fq.gz 4.6 GB
EGAF00008714545 fq.gz 1.2 GB
EGAF00008714546 fq.gz 1.3 GB
EGAF00008714547 fq.gz 3.8 GB
EGAF00008714548 fq.gz 4.0 GB
EGAF00008714549 fq.gz 2.3 GB
EGAF00008714550 fq.gz 2.4 GB
EGAF00008714551 fq.gz 1.8 GB
EGAF00008714552 fq.gz 1.9 GB
EGAF00008714553 fq.gz 4.0 GB
EGAF00008714554 fq.gz 4.2 GB
EGAF00008714555 fq.gz 3.3 GB
EGAF00008714556 fq.gz 3.4 GB
EGAF00008714557 fq.gz 1.1 GB
EGAF00008714558 fq.gz 1.2 GB
EGAF00008714559 fq.gz 3.8 GB
EGAF00008714560 fq.gz 3.9 GB
EGAF00008714561 fq.gz 4.1 GB
EGAF00008714562 fq.gz 4.2 GB
EGAF00008714563 fq.gz 1.7 GB
EGAF00008714564 fq.gz 1.8 GB
EGAF00008714565 fq.gz 3.1 GB
EGAF00008714566 fq.gz 3.3 GB
EGAF00008714567 fq.gz 3.0 GB
EGAF00008714568 fq.gz 3.2 GB
EGAF00008714569 fq.gz 1.5 GB
EGAF00008714570 fq.gz 1.6 GB
EGAF00008714571 fq.gz 2.7 GB
EGAF00008714572 fq.gz 2.8 GB
EGAF00008714573 fq.gz 2.2 GB
EGAF00008714574 fq.gz 2.3 GB
EGAF00008714575 fq.gz 1.3 GB
EGAF00008714576 fq.gz 1.4 GB
EGAF00008714577 fq.gz 3.9 GB
EGAF00008714578 fq.gz 4.1 GB
EGAF00008714579 fq.gz 6.2 GB
EGAF00008714580 fq.gz 6.5 GB
EGAF00008714581 fq.gz 2.4 GB
EGAF00008714582 fq.gz 2.5 GB
EGAF00008714583 fq.gz 3.9 GB
EGAF00008714584 fq.gz 4.0 GB
EGAF00008714585 fq.gz 2.8 GB
EGAF00008714586 fq.gz 3.0 GB
EGAF00008714587 fq.gz 1.2 GB
EGAF00008714588 fq.gz 1.3 GB
EGAF00008714589 fq.gz 2.5 GB
EGAF00008714590 fq.gz 2.6 GB
EGAF00008714591 fq.gz 2.2 GB
EGAF00008714592 fq.gz 2.2 GB
EGAF00008714593 fq.gz 1.1 GB
EGAF00008714594 fq.gz 1.1 GB
EGAF00008714595 fq.gz 4.8 GB
EGAF00008714596 fq.gz 5.0 GB
EGAF00008714597 fq.gz 6.0 GB
EGAF00008714598 fq.gz 6.3 GB
EGAF00008714599 fq.gz 1.3 GB
EGAF00008714600 fq.gz 1.3 GB
EGAF00008714601 fq.gz 3.4 GB
EGAF00008714602 fq.gz 3.6 GB
EGAF00008714603 fq.gz 1.6 GB
EGAF00008714604 fq.gz 1.7 GB
EGAF00008714605 fq.gz 1.5 GB
EGAF00008714606 fq.gz 1.6 GB
EGAF00008714607 fq.gz 2.1 GB
EGAF00008714608 fq.gz 2.2 GB
EGAF00008714609 fq.gz 2.2 GB
EGAF00008714610 fq.gz 2.3 GB
EGAF00008714611 fq.gz 1.1 GB
EGAF00008714612 fq.gz 1.2 GB
EGAF00008714613 fq.gz 3.2 GB
EGAF00008714614 fq.gz 3.3 GB
EGAF00008714615 fq.gz 6.0 GB
EGAF00008714616 fq.gz 6.2 GB
EGAF00008714617 fq.gz 1.8 GB
EGAF00008714618 fq.gz 1.9 GB
EGAF00008714619 fq.gz 3.5 GB
EGAF00008714620 fq.gz 3.6 GB
EGAF00008714621 fq.gz 1.7 GB
EGAF00008714622 fq.gz 1.7 GB
EGAF00008714623 fq.gz 1.5 GB
EGAF00008714624 fq.gz 1.6 GB
EGAF00008714625 fq.gz 2.6 GB
EGAF00008714626 fq.gz 2.7 GB
EGAF00008714627 fq.gz 3.1 GB
EGAF00008714628 fq.gz 3.2 GB
EGAF00008714629 fq.gz 1.2 GB
EGAF00008714630 fq.gz 1.3 GB
EGAF00008714631 fq.gz 4.2 GB
EGAF00008714632 fq.gz 4.3 GB
EGAF00008714633 fq.gz 1.6 GB
EGAF00008714634 fq.gz 1.7 GB
EGAF00008714635 fq.gz 1.5 GB
EGAF00008714636 fq.gz 1.6 GB
EGAF00008714637 fq.gz 2.3 GB
EGAF00008714638 fq.gz 2.4 GB
EGAF00008714639 fq.gz 2.2 GB
EGAF00008714640 fq.gz 2.3 GB
164 Files (464.7 GB)