GDAP - Genome Diversity in Africa Project (2021-02-12)

The Genomic Diversity in Africa Project (GDAP) started with the plan to develop a genomic resource from African populations, characterise genomic diversity and population history, and facilitate clinical studies in Africa. Currently, about 25 individuals from each of 24 ethnolinguistic groups have been whole-genome sequenced at high depth, totalling 585 individuals. An additional 41 individuals have been sequenced with 10X Genomics libraries. The initial curation of this dataset is finished, and we are performing the analysis in coordination with our collaborators. The current GDAP panel is a very diverse sample of African populations that maximises geographical and ethnic variation and is a strong starting point for achieving these goals. However, southern sub-Saharan countries, Bantu speakers and hunter-gatherer groups are currently underrepresented, despite being crucial to understanding the evolutionary history of the continent. After extensive effort to collate study documentation, we now have the opportunity to sequence 600 new individuals from these groups, including from countries such as Gabon, Rwanda and Zambia, and so address these deficiencies. We aim to proceed with the same strategy: for each ethnolinguistic group, to sequence 25 individuals at high depth with standard PCR-free libraries, plus 2 additional individuals with 10X Genomics Chromium libraries. The former gives good representation of variants down to low frequency in any given population, and the latter allows accurate phasing and the analysis of structural variation. By including these new populations, we want to investigate three crucial questions in African history in addition to the initial objectives: the Bantu expansion, the evolutionary history of hunter-gatherers and the transatlantic slave trade. Additionally, the expanded dataset will help us better discover the genetic variation present in Africa and characterise the African pangenome.
This data is part of a pre-publication release. For information on the proper use of pre-publication data shared by the Wellcome Trust Sanger Institute (including details of any publication moratoria), please see http://www.sanger.ac.uk/datasharing/ . This dataset contains all the data available for this study on 2021-02-12.

Institut Pasteur access policy

DATA ACCESS AGREEMENT

These terms and conditions govern access to the managed access datasets (details of which are set out in Appendix I) to which the User Institution has requested access. The User Institution agrees to be bound by these terms and conditions.

Definitions

Authorised Personnel: The individuals at the User Institution to whom Institut Pasteur grants access to the Data. This includes the User, the individuals listed in Appendix II and any other individuals for whom the User Institution subsequently requests access to the Data. Details of the initial Authorised Personnel are set out in Appendix II.
Data: The managed access datasets to which the User Institution has requested access.
Data Producers: Institut Pasteur and the collaborators listed in Appendix I responsible for the development, organisation, and oversight of these Data.
External Collaborator: A collaborator of the User, working for an institution other than the User Institution.
Project: The project for which the User Institution has requested access to these Data. A description of the Project is set out in Appendix II.
Publications: Includes, without limitation, articles published in print journals, electronic journals, reviews, books, posters and other written and verbal presentations of research.
Research Participant: An individual whose data form part of these Data.
Research Purposes: Shall mean research that is seeking to advance the understanding of genetics and genomics, including the treatment of disorders, and work on statistical methods that may be applied to such research.
User: The principal investigator for the Project.
User Institution(s): The Institution that has requested access to the Data.

-------------------------

1. The User Institution agrees to only use these Data for the purpose of the Project (described in Appendix II) and only for Research Purposes. The User Institution further agrees that it will only use these Data for Research Purposes which are within the limitations (if any) set out in Appendix I.

2. The User Institution agrees to preserve, at all times, the confidentiality of these Data. In particular, it undertakes not to use, or attempt to use these Data to compromise or otherwise infringe the confidentiality of information on Research Participants. Without prejudice to the generality of the foregoing, the User Institution agrees to use at least the measures set out in Appendix I to protect these Data.

3. The User Institution agrees to protect the confidentiality of Research Participants in any research papers or publications that they prepare by taking all reasonable care to limit the possibility of identification.

4. The User Institution agrees not to link or combine these Data to other information or archived data available in a way that could re-identify the Research Participants, even if access to that data has been formally granted to the User Institution or is freely available without restriction.

5. The User Institution agrees only to transfer or disclose these Data, in whole or part, or any material derived from these Data, to the Authorised Personnel. Should the User Institution wish to share these Data with an External Collaborator, the External Collaborator must complete a separate application for access to these Data.

6. The User Institution agrees that the Data Producers, and all other parties involved in the creation, funding or protection of these Data:
a) make no warranty or representation, express or implied as to the accuracy, quality or comprehensiveness of these Data;
b) exclude to the fullest extent permitted by law all liability for actions, claims, proceedings, demands, losses (including but not limited to loss of profit), costs, awards damages and payments made by the Recipient that may arise (whether directly or indirectly) in any way whatsoever from the Recipient’s use of these Data or from the unavailability of, or break in access to, these Data for whatever reason and;
c) bear no responsibility for the further analysis or interpretation of these Data.

7. The User Institution agrees to follow the Fort Lauderdale Guidelines (http://www.wellcome.ac.uk/stellent/groups/corporatesite/@policy_communications/documents/web_document/wtd003207.pdf ) and the Toronto Statement (http://www.nature.com/nature/journal/v461/n7261/full/461168a.html). This includes but is not limited to recognising the contribution of the Data Producers and including a proper acknowledgement in all reports or publications resulting from the use of these Data.

8. The User Institution agrees to follow the Publication Policy in Appendix III. This includes respecting the moratorium period for the Data Producers to publish the first peer-reviewed report describing and analysing these Data.

9. The User Institution agrees not to make intellectual property claims on these Data and not to use intellectual property protection in ways that would prevent or block access to, or use of, any element of these Data, or conclusion drawn directly from these Data.

10. The User Institution can elect to perform further research that would add intellectual and resource capital to these data and decide to obtain intellectual property rights on these downstream discoveries. In this case, the User Institution agrees to implement licensing policies that will not obstruct further research and to follow the U.S. National Institutes of Health Best Practices for the Licensing of Genomic Inventions (2005) (https://www.icgc.org/files/daco/NIH_BestPracticesLicensingGenomicInventions_2005_en.pdf ) in conformity with the Organisation for Economic Co-operation and Development Guidelines for the Licensing of the Genetic Inventions (2006) (http://www.oecd.org/science/biotech/36198812.pdf ).

11. The User Institution agrees to destroy/discard the Data held, once it is no longer used for the Project, unless obliged to retain the data for archival purposes in conformity with audit or legal requirements.

12. The User Institution will notify Institut Pasteur within 30 days of any changes or departures of Authorised Personnel.

13. The User Institution will notify Institut Pasteur prior to any significant changes to the protocol for the Project.

14. The User Institution will notify Institut Pasteur as soon as it becomes aware of a breach of the terms or conditions of this agreement.

15. Institut Pasteur may terminate this agreement by written notice to the User Institution. If this agreement terminates for any reason, the User Institution will be required to destroy any Data held, including copies and backup copies. This clause does not prevent the User Institution from retaining these data for archival purpose in conformity with audit or legal requirements.

16. The User Institution accepts that it may be necessary for the Data Producers to alter the terms of this agreement from time to time. As an example, this may include specific provisions relating to the Data required by Data Producers other than Institut Pasteur. In the event that changes are required, the Data Producers or their appointed agent will contact the User Institution to inform it of the changes and the User Institution may elect to accept the changes or terminate the agreement.

17. If requested, the User Institution will allow data security and management documentation to be inspected to verify that it is complying with the terms of this agreement.

18. The User Institution agrees to distribute a copy of these terms to the Authorised Personnel. The User Institution will procure that the Authorised Personnel comply with the terms of this agreement.

19. This agreement (and any dispute, controversy, proceedings or claim of whatever nature arising out of this agreement or its formation) shall be construed, interpreted and governed by the laws of France and shall be subject to the exclusive jurisdiction of the French courts.

-------------------------

Agreed for User Institution
Signature:
Name:
Title:
Date:

Principal Investigator
I confirm that I have read and understood this Agreement.
Signature:
Name:
Title:
Date:

Agreed for Institut Pasteur
Signature:
Name: Nathalie de Parseval
Title: Scientific General Secretary
Date:

-------------------------

APPENDIX I – DATASET DETAILS

Dataset reference (EGA Study ID and Dataset Details)
Whole exome sequencing data generated in this study have been deposited under accession code EGAS00001002457.

Name of project that created the dataset
The demographic history and mutational load of African hunter-gatherers and farmers

Names of other data producers/collaborators
Marie Lopez, Athanasios Kousathanas, Hélène Quach, Christine Harmant, Patrick Mouguiama-Daouda, Jean-Marie Hombert, Alain Froment, George H. Perry, Luis B. Barreiro, Paul Verdu, Etienne Patin, Lluís Quintana-Murci

Specific limitations on areas of research
The data reported in this project must be used only for research, academic purposes. Furthermore, the reported data must not be used for any purpose of re-identification of individual participants. Every data access request will be reviewed by the DAC of Institut Pasteur.

Minimum protection measures required
File access: Data can be held in unencrypted files on an institutional compute system, with Unix user group read/write access for one or more appropriate groups but not Unix world read/write access behind a secure firewall. Laptops holding these data should have password protected logins and screenlocks (set to lock after 5 min of inactivity). If held on USB keys or other portable hard drives, the data must be encrypted.

--------------------------

APPENDIX II – PROJECT DETAILS (to be completed by the Requestor)

Details of dataset requested i.e., EGA Study and Dataset Accession Number

Brief abstract of the Project in which the Data will be used (500 words max)

All Individuals who the User Institution to be named as registered users
Name of Registered User    Email    Job Title    Supervisor*

All Individuals that should have an account created at the EGA
Name of Registered User    Email    Job Title

-----------------------

APPENDIX III – PUBLICATION POLICY

In any publications based on these data, please describe how the data can be accessed, including the name of the hosting database (e.g., The European Genome-phenome Archive at the European Bioinformatics Institute) and its accession numbers (EGAS00001002457), and acknowledge its use in a form agreed by the User Institution with Institut Pasteur.
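
The file-access measure in Appendix I (group read/write allowed, world read/write not allowed) can be spot-checked on a Unix system. The following is a minimal sketch in Python and is not part of the agreement. The function names `world_accessible` and `find_world_accessible` are hypothetical, and the check covers only the permission bits, not the firewall, laptop, or encryption requirements.

```python
import os
import stat


def world_accessible(path: str) -> bool:
    """Return True if 'other' (world) users can read or write the path."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IROTH | stat.S_IWOTH))


def find_world_accessible(root: str) -> list[str]:
    """List every file and directory under root with world read/write bits set.

    Hypothetical helper for illustration: root is assumed to be the
    directory where the downloaded data files are stored.
    """
    offenders = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            if world_accessible(full):
                offenders.append(full)
    return sorted(offenders)
```

An empty list from `find_world_accessible` suggests that no file under the chosen directory grants world read or write access. Permissions on the parent directories and on the root directory itself would still need to be reviewed separately.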

Studies are experimental investigations of a particular phenomenon, e.g., case-control studies on a particular trait or cancer research projects reporting matching cancer normal genomes from patients.

Study ID Study Title Study Type
EGAS00001003602 Whole Genome Sequencing

This table displays only public information pertaining to the files in the dataset. If you wish to access this dataset, please submit a request. If you already have access to these data files, please consult the download documentation.

ID File Type Size Located in
EGAF00004455116 cram 23.0 GB
EGAF00004455117 cram 22.3 GB
EGAF00004455118 cram 17.8 GB
EGAF00004455119 cram 22.3 GB
EGAF00004455120 cram 21.1 GB
EGAF00004455121 cram 22.2 GB
EGAF00004455122 cram 22.9 GB
EGAF00004455123 cram 23.4 GB
EGAF00004455124 cram 18.4 GB
EGAF00004455125 cram 19.9 GB
EGAF00004455126 cram 21.6 GB
EGAF00004455127 cram 20.4 GB
EGAF00004455128 cram 25.8 GB
EGAF00004455129 cram 20.8 GB
EGAF00004455130 cram 19.8 GB
EGAF00004455131 cram 19.6 GB
EGAF00004455132 cram 19.9 GB
EGAF00004455133 cram 21.0 GB
EGAF00004455134 cram 20.0 GB
EGAF00004455135 cram 29.9 GB
EGAF00004455136 cram 17.8 GB
EGAF00004455137 cram 17.2 GB
EGAF00004455138 cram 18.5 GB
EGAF00004455139 cram 18.3 GB
EGAF00004455140 cram 22.1 GB
EGAF00004455141 cram 18.9 GB
EGAF00004455142 cram 18.7 GB
EGAF00004455143 cram 18.9 GB
EGAF00004455144 cram 19.0 GB
EGAF00004455145 cram 25.4 GB
EGAF00004455146 cram 25.6 GB
EGAF00004455147 cram 25.3 GB
EGAF00004455148 cram 26.1 GB
EGAF00004455149 cram 24.2 GB
EGAF00004455150 cram 26.1 GB
EGAF00004455151 cram 24.0 GB
EGAF00004455152 cram 27.5 GB
EGAF00004455153 cram 25.8 GB
EGAF00004455154 cram 25.3 GB
EGAF00004455155 cram 25.2 GB
EGAF00004455156 cram 24.8 GB
EGAF00004455157 cram 27.8 GB
EGAF00004455158 cram 24.4 GB
EGAF00004455159 cram 24.0 GB
EGAF00004455160 cram 24.0 GB
EGAF00004455161 cram 23.6 GB
EGAF00004455162 cram 23.4 GB
EGAF00004455163 cram 23.9 GB
EGAF00004455164 cram 25.0 GB
EGAF00004455165 cram 26.4 GB
EGAF00004455166 cram 25.1 GB
EGAF00004455167 cram 21.7 GB
EGAF00004455168 cram 26.5 GB
EGAF00004455169 cram 27.3 GB
EGAF00004455170 cram 26.8 GB
EGAF00004455171 cram 27.7 GB
EGAF00004455172 cram 19.3 GB
EGAF00004455173 cram 27.5 GB
EGAF00004455174 cram 24.7 GB
EGAF00004455175 cram 23.8 GB
EGAF00004455176 cram 26.5 GB
EGAF00004455177 cram 26.1 GB
EGAF00004455178 cram 27.2 GB
EGAF00004455179 cram 26.4 GB
EGAF00004455180 cram 28.5 GB
EGAF00004455181 cram 21.8 GB
EGAF00004455182 cram 27.3 GB
EGAF00004455183 cram 25.6 GB
EGAF00004455184 cram 27.1 GB
EGAF00004455185 cram 26.1 GB
EGAF00004455186 cram 28.4 GB
EGAF00004455187 cram 27.9 GB
EGAF00004455188 cram 27.3 GB
EGAF00004455189 cram 26.2 GB
EGAF00004455190 cram 29.4 GB
EGAF00004455191 cram 24.0 GB
EGAF00004455192 cram 24.6 GB
EGAF00004455193 cram 23.2 GB
EGAF00004455194 cram 23.0 GB
EGAF00004455195 cram 25.6 GB
EGAF00004455196 cram 25.3 GB
EGAF00004455197 cram 25.2 GB
EGAF00004455198 cram 25.3 GB
EGAF00004455199 cram 24.6 GB
EGAF00004455200 cram 26.0 GB
EGAF00004455201 cram 20.9 GB
EGAF00004455202 cram 20.7 GB
EGAF00004455203 cram 26.2 GB
EGAF00004455204 cram 24.6 GB
EGAF00004455205 cram 19.1 GB
EGAF00004455206 cram 23.3 GB
EGAF00004455207 cram 24.4 GB
EGAF00004455208 cram 27.8 GB
EGAF00004455209 cram 24.0 GB
EGAF00004455210 cram 23.4 GB
EGAF00004455211 cram 21.5 GB
EGAF00004455212 cram 22.9 GB
EGAF00004455213 cram 23.0 GB
EGAF00004455214 cram 40.1 GB
EGAF00004455215 cram 27.3 GB
EGAF00004455216 cram 20.4 GB
EGAF00004455217 cram 19.6 GB
EGAF00004455218 cram 26.2 GB
EGAF00004455219 cram 33.7 GB
EGAF00004455220 cram 22.9 GB
EGAF00004455221 cram 23.1 GB
EGAF00004455222 cram 23.3 GB
EGAF00004455223 cram 21.9 GB
EGAF00004455224 cram 22.3 GB
EGAF00004455225 cram 20.5 GB
EGAF00004455226 cram 21.6 GB
EGAF00004455227 cram 21.8 GB
EGAF00004455228 cram 21.3 GB
EGAF00004455229 cram 22.1 GB
EGAF00004455230 cram 22.1 GB
EGAF00004455231 cram 24.3 GB
EGAF00004455232 cram 22.6 GB
EGAF00004455233 cram 21.3 GB
EGAF00004455234 cram 26.0 GB
EGAF00004455235 cram 27.2 GB
EGAF00004455236 cram 27.8 GB
EGAF00004455237 cram 29.5 GB
EGAF00004455238 cram 30.0 GB
EGAF00004455239 cram 28.3 GB
EGAF00004455240 cram 26.9 GB
EGAF00004455241 cram 26.0 GB
EGAF00004455242 cram 26.1 GB
EGAF00004455243 cram 29.3 GB
EGAF00004455244 cram 26.1 GB
EGAF00004455245 cram 26.8 GB
EGAF00004455246 cram 26.0 GB
EGAF00004455247 cram 26.1 GB
EGAF00004455248 cram 26.9 GB
EGAF00004455249 cram 26.8 GB
EGAF00004455250 cram 27.6 GB
EGAF00004455251 cram 27.1 GB
EGAF00004455252 cram 25.9 GB
EGAF00004455253 cram 28.2 GB
EGAF00004455254 cram 25.7 GB
EGAF00004455255 cram 25.3 GB
EGAF00004455256 cram 28.7 GB
EGAF00004455257 cram 27.2 GB
EGAF00004455258 cram 26.6 GB
EGAF00004455259 cram 26.6 GB
EGAF00004455260 cram 28.1 GB
EGAF00004455261 cram 25.8 GB
EGAF00004455262 cram 26.3 GB
EGAF00004455263 cram 26.0 GB
EGAF00004455264 cram 28.5 GB
EGAF00004455265 cram 26.0 GB
EGAF00004455266 cram 29.9 GB
EGAF00004455267 cram 27.3 GB
EGAF00004455268 cram 28.9 GB
EGAF00004455269 cram 29.7 GB
EGAF00004455270 cram 26.1 GB
EGAF00004455271 cram 32.1 GB
EGAF00004455272 cram 29.3 GB
EGAF00004455273 cram 27.7 GB
EGAF00004455274 cram 27.8 GB
EGAF00004455275 cram 23.5 GB
EGAF00004455276 cram 26.1 GB
EGAF00004455277 cram 24.3 GB
EGAF00004455278 cram 26.9 GB
EGAF00004455279 cram 25.5 GB
EGAF00004455280 cram 27.2 GB
EGAF00004455281 cram 25.6 GB
EGAF00004455282 cram 26.9 GB
EGAF00004455283 cram 28.0 GB
EGAF00004455284 cram 25.4 GB
EGAF00004455285 cram 27.5 GB
EGAF00004455286 cram 26.4 GB
EGAF00004455287 cram 26.4 GB
EGAF00004455288 cram 25.9 GB
EGAF00004455289 cram 27.8 GB
EGAF00004455290 cram 25.1 GB
EGAF00004455291 cram 26.9 GB
EGAF00004455292 cram 25.4 GB
EGAF00004455293 cram 26.8 GB
EGAF00004455294 cram 27.6 GB
EGAF00004455295 cram 28.3 GB
EGAF00004455296 cram 24.9 GB
EGAF00004455297 cram 27.7 GB
EGAF00004455298 cram 25.4 GB
EGAF00004673047 cram 23.8 GB
184 Files (4.6 TB)
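
The file count and total size in the summary line can be cross-checked against the listing, for example when planning storage before a download. The sketch below is an illustration in Python. It assumes the listing has been saved as plain text in the row format shown above (file ID, file type, size, unit), and that sizes use decimal units (1 TB = 1000 GB), which the page does not state. The name `summarize` is hypothetical.

```python
import re

# One table row, e.g. "EGAF00004455116 cram 23.0 GB".
ROW = re.compile(r"^(EGAF\d+)\s+(\w+)\s+([\d.]+)\s+(MB|GB|TB)$")

# Assumption: decimal units; the page does not say whether sizes are binary.
UNIT_IN_GB = {"MB": 0.001, "GB": 1.0, "TB": 1000.0}


def summarize(listing: str) -> tuple[int, float]:
    """Return (number of file rows, total size in GB) for a pasted listing.

    Lines that do not look like file rows (header, summary line) are skipped.
    """
    count = 0
    total_gb = 0.0
    for line in listing.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue
        count += 1
        total_gb += float(match.group(3)) * UNIT_IN_GB[match.group(4)]
    return count, total_gb


if __name__ == "__main__":
    sample = """ID File Type Size Located in
EGAF00004455116 cram 23.0 GB
EGAF00004455117 cram 22.3 GB
EGAF00004455118 cram 17.8 GB
"""
    files, size_gb = summarize(sample)
    print(f"{files} files, {size_gb:.1f} GB")
```

Because the listed sizes are rounded to one decimal place, a total computed this way should be expected to agree with the summary line only approximately.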