The Genomic Diversity in Africa Project (GDAP) started with the plan to develop a genomic resource from African populations, characterise genomic diversity and population history, and facilitate clinical studies in Africa. Currently, about 25 individuals from each of 24 ethnolinguistic groups have been whole-genome sequenced at high depth, totalling 585 individuals. An additional 41 individuals have been sequenced with 10X Genomics libraries. The initial curation of this dataset is finished, and we are performing the analysis in coordination with our collaborators.

The current GDAP represents a very diverse panel of African populations that maximizes geographical and ethnic variation and is a strong starting point for achieving these goals. However, southern sub-Saharan countries, Bantu speakers and hunter-gatherer groups are currently underrepresented, despite being crucial to understanding the evolutionary history of the continent. After extensive effort to collate study documentation, we now have the opportunity to sequence 600 new individuals from these groups, including from countries such as Gabon, Rwanda and Zambia, and address these deficiencies.

We aim to proceed with the same strategy: for each ethnolinguistic group, sequence 25 individuals at high depth with standard PCR-free libraries, plus 2 additional individuals with 10X Genomics Chromium libraries. The former gives a good representation of variants down to low frequency in any given population, and the latter allows accurate phasing and the analysis of structural variation. By including these new populations, we want to investigate three crucial questions in African history in addition to the initial objectives: the Bantu expansion, the evolutionary history of hunter-gatherers and the transatlantic slave trade. Additionally, the expanded dataset will help us better discover the genetic variation present in Africa and characterize the African pangenome.
This data is part of a pre-publication release. For information on the proper use of pre-publication data shared by the Wellcome Trust Sanger Institute (including details of any publication moratoria), please see

Click on a Dataset ID in the table below to learn more, and to find out who to contact about access to these data

Dataset ID       Description  Technology             Samples
EGAD00001006965               Illumina NovaSeq 6000  184