The underrepresentation of non-European individuals in human genetic studies to date
has limited the diversity of individuals in genomic datasets and reduced the medical
relevance of those datasets for a large proportion of the world’s population. Population-specific
reference genome datasets as well as genome-wide association studies in diverse
populations are needed to address this issue. Here we describe the pilot phase of the
GenomeAsia 100K Project, which includes a whole-genome sequencing reference
dataset from 1,739 individuals representing 219 population groups and 64 countries across Asia.
We catalogue genetic variation, population structure, disease associations and
founder effects. We also explore the use of this dataset in imputation, to facilitate
genetic studies in populations across Asia and worldwide.