Emirati Genome Project subset of 43,608 WGS samples for population-scale variant discovery and allele frequency mapping.
Here, we present a comprehensive genomic characterization of a cohort of 43,608 Emirati genomes sequenced as part of the Emirati Genome Program (EGP). This study identified more than 421 million single nucleotide variants and indels and more than 600 million copy number and structural variants. A total of 756 million molecular effects were annotated for small variants. Of the identified variants, 69.04% were observed with more than one allele. Of the 7.7 million polymorphic variants with an allele frequency (AF) above 5% in EGP, 1,348 have a predicted deleterious effect on a protein. Characterization of global variation reveals that EGP represents a genetic continuum spanning African, Asian, and European populations. It is best described by two Arabian components, a Eurasian component, and an African component, with the predominant Arabian component linked to mitochondrial haplogroups J and T, which are commonly attributed to the Middle East. Sex chromosome aneuploidies were detected in 93 individuals, and chromosome 21 aneuploidies in 41 individuals. Owing to extensive consanguinity, the median inbreeding coefficient and the cumulative length of runs of homozygosity (ROHs) were elevated; both were largest in the groups whose main ancestry component is Arabian and higher than reported for Qatar. Families were identified based on genetic relatedness and classified into 264 families with unrelated parents and 247 families with third- or fourth-degree consanguineous parents. Representative consanguineous pedigrees of EGP families were outlined. Cumulative ROH length depended on the main ancestry component and was significantly increased in the offspring of consanguineous parents, with a pronounced difference between third- and fourth-degree relatedness. Investigation of the cumulative AFs of variants causing Mendelian diseases highlighted genes related to alpha- and beta-thalassemia (HBB, HBA2) and revealed a high burden of variants causing severe recessive diseases, metabolic and retinal disorders, and hearing loss. In summary, EGP represents a landmark effort in characterizing the genetic diversity of the Emirati population, leveraging the largest Middle Eastern cohort reported to date.
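The abstract relies on two standard quantities: allele frequency (AF) and cumulative ROH length. The following is a minimal Python sketch of how they are commonly computed, assuming biallelic sites with genotypes coded as alternate-allele dosages (0, 1, 2) and ROH segments already called by some upstream tool. It also computes F_ROH, the fraction of the autosomal genome lying in ROHs, which is one common estimator of the inbreeding coefficient; the abstract does not state which estimator EGP used. All function names, the toy data, and the 100 Mb genome length are hypothetical and are not taken from the EGP pipeline.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical cutoff mirroring the "AF above 5%" threshold quoted in the text.
COMMON_AF_THRESHOLD = 0.05


def allele_frequency(dosages: Sequence[Optional[int]]) -> float:
    """Alternate-allele frequency at one biallelic site.

    Each entry is a diploid genotype coded as the number of alternate
    alleles (0, 1 or 2); None marks a missing call and is excluded.
    """
    called = [d for d in dosages if d is not None]
    if not called:
        raise ValueError("no called genotypes at this site")
    if any(d not in (0, 1, 2) for d in called):
        raise ValueError("dosages must be 0, 1 or 2")
    # Two alleles per called diploid genotype.
    return sum(called) / (2 * len(called))


def is_common(af: float, threshold: float = COMMON_AF_THRESHOLD) -> bool:
    """True if the allele frequency is strictly above the threshold."""
    return af > threshold


@dataclass(frozen=True)
class RohSegment:
    """A run of homozygosity as a half-open interval [start, end) in bp."""
    chrom: str
    start: int
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start


def cumulative_roh(segments: Sequence[RohSegment]) -> int:
    """Total ROH length in bp; assumes the segments do not overlap."""
    return sum(seg.length for seg in segments)


def f_roh(segments: Sequence[RohSegment], autosomal_length: int) -> float:
    """F_ROH: fraction of the autosomal genome lying in ROHs.

    The caller is expected to pass autosomal segments only.
    """
    return cumulative_roh(segments) / autosomal_length


if __name__ == "__main__":
    # Toy genotypes for one site in eight individuals (one missing call).
    site = [0, 1, 2, 1, 0, None, 0, 0]
    af = allele_frequency(site)
    print(f"AF = {af:.4f}, common = {is_common(af)}")

    # Toy ROH calls for one individual; the genome length is illustrative only.
    rohs = [RohSegment("chr1", 1_000_000, 4_000_000),
            RohSegment("chr2", 10_000_000, 12_500_000)]
    print(cumulative_roh(rohs), f"{f_roh(rohs, 100_000_000):.3f}")
```

Dividing by twice the number of called genotypes, rather than twice the cohort size, keeps missing calls from deflating the AF. In a real cohort, AFs would come from a jointly called VCF, and the F_ROH denominator would be chosen to match the reference build and the ROH caller.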
- Type: Population Genomics
- Archiver: European Genome-Phenome Archive (EGA)
The table below lists the dataset in this study; the EGA entry for each Dataset ID provides further details and the contact for data access requests.
| Dataset ID | Description | Technology | Samples |
|---|---|---|---|
| EGAD50000001558 | | Illumina NovaSeq 6000 | 24 |
