The Human Phenotype Project (HPP) is a large-scale deep-phenotype prospective longitudinal and ethnically diverse cohort

The Human Phenotype Project (HPP) is a large-scale, deep-phenotype, prospective, longitudinal and ethnically diverse cohort and biobank that we established. To date, approximately 28,000 participants have enrolled in the study, and over 13,000 have completed their initial visit. The project aims to identify novel molecular signatures with diagnostic, prognostic and therapeutic value, and to develop AI-based predictive models for disease onset and progression. A unique feature of the HPP is its deep and longitudinal profiling, which includes medical history, lifestyle and nutritional habits, vital signs, anthropometrics, blood tests, continuous glucose and sleep monitoring, imaging modalities, and molecular profiling of the transcriptome, genetics, gut, vaginal and oral microbiome, metabolome and immune system. For several of these modalities, the HPP is the first or largest cohort of its kind. Our analyses of these data provide novel insights into the variation of phenotypes with age and ethnicity, highlighting the importance of ethnically diverse cohorts and the need to establish age-dependent norms. We demonstrate how the HPP can be used to unravel personalized molecular signatures of disease by comparison to patient-specific and disease-specific matched healthy controls. Leveraging the extensive dietary and lifestyle data in the HPP, we systematically identify associations between lifestyle factors and health outcomes. Finally, we present a multi-modal foundation AI model that is trained using self-supervised learning on diet and continuous glucose monitoring (CGM) data and outperforms existing methods in predicting future onset of disease. We extend this AI framework to integrate all data modalities of each subject as a continuous sequence of diverse medical events, creating a digital twin that can simulate interventions and predict health trajectories and outcomes.

Click on a Dataset ID in the table below to learn more and to find out whom to contact about access to these data.

Dataset ID | Description | Technology | Samples
EGAD00010002714 |  | Gencove | 8958