VESPA: Vanderbilt Electronic Systems for Pharmacogenomic Assessment
The Vanderbilt Electronic Systems for Pharmacogenomic Assessment (VESPA) Project is a large electronic medical record (EMR)- and biobank-based initiative for translational pharmacogenomics discoveries. Key research resources used in this effort include the BioVU DNA databank, the associated Synthetic Derivative database of clinical information, and software tools developed to identify drugs and clinical events from structured and unstructured ("free text") EMR data. Cohorts consist primarily of subjects with drug-response phenotypes, and most cases and controls were identified using three data types: ICD-9 codes, medication regimens, and medical test results. Advanced informatics methods, such as natural language processing, were also used for some phenotypes. Algorithms used event-sequence analyses to establish temporal relationships between drugs and phenotypes. Both case and control algorithms excluded records that contained specific comorbidities and were refined to achieve a positive predictive value (PPV) > 90%. Where an automated algorithm failed to meet this threshold, it was coupled with manual review to confirm that the included cases were true positives.
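As an illustration only, the event-sequence check and the PPV threshold described above might be sketched as follows. This is not the VESPA software: the function names, the 90-day window, and the review counts are hypothetical, chosen purely to show the kind of logic involved.

```python
from datetime import date, timedelta


def event_follows_drug(drug_start, event_date, max_days=90):
    """Return True if the event falls on or after the drug start date
    and no more than max_days later. The 90-day window is an assumed
    value, not one taken from the study."""
    delta = event_date - drug_start
    return timedelta(0) <= delta <= timedelta(days=max_days)


def positive_predictive_value(true_positives, false_positives):
    """PPV = confirmed cases / all records the algorithm flagged as cases."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("no flagged records to evaluate")
    return true_positives / flagged


def meets_ppv_threshold(true_positives, false_positives, threshold=0.90):
    """The description requires PPV strictly greater than 90%."""
    return positive_predictive_value(true_positives, false_positives) > threshold


# Hypothetical chart review: 46 of 50 flagged records confirmed as true cases.
print(event_follows_drug(date(2010, 1, 1), date(2010, 2, 1)))
print(meets_ppv_threshold(46, 4))
```

Under this sketch, an algorithm whose reviewed sample yields a PPV of exactly 90% would not pass, since the stated criterion is a PPV above 90%.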
A total of 11,639 subjects met phenotyping criteria for at least one of the 28 phenotypes investigated. Across all phenotypes, 90% of cases and controls were reused as either a case or a control for at least one other phenotype. Data are deposited for individual phenotypes in the VESPA substudies.
Substudies:
- Type: Cohort
- Archiver: The database of Genotypes and Phenotypes (dbGaP)