INCLUDE Data Hub: NDA GUIDs for Down Syndrome Research

The INCLUDE (INvestigation of Co-occurring conditions across the Lifespan to Understand Down syndromE) Project is an NIH-wide collaboration that seeks to improve health and quality of life for people with Down syndrome. The INCLUDE Project Data Coordinating Center (DCC) and partners created the INCLUDE Data Hub, a centralized data resource that allows access to large-scale clinical and multi-omics datasets specific to Down syndrome and supports collaborative, cloud-based analysis to accelerate scientific discoveries related to Down syndrome and its co-occurring conditions.

The INCLUDE Data Hub includes an innovative researcher data repository, a portal, and a cloud-based platform for data analysis. The portal provides access to de-identified demographics and clinical metadata, as well as multi-omics datasets, including whole genome sequences, transcriptomes, proteomes, and metabolomes. The INCLUDE Data Hub uses a 'registered access' model, whereby users can access the portal and associated data after agreeing to terms of use. From the Data Hub, users can also reach 'controlled access' data (e.g., whole genome sequences), but only for studies for which they have obtained approval from an NIH Data Access Committee through the corresponding dbGaP study. A linkage strategy that identifies participants enrolled in more than one study enables researchers to merge multiple datasets and data types into richer datasets and to reduce costly redundancies in large-scale data generation (e.g., -omics), while mitigating potential risks introduced by linkage.

Based on recommendations from the NICHD Office of Data Science and Sharing, which completed a comprehensive assessment of existing Privacy Protecting Record Linkage (PPRL) and associated data governance solutions, the INCLUDE DCC and the associated NIH Steering Committee have implemented NIMH Data Archive Global Unique Identifiers (NDA GUIDs) in the INCLUDE Data Hub. The NDA GUID Tool is a government-owned and operated PPRL technology that enables researchers to link data from multiple sources to the same individual without revealing personally identifiable information (PII). With the NDA GUID Tool, the same participant information will return the same GUID whenever or wherever it is entered. The GUID itself is not personally identifiable information or protected health information. Sharing data with NDA GUIDs allows approved researchers to link together data on a single participant, even if the data were collected at different locations or through different studies or are distributed from different data repositories.
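The property described above, that the same participant information always returns the same identifier, can be illustrated with a toy example. The sketch below is not the NDA GUID Tool's algorithm, which is not described here; it assumes a keyed hash (HMAC-SHA256) over a few normalized fields, and the function name, field choice, key, and "TOY_" prefix are all hypothetical.

```python
import hashlib
import hmac


def toy_pseudonym(first_name, last_name, birth_date, secret_key):
    """Return a deterministic pseudonymous ID for one participant.

    Illustration only: this is NOT how NDA GUIDs are generated. It shows
    the general idea that identical (normalized) inputs map to an
    identical identifier, wherever and whenever they are entered.
    """
    # Normalize so that case and stray whitespace do not change the result.
    normalized = "|".join(
        part.strip().upper() for part in (first_name, last_name, birth_date)
    )
    # A keyed hash: without the secret key, the ID cannot be recomputed
    # from guessed participant details.
    digest = hmac.new(
        secret_key, normalized.encode("utf-8"), hashlib.sha256
    ).hexdigest()
    return "TOY_" + digest[:12].upper()


# Hypothetical key and participant details, invented for this example.
EXAMPLE_KEY = b"example-key-not-for-real-use"
site_a_id = toy_pseudonym("Ada", "Example", "2001-02-03", EXAMPLE_KEY)
site_b_id = toy_pseudonym(" ada ", "EXAMPLE", "2001-02-03", EXAMPLE_KEY)
```

Under these assumptions, `site_a_id` and `site_b_id` are equal even though the details were entered at two sites with different capitalization and spacing, which is what allows records from different studies to be linked without exchanging the underlying PII.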

This dbGaP study enables researchers to access an INCLUDE GUID Mapping File that includes all available NDA GUIDs and their associations to study-specific Participant IDs across studies in the INCLUDE Data Hub. This dbGaP study distributes only the NDA GUIDs associated with the INCLUDE Data Hub datasets, not the datasets themselves. To access the associated controlled-access datasets, add the corresponding dbGaP studies to your Data Access Request. A list of these studies is available on the INCLUDE Data Hub Portal: https://portal.includedcc.org/studies.
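As a rough sketch of how a mapping of this kind might be used to find participants who appear in more than one study, the example below groups rows by GUID. The actual layout of the INCLUDE GUID Mapping File is not specified here, so the column names (`nda_guid`, `study_code`, `participant_id`) and all values are assumptions made up for illustration.

```python
from collections import defaultdict


def group_by_guid(mapping_rows):
    """Collect the (study, participant ID) pairs recorded for each GUID."""
    by_guid = defaultdict(set)
    for row in mapping_rows:
        by_guid[row["nda_guid"]].add((row["study_code"], row["participant_id"]))
    return by_guid


def cross_study_guids(by_guid):
    """Keep only GUIDs that occur in more than one distinct study."""
    linked = {}
    for guid, pairs in by_guid.items():
        studies = {study for study, _ in pairs}
        if len(studies) > 1:
            linked[guid] = sorted(pairs)
    return linked


# Invented rows standing in for a mapping file; not real identifiers.
EXAMPLE_ROWS = [
    {"nda_guid": "GUID_A", "study_code": "STUDY_X", "participant_id": "X-001"},
    {"nda_guid": "GUID_A", "study_code": "STUDY_Y", "participant_id": "Y-417"},
    {"nda_guid": "GUID_B", "study_code": "STUDY_X", "participant_id": "X-002"},
]

linked_participants = cross_study_guids(group_by_guid(EXAMPLE_ROWS))
```

Here `linked_participants` contains only `GUID_A`, whose two study-specific IDs can then serve as join keys between the corresponding datasets, provided access to those datasets has been approved.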