T2D-GENES Project 2: San Antonio Mexican American Family Studies

The Type 2 Diabetes (T2D) Genetic Exploration by Next-generation sequencing in Ethnic Samples (T2D-GENES) Consortium is a collaborative international effort to identify genes influencing susceptibility to T2D in multiple ethnic groups using next-generation sequencing. T2D-GENES Project 2 is a complex pedigree-based study designed to identify low-frequency or rare variants influencing susceptibility to T2D, using whole genome sequence (WGS) information from 1,043 individuals in 20 Mexican American T2D-enriched pedigrees from San Antonio, Texas. The major objectives of this study are to identify low-frequency or rare variants in and around known common-variant signals for T2D, and to find novel low-frequency or rare variants influencing susceptibility to T2D.

The sampled individuals come from two studies: the San Antonio Family Heart Study (SAFHS) and the San Antonio Family Diabetes/Gallbladder Study (SAFDGS), collectively referred to as the San Antonio Mexican American Family Studies (SAMAFS). The strategy is to sequence approximately 600 individuals at an average of 50x coverage across the entire genome, then impute genome-wide genotypes for about 440 additional family members. The 600 sequenced individuals are specifically chosen for their value in imputing sequence information into other family members. Because the pedigrees are large, we expect to find multiple individuals carrying each genetic variant, even if the variant is very rare in the population at large. A pedigree-based approach therefore provides an excellent opportunity for identifying rare novel variants influencing risk of T2D and quantitative variation in T2D-related phenotypes. The whole genome sequencing was performed commercially by Complete Genomics, Inc. (CGI).

The final data set includes whole genome sequence data for 607 individuals. After quality control, 585 sequenced individuals provide data for family-based imputation, performed with the Merlin linkage analysis software, into approximately 440 additional family members. Chip-based genotypes are available for these additional family members and indicate which parental haplotype was transmitted.

Extensive phenotype data are provided for 1,048 individuals. These include 5 sequenced individuals who do not belong to any of the 20 large pedigrees. Phenotype information was collected between 1991 and 2011 in the two contributing longitudinal studies. SAFHS participants may have information from up to 5 visits, and SAFDGS participants may have information from up to 4 visits.

The clinical variables reported are coordinated with T2D-GENES Project 1 (multi-ethnic exome sequencing). They include T2D status and age at diagnosis, glycemic traits (fasting and 2-hour glucose and insulin), blood pressure, blood lipids (total cholesterol, HDL cholesterol, calculated LDL cholesterol and triglycerides), and clinical chemistry (cystatin C, glutamic acid decarboxylase antibody titer (GadAb), creatinine, adiponectin and leptin). Glycated hemoglobin (HbA1c) was not measured for these individuals, and insulin C-peptide is not included in this data set. Additional phenotype data include the medication status at each visit, classified in four categories: any current use of diabetes medications, of hypertension medications, or of lipid-lowering medications and, for females, current use of female hormones. Anthropometric and demographic variables include age, sex, height, weight, hip circumference, waist circumference and derived ratios.

Each phenotype variable has an initial summary column containing the most recent non-missing measurement for each individual, followed by the five potential time points for each individual, the number of non-missing measurements, and the age and year for the most recent non-missing measurement. For historical reasons, the order in which variables are presented on the dbGaP web site differs from their order in the data download file. When reading the comment fields for each variable, please note that commas are omitted to support data exchange in .csv format.
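
The layout of each phenotype variable (a summary value, up to five time points, a count of non-missing measurements, and the age and year of the most recent non-missing measurement) can be illustrated with a minimal Python sketch. This is not the procedure used to build the data set. The function name, the field names, the use of None for a missing value, and the example numbers are all hypothetical, and the sketch assumes that visits are listed in chronological order.

```python
from typing import NamedTuple, Optional, Sequence


class Summary(NamedTuple):
    """Hypothetical container mirroring the summary columns described above."""
    latest: Optional[float]         # most recent non-missing measurement
    n_nonmissing: int               # number of non-missing measurements
    age_at_latest: Optional[float]  # age at the most recent non-missing measurement
    year_at_latest: Optional[int]   # year of the most recent non-missing measurement


def summarize_visits(
    values: Sequence[Optional[float]],
    ages: Sequence[Optional[float]],
    years: Sequence[Optional[int]],
) -> Summary:
    """Derive the summary fields from per-visit values.

    Assumes the three sequences are aligned by visit, ordered from the
    earliest to the latest visit, and use None for a missing entry.
    """
    if not (len(values) == len(ages) == len(years)):
        raise ValueError("values, ages and years must have the same length")

    # Positions of the visits at which the phenotype was actually measured.
    observed = [i for i, v in enumerate(values) if v is not None]
    if not observed:
        return Summary(None, 0, None, None)

    last = observed[-1]  # latest visit with a non-missing measurement
    return Summary(values[last], len(observed), ages[last], years[last])


# Made-up example: a participant measured at visits 1 and 3 only.
example = summarize_visits(
    values=[5.1, None, 6.3, None, None],
    ages=[34, 39, 45, None, None],
    years=[1993, 1998, 2004, None, None],
)
print(example)
```

In this made-up example the summary value is 6.3, taken from the third visit, with two non-missing measurements and an age and year of 45 and 2004. A participant with no measurements at any visit would have a missing summary value and a count of zero.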