Whole Genome Scan for Pancreatic Cancer Risk in the Pancreatic Cancer Cohort Consortium and Pancreatic Cancer Case-Control Consortium (PanScan)

Within the framework of the NCI-sponsored Cohort Consortium, investigators from 12 prospective epidemiologic cohorts formed the Pancreatic Cancer Cohort Consortium in 2006. This study, also known as "PanScan", is funded by the National Cancer Institute (NCI) and involves conducting a genome-wide association study (GWAS) of common genetic variants to identify markers of susceptibility to pancreatic cancer. In 2007, the study was expanded to include 8 case-control studies. The study team includes scientists from the cohorts comprising the Consortium, the NCI, and the Pancreatic Cancer Case-Control Consortium (PanC4). PanScan I and II were conducted in 12 cohort studies and 8 case-control studies, leading to the discovery of four novel regions in the genome associated with risk for pancreatic adenocarcinoma.

The third phase of PanScan (PanScan III) was conducted using recently identified incident pancreatic cancer cases drawn from fourteen cohorts in the Cohort Consortium: nine prospective cohorts that participated in PanScan I and five newly joined cohorts. The nine cohort studies that participated in PanScan I and had new genotyping of cases in PanScan III are ATBC, CPS-II, EPIC, HPFS, NHS, PHS, PLCO, SMWHS, and WHI; the five newly joined cohort studies are the Agricultural Health Study (AHS), the Multiethnic Cohort Study (MEC), the Melbourne Collaborative Cohort Study (MCCS), the Vitamins and Lifestyle Study (VITAL), and the Selenium and Vitamin E Cancer Prevention Trial (SELECT). In addition to the cases from cohorts, we also included cases from the Gastrointestinal Cancer Clinic of Dana-Farber Cancer Institute Study (DFCI-GCC); from the University Hospital in Heidelberg, Germany, which is part of a larger European clinical case-control study (PANDoRA); and clinic-based cases from eastern Spain (PANKRAS-II).

The dbGaP datasets available include all subjects previously made available from PanScan I and II, plus 1,582 new incident pancreatic cancer cases of European descent from prospective cohorts, case-control studies or case series (genotyped as part of PanScan III). Also included are 61 pancreatic cancer cases and 67 control subjects from PanScan I as well as 173 pancreatic cancer cases from PanScan III of Asian ancestry from the Shanghai Men's and Women's Health Study (Supplemental Table 10, Wolpin et al. (Nat Genet, 2014)). The control population used in the analysis for the Wolpin et al. manuscript included cancer-free individuals from the prospective cohorts that contributed pancreatic cancer cases to PanScan III and controls from the Spanish Bladder Cancer SBC/EPICURO study that were previously genotyped using the OmniExpress, Omni 1M or Omni 2.5M SNP arrays. The data from these control subjects were posted to dbGaP under the GWAS in which they were initially genotyped and will not be made available in duplicate under this dbGaP study.

The summary statistics for PanScan I-III were generated as detailed in Wolpin, B.M. et al., Genome-wide association study identifies multiple susceptibility loci for pancreatic cancer, Nature Genetics, 2014;46(9):994-1000 (https://www.nature.com/articles/ng.3052), and Klein, A. et al., Genome-wide meta-analysis identifies five new susceptibility loci for pancreatic cancer, Nature Communications, 2018;9(1):556 (https://www.nature.com/articles/s41467-018-02942-5). The dataset includes results from an association study of 5,117 individuals diagnosed with pancreatic ductal adenocarcinoma (PDAC) and 8,845 control individuals, for a total of 13,962 subjects of European ancestry. By comparison, the genotype and phenotype information under this project includes 9,437 individuals (PanScan I and II PDAC case and control individuals, and PanScan III PDAC cases only). The difference arises because PanScan III “borrowed” GWAS data from control individuals who were genotyped separately from the PanScan GWAS project; their raw genotypes are therefore not included in phs000206.v5.p3. Association analysis was performed separately for PanScan I-II and PanScan III, followed by a meta-analysis of the two datasets. Variants with a minor allele frequency (MAF) < 0.01, an imputation INFO score < 0.3 or a heterogeneity P-value < 1x10^-10 were excluded, leaving a total of 9,758,390 variants.
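
To illustrate how estimates from two separately analyzed datasets can be combined, and how heterogeneity statistics such as Cochran's Q and I^2 are derived, the following is a minimal sketch of an inverse-variance-weighted fixed-effect meta-analysis. It assumes a fixed-effect model, which this description does not specify; the function name, dictionary keys and input values are hypothetical and are not taken from the PanScan analysis pipeline.

```python
import math


def fixed_effect_meta(effects, std_errs):
    """Combine per-dataset effect estimates by inverse-variance weighting.

    Assumption: a fixed-effect model. The study description does not state
    which meta-analysis model was used.
    """
    weights = [1.0 / se ** 2 for se in std_errs]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)

    # Two-sided P-value from the normal approximation to the z statistic.
    z = pooled / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))

    # Cochran's Q and the I^2 statistic (0-100%).
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, effects))
    df = len(effects) - 1
    i_squared = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Closed form for the chi-squared tail is used only when df == 1
    # (two datasets); other cases are left as None in this sketch.
    het_p = math.erfc(math.sqrt(q / 2.0)) if df == 1 else None

    return {
        "effect": pooled,
        "std_err": pooled_se,
        "p_value": p_value,
        "het_chi_sq": q,
        "het_df": df,
        "het_i_sq": i_squared,
        "het_p": het_p,
    }


# Illustrative inputs only: two made-up per-dataset estimates.
result = fixed_effect_meta(effects=[0.10, 0.20], std_errs=[0.05, 0.05])
```

With equal standard errors the pooled effect is the simple average of the two inputs, which makes the sketch easy to check by hand.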

Columns in the summary statistics dataset are as follows:

ID: variant rsID
Chr: chromosome number
Position: position in the chromosome, genome build GRCh37/hg19
MarkerName: variant identifier
Allele1: reference allele
Allele2: alternative allele
Freq1: allele frequency for Allele2
FreqSE: standard error of the allele frequency
MinFreq: the minimum allele frequency across studies
MaxFreq: the maximum allele frequency across studies
Effect: effect size for Allele2
StdErr: standard error of the effect size
P-value: meta-analysis p-value
Direction: summary of effect direction for each study
HetISq: I^2 statistic which measures heterogeneity on scale of 0-100%
HetChiSq: chi-squared statistic in simple test of heterogeneity
HetDf: degrees of freedom for heterogeneity statistic
HetPVal: P-value for heterogeneity statistic
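
As a hedged example of working with these columns, the sketch below reads summary-statistics rows, keeps variants below a P-value threshold, and converts Effect and StdErr into an odds ratio with a 95% confidence interval. Several things are assumptions rather than facts from this description: that the file is tab-delimited with a header row using the column names above, that Effect is a log odds ratio on the natural-log scale, and that 5e-8 is the threshold of interest (a conventional genome-wide significance level). The function names and the two sample rows are synthetic and for illustration only.

```python
import csv
import io
import math

# Conventional genome-wide significance level; an assumption, not a value
# stated in the study description.
GENOME_WIDE_P = 5e-8


def odds_ratio_ci(effect, std_err, z=1.96):
    """Return (OR, lower, upper), assuming `effect` is a natural-log odds ratio."""
    return (
        math.exp(effect),
        math.exp(effect - z * std_err),
        math.exp(effect + z * std_err),
    )


def significant_variants(handle, p_threshold=GENOME_WIDE_P):
    """Yield one summary dict per variant whose P-value is below the threshold.

    Assumes a tab-delimited file whose header uses the column names above.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        p_value = float(row["P-value"])
        if p_value >= p_threshold:
            continue
        odds_ratio, lower, upper = odds_ratio_ci(
            float(row["Effect"]), float(row["StdErr"])
        )
        yield {
            "ID": row["ID"],
            "Chr": row["Chr"],
            "Position": int(row["Position"]),
            "OR": odds_ratio,
            "CI95": (lower, upper),
            "P": p_value,
        }


# Synthetic rows for illustration; these are not real PanScan results.
SAMPLE = (
    "ID\tChr\tPosition\tEffect\tStdErr\tP-value\n"
    "rsSYNTH1\t1\t1000\t0.2\t0.03\t2.6e-11\n"
    "rsSYNTH2\t2\t2000\t0.01\t0.03\t0.74\n"
)
hits = list(significant_variants(io.StringIO(SAMPLE)))
```

Because the function is a generator that reads one row at a time, the same code could be pointed at an open file handle for the full set of 9,758,390 variants without loading it into memory.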

Type: Case-Control
Archiver: The database of Genotypes and Phenotypes (dbGaP)