National Cancer Institute (NCI) Genome Wide Association Study (GWAS) of Lung Cancer in Never Smokers
A genomewide study of lung cancer in never smokers
Abstract and specific aims
In the United States, lung cancer incidence and mortality rates have been steadily declining over the past decade, following a decline in the prevalence of tobacco smoking. However, lung cancer remains the leading cause of cancer death, killing more patients than breast, colon, and prostate cancers combined.
Although tobacco smoke is the predominant risk factor for development of lung cancer, some patients develop the disease without a history of tobacco smoking. About 10-15% of all lung cancers occur in lifetime never smokers. This figure will increase as the proportion of never smokers in the population increases. Even at present rates, lung cancer in never smokers, if considered a separate disease, would rank as the 6th to 8th leading cause of cancer death.
The growing number of never smokers in the USA and other countries emphasizes the importance of understanding the epidemiology and biology underlying lung cancer in this group.
Genetic polymorphisms associated with the risk of lung cancer in never smokers are expected to overlap only partially with those associated with the risk of lung cancer in ever smokers. Epidemiological, molecular, and clinical data suggest that the molecular mechanisms of lung cancer may differ between smokers and never smokers, implying that lung cancer in never smokers is a different disease from lung cancer in smokers. One can expect a stronger genetic component in the control of lung cancer in never smokers, because the effects of genetic factors are unmasked by the lack of tobacco smoke exposure.
The genetic epidemiology of lung cancer in never smokers has not been well explored, largely because of difficulties in accruing the needed sample size for association studies. We propose a multicenter (total 14 sites from the US and Europe) genomewide association study of lung cancer in never smokers with the following specific aims:
Aim 1: To identify candidate SNPs influencing risk for lung cancer in never smokers using the Discovery sample.
In the Discovery phase we will genotype 1256 Caucasian cases and 1365 age- and gender-matched never-smoker controls using the Illumina Human660W-Quad platform. In addition, we will include in the analysis 284 cases and 175 matched controls already genotyped on the 610Quad platform. In this phase we will include only the study sites that have collected blood specimens (MDACC, Mayo Clinic, Karmanos Cancer Institute, The University of Liverpool Cancer Research Centre, Institute of Cancer Research in Sutton, and Lunenfeld Research Institute in Toronto, Canada). All samples will be sent to an independent lab for genotyping to reduce site-specific technical artifacts. The final sample will consist of 1540 cases and 1540 controls matched by study site.
Aim 2: To perform the second phase (validation) analysis of significant SNPs identified in Aim 1 using an independent set of cases and controls.
SNPs associated with risk at a significance level of 0.01 or below in the discovery set will be included in the replication phase. The proposed threshold provides adequate power to retain SNPs with a typical effect size of 1.3. We plan to carry forward 6000-7000 SNPs for validation. The independent replication set will include 800 cases and 800 controls, mostly from sites that collected tissue (Mayo Clinic, Karmanos Cancer Institute, UT Southwestern) or buccal specimens (UCLA), but also from sites that collected blood samples (Imperial College London; University of Pennsylvania; German Cancer Research Center, Heidelberg; National Research Center for Environment and Health, Neuherberg; Carmel Medical Center, Haifa). We will then perform a joint analysis to test the significance of the SNPs identified in the first stage using a stringent critical p-value of 10^-7. There will be 2340 cases and 2340 controls in the joint set. Based on our experience with GWAS in smokers, and assuming that the genetic component of lung cancer risk may be higher in never smokers than in smokers, we expect to identify about 5-10 candidate regions associated with lung cancer risk in never smokers.
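The arithmetic behind this two-stage design can be checked with a short sketch. The sample sizes and the 0.01 and 10^-7 thresholds come from the aims above; everything else is an assumption made only for illustration: a panel of roughly 650,000 markers, a control allele frequency of 0.30, reading the "effect size of 1.3" as a per-allele odds ratio, and a simple normal-approximation allelic test. The function names are hypothetical, and the joint-stage figure ignores the stage 1 selection step, so it is a rough guide rather than the study's own power calculation.

```python
from math import log, sqrt
from statistics import NormalDist

# Sample sizes stated in the aims.
DISCOVERY_CASES = 1256 + 284       # newly genotyped + previously genotyped
DISCOVERY_CONTROLS = 1365 + 175
REPLICATION_PER_GROUP = 800
JOINT_CASES = DISCOVERY_CASES + REPLICATION_PER_GROUP

# Illustrative assumptions, not taken from the study description.
N_SNPS = 650_000      # approximate marker count of the genotyping panel
CONTROL_MAF = 0.30    # allele frequency in controls
ODDS_RATIO = 1.3      # "effect size of 1.3" read as a per-allele odds ratio
STAGE1_ALPHA = 0.01   # stage 1 threshold (from the text)
JOINT_ALPHA = 1e-7    # joint-analysis threshold (from the text)


def expected_null_hits(n_snps, alpha):
    """Number of SNPs expected to pass the threshold by chance alone."""
    return n_snps * alpha


def allelic_power(n_cases, n_controls, maf, odds_ratio, alpha):
    """Approximate two-sided power of a Wald test on the log odds ratio,
    computed from expected allele counts (two alleles per person)."""
    p0 = maf
    p1 = odds_ratio * p0 / (1 - p0 + odds_ratio * p0)  # frequency in cases
    a_case, a_ctrl = 2 * n_cases, 2 * n_controls
    se = sqrt(1 / (a_case * p1) + 1 / (a_case * (1 - p1))
              + 1 / (a_ctrl * p0) + 1 / (a_ctrl * (1 - p0)))
    z_effect = abs(log(odds_ratio)) / se
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    return norm.cdf(z_effect - z_crit) + norm.cdf(-z_effect - z_crit)


if __name__ == "__main__":
    print("discovery cases/controls:", DISCOVERY_CASES, DISCOVERY_CONTROLS)
    print("joint cases:", JOINT_CASES)
    print("expected chance hits at stage 1:",
          expected_null_hits(N_SNPS, STAGE1_ALPHA))
    print("Bonferroni threshold for 0.05:", 0.05 / N_SNPS)
    print("stage 1 power:", allelic_power(
        DISCOVERY_CASES, DISCOVERY_CONTROLS, CONTROL_MAF,
        ODDS_RATIO, STAGE1_ALPHA))
    print("joint power (selection ignored):", allelic_power(
        JOINT_CASES, JOINT_CASES, CONTROL_MAF, ODDS_RATIO, JOINT_ALPHA))
```

Under the assumed panel size, chance alone would pass about 6,500 SNPs at the 0.01 threshold, which is in line with the 6000-7000 SNPs planned for validation, and a Bonferroni correction of 0.05 over the same panel gives a threshold slightly below 10^-7. The power figures depend strongly on the assumed allele frequency and should be rerun with other values.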
Aim 3: To identify and explore pathways associated with the risk of lung cancer in never smokers.
Results from a number of studies on molecular mechanisms and drug response suggest that lung cancer in never smokers is a different disease and that different pathways will be associated with lung cancer risk in never smokers and smokers. To identify pathways and molecular functions associated with lung cancer risk in never smokers, we will apply the Ingenuity and DAVID bioinformatics tools. We will use at least 300 top candidate genes identified in the joint and discovery analyses. We select a rather large number of candidate genes for functional annotation for two reasons: 1. Both algorithms look for enrichment of pathways and functions among the most significant genes, and they produce statistically robust results only when the number of genes is relatively high. 2. Although this study will be the largest possible for never smokers, we are still underpowered to detect SNPs with relatively small effect sizes. Although those SNPs will not reach genome-wide significance, they will tend to be near the top of the list. In other words, genes from the gray zone (significant at the individual level but not at the genome-wide level) are expected to be enriched for true discoveries. True discoveries are likely to be associated with a limited number of pathways and functions, while false positives are expected to be uniformly distributed across functions and pathways. Therefore, significant clustering of genes in a given function will suggest that those genes are true discoveries.
This is the first GWAS aimed at identifying the genetic control of susceptibility to lung cancer in Caucasian never smokers. We will combine the available resources from multiple sites to achieve a sample size sufficient for this study. The study will identify the genetic architecture of predisposition to lung cancer in never smokers.
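The clustering argument above is the logic of an over-representation test: if the top genes were drawn at random, the number falling in any one pathway would follow a hypergeometric distribution, so a large overlap is unlikely by chance. The sketch below is a generic one-sided hypergeometric test, not the algorithm used by Ingenuity or DAVID, and the function name and all numbers in the usage example (a 20,000-gene background, a 100-gene pathway, 10 overlapping genes) are hypothetical.

```python
from math import comb


def enrichment_p_value(universe, pathway_size, selected, overlap):
    """One-sided hypergeometric p-value: the probability that a random
    draw of `selected` genes from `universe` genes contains at least
    `overlap` members of a pathway with `pathway_size` genes."""
    upper = min(pathway_size, selected)
    if overlap > upper:
        return 0.0
    lower = max(overlap, 0)
    # Exact integer counts of draws with k pathway genes, for k >= overlap.
    tail = sum(
        comb(pathway_size, k) * comb(universe - pathway_size, selected - k)
        for k in range(lower, upper + 1)
    )
    return tail / comb(universe, selected)


if __name__ == "__main__":
    # Hypothetical example: 300 top genes, 10 of which fall in one pathway.
    p = enrichment_p_value(universe=20_000, pathway_size=100,
                           selected=300, overlap=10)
    print("expected overlap by chance:", 300 * 100 / 20_000)
    print("enrichment p-value:", p)
```

With these hypothetical inputs only 1.5 pathway genes are expected by chance, so an overlap of 10 yields a very small p-value. In practice the p-values would also need correction for the number of pathways tested.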
- Type: Case-Control
- Archiver: The database of Genotypes and Phenotypes (dbGaP)