
HTAN MCL Pre-Cancer Atlas Pilot Project - Targeted Sequencing Development Study

The HTAN-MCL Pre-Cancer Atlas Pilot Project (PCAPP) is the result of a collaboration among the seven members of the MCL consortium. Across four organ types, PCAPP's goal is to collect premalignant lesions (PMLs) and profile their gene expression, DNA mutations, single-cell gene expression and immune microenvironment. Most PMLs are small and are available only as formalin-fixed paraffin-embedded (FFPE) archival tissue. The primary goals of PCAPP are to 1) understand the logistical challenges of PML specimen collection, 2) document technical limitations of the assays that are specific to PMLs and 3) overcome them to support the generation of a more comprehensive Pre-Cancer Atlas in the future. The current upload provides RNA and DNA sequencing from participants with DCIS who were studied at the University of California, San Diego and the University of Vermont.

Description of the overall study:

A. Background/Significance

One of the critical barriers to developing new approaches for cancer detection and prevention is the lack of understanding of the key molecular and cellular changes that cause cancer initiation and progression. Unlike the extensive work that has been done profiling advanced-stage tumors, few studies have comprehensively profiled the molecular alterations found in precancerous tissues. Premalignant lesions are currently characterized by histologic changes that precede the development of invasive carcinoma [1,2]. These lesions can often be identified in regions surrounding an invasive tumor or in biopsies taken from patients undergoing diagnostic evaluation for suspected cancer. Currently, few metrics exist to distinguish lesions that are likely to progress to carcinoma and require intervention from those that will regress naturally or remain stable [3,4]. Characterizing the molecular alterations in premalignant lesions and the corresponding changes in the microenvironment would hasten the development of biomarkers for early detection and risk stratification, and could suggest preventive interventions to reverse or delay the development of cancer.

Our pilot study will establish the feasibility of transcriptomic, genomic and immune profiling of FFPE premalignant lesions from multiple organ sites, collected and profiled with uniform SOPs across multiple institutions within the MCL consortium. We will characterize the molecular alterations in precancerous lesions and the corresponding microenvironment in four major organ sites to uncover the molecular and cellular determinants of premalignancy, and we will establish standardized sequencing and immunohistochemistry protocols for FFPE precancerous tissue. We will also evaluate the technical feasibility of single nuclei sequencing of small FFPE precancerous lesions. Successful completion of this pilot study will set the stage for the expansion and development of a comprehensive Pre-Cancer Atlas (PCA) as part of the NCI Cancer Moonshot.

B. Specific Aims

Aim 1: Collect premalignant lesions (PML) and their associated microenvironment via LCM from FFPE tissue across four organ sites (breast, lung, pancreas & prostate).

Aim 2: Perform bulk RNA and DNA seq on premalignant FFPE samples (and flash frozen tissue where available) and compare the genomic/transcriptomic alterations within and across organ sites.

C. Approach

Aim 1: Collect premalignant lesions (PML) and their associated microenvironment via LCM from FFPE tissue across four organ sites (breast, lung, pancreas & prostate).

Methods

I. Patient Population/Sample Collection: An overview of the sites collecting PML tissue from each organ is provided in Table 1, and the biospecimens to be obtained are described in detail for each organ type below.

Table 1. Breakdown of cohort by tissue type and collection site.

Organ site | Type of PML | Collection site | # of patients | Total patients per organ
Breast | DCIS | UCSF/UCSD | 20 | 39
Breast | DCIS | UVM | 19 |
Lung | AAH, squamous dysplasia/CIS | BU*/UCLA | 20 (10 of each type) | 40
Lung | AAH, squamous dysplasia/CIS | Vanderbilt/Moffitt | 20 (10 of each type) |
Pancreas | IPMNs | MDACC* | 24 | 24
Prostate | PIN | JHU | 20 | 40
Prostate | PIN | Stanford* | 20 |

Note: single nuclei/cell RNA-Seq will be performed on 4-5 FFPE samples from each of the organ types
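As a quick consistency check on Table 1, the per-site counts can be summed to reproduce the per-organ totals (39, 40, 24 and 40 patients; 143 overall). The short Python sketch below is only a transcription of the table's counts; the dictionary layout and the shortened site labels (asterisks dropped) are illustrative choices, not part of the study's data model.

```python
# Per-site patient counts transcribed from Table 1 (organ -> {site: patients}).
TABLE_1 = {
    "Breast (DCIS)": {"UCSF/UCSD": 20, "UVM": 19},
    "Lung (AAH, squamous dysplasia/CIS)": {"BU/UCLA": 20, "Vanderbilt/Moffitt": 20},
    "Pancreas (IPMN)": {"MDACC": 24},
    "Prostate (PIN)": {"JHU": 20, "Stanford": 20},
}


def organ_totals(table):
    """Sum the per-site counts to get the total number of patients per organ."""
    return {organ: sum(sites.values()) for organ, sites in table.items()}


totals = organ_totals(TABLE_1)
for organ, n in totals.items():
    print(f"{organ}: {n}")
print("All organs:", sum(totals.values()))
```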

1. DCIS lesions from breast tissue:

DCIS lesions will be collected from 39 patients (20 from UCSF/UCSD and 19 from UVM) with primary low- or high-grade DCIS diagnosed on breast core biopsy. Subsequent lumpectomy or mastectomy specimens will be prospectively sampled in the vicinity of the prior biopsy site using multiple approaches: 1) live cells (a heterogeneous mix) will be obtained as a cell-scrape slurry from the lesion surface or by fine needle aspiration (FNA); 2) for a subset of specimens of sufficient size, a block of breast tissue containing DCIS will be fresh-frozen; and 3) the remainder of the specimen will undergo routine formalin fixation and paraffin embedding (FFPE). The FFPE block adjacent to the fresh-frozen sample will be annotated as the matched block and sectioned for bulk and single nuclei sequencing. We will dissect DCIS, adjacent normal tissue and, when available, associated carcinoma. In addition, when possible, normal tissue will be collected from a block lacking lesions, and blood will be collected. Samples from a subset of patients (n = 5; FFPE, flash-frozen and fresh) will be sent to the Broad Institute for single nuclei/cell sequencing.

2. AAH and squamous dysplastic/CIS lesions from airway and lung tissue:

For squamous cell lung cancer, we will collect endobronchial biopsies from abnormal airway regions identified on autofluorescence bronchoscopy, or identify PMLs in the margins of resected lung tissue. We will study 20 patients (5 each from BU, UCLA, Vanderbilt and Moffitt) with pre-invasive squamous lesions (moderate to severe dysplasia or carcinoma in situ (CIS)) identified on pathologic examination. LCM of the premalignant region and adjacent normal epithelium will be performed, as well as of the invasive tumor for lesions collected from the resection margin (n = 5 from UCLA). For a subset of lesions collected at bronchoscopy (n = 5), we will collect additional biopsies that will be flash-frozen or kept fresh for single nuclei and single cell sequencing, respectively, at the Broad Institute. In parallel with the work at the Broad, BU will perform single cell RNA-seq on freshly sorted cells from these tissues (n = 5). Blood will be collected from all patients for genomic studies.

For lung adenocarcinoma, we will collect resected FFPE lung tissue from 20 patients (10 from UCLA and 10 from Vanderbilt/Moffitt) with early-stage lung adenocarcinoma harboring atypical adenomatous hyperplasia (AAH) premalignant lesions in the resection margin. We will laser-capture multiple AAH regions (3-5 per patient) as well as adjacent regions of normal epithelium and invasive adenocarcinoma. In addition, blood will be collected from all patients for genomic studies.

3. IPMNs from pancreatic tissue:

For pancreatic cancer PMLs, we will collect low- and high-grade macroscopic intraductal papillary mucinous neoplasms (IPMNs) from the surgically resected specimens of 24 patients, along with blood samples. Archival FFPE specimens of microscopic pancreatic intraepithelial neoplasia (PanIN) lesions, which occur multifocally adjacent to invasive pancreatic ductal adenocarcinoma (PDAC), and archival IPMN lesions (with or without associated invasive cancer), along with adjacent normal tissue, will undergo LCM and be used for bulk DNA and RNA sequencing. If matched frozen tissue is available for a subset of these FFPE samples, we will bank it for comparison of profiles. Because IPMNs are macroscopic lesions, they can be obtained fresh and therefore used for single cell sequencing (in contrast to PanINs). Accordingly, 5 freshly obtained IPMNs will be used for single cell RNA sequencing studies performed at both the Broad Institute and MDACC, and matched FFPE and/or frozen sections from these lesions (obtained from the adjacent PML) will be sent to the Broad Institute as a pilot to assess single nuclei RNA sequencing.

4. PINs from prostate tissue:

For prostate cancer PMLs, 40 samples of prostatic intraepithelial neoplasia (PIN) will be collected across the Stanford and JHU sites (20 cases per site). At Stanford, the final cohort will comprise specimens from 20 patients whose cancer was detected by PSA screening and who have undergone or will undergo surgery (radical prostatectomy) for clinically localized disease. The age range of participants will be 40-75 years, and based on the practice demographics at Stanford we anticipate that 18 will be Caucasian, 1 Asian and 1 Latino or African American. Clinical and MRI data will also be collected for these patients. We will collect low-grade (e.g., Gleason score 6/Grade Group 1; n = 10) and high-grade (Gleason score 4+3=7 or higher/Grade Group 3 or higher; n = 10) PINs from FFPE samples that contain prostate carcinoma. In addition to obtaining LCM archival samples of low- and high-grade PIN, we will obtain normal prostatic epithelium from the peripheral, central and transition zones, as well as multiple samples of prostate carcinoma to capture the spectrum of Gleason grades in the carcinoma as needed. LCM samples will be used for bulk DNA and RNA sequencing. In addition, single cells will be dissected from FFPE samples to prepare single cell RNA-seq libraries using techniques developed at Stanford, and FFPE tissue will be sent to the Broad for single nuclei sequencing. When available, flash-frozen and fresh samples from these prostates will be archived and prepared for single nuclei and single cell sequencing, respectively, at the Broad Institute and at Stanford (single cell only). JHU will also capture 10 cases (5 Grade Group 1 and 5 Grade Group 2) of high-grade PIN, normal epithelium and invasive adenocarcinoma using frozen sections from fresh-frozen tissue. When possible, these will come from the same patients as the FFPE samples.

Because it can be challenging to distinguish high-grade PIN from normal epithelium morphologically on frozen sections, we will perform several additional tissue-based characterizations on these samples. These will include a multicolor IHC cocktail, referred to as "PIN4", that combines basal cell markers (p63 and CK903) with a PIN/carcinoma marker (AMACR); c-MYC (referred to as MYC) protein by IHC [5] and mRNA by in situ hybridization (AM De Marzo and Q Zheng, unpublished observations); telomere length by in situ hybridization [6]; and the 5'ETS/45S rRNA [7]. Whole slides will be scanned on a Hamamatsu NanoZoomer with a 40x objective, and regions of interest will be annotated as a guide for LCM.

II. Laser-capture microdissection (LCM): FFPE tissue blocks will be sectioned at 7 μm thickness, and serial sections will be stained with H&E. LCM will be performed using the standard LCM systems available at each site, such as the Leica LMD7000 and ArcturusXT. Regions of premalignancy will be dissected, and RNA and DNA will be extracted from the microdissected cells using the Qiagen AllPrep DNA/RNA FFPE Kit.

Aim 2: Perform bulk RNA and DNA seq on premalignant FFPE samples and compare the genomic/transcriptomic alterations within and across organ sites.

Rationale: Few studies have characterized the genomic and transcriptomic landscape of premalignant lesions associated with breast, pancreatic, lung or prostate cancers. Characterizing the molecular determinants of premalignant disease that are unique to or shared across organs will enable the identification of new candidate biomarkers for early detection and of novel therapeutic strategies for early intervention.

Methods

Bulk RNA-seq of LCM FFPE tissue: All participating sites will perform bulk RNA-seq in accordance with SOPs developed at BU. In brief, total RNA will be isolated from laser-captured lesion and associated microenvironment tissue using the Qiagen AllPrep DNA/RNA FFPE Kit, and RNA quality will be assessed with the Agilent Bioanalyzer. Libraries will be generated with the Illumina TruSeq Access kit (for FFPE samples) and sequenced on the Illumina HiSeq 2500 with 75-bp paired-end reads. FASTQ quality will be assessed with FastQC. Reads will be aligned to the human genome with STAR, and gene-level and isoform-level expression will be quantified with RSEM. Splice junction saturation, transcript integrity and biotype distributions will be calculated for each sample with RSeQC. DESeq2 and edgeR will be used to identify associations between gene expression profiles and clinical variables while controlling for confounding covariates.
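The per-sample steps of this pipeline can be sketched as a Python helper that assembles the command lines as argument lists suitable for subprocess.run. The flags shown are standard FastQC, STAR, RSEM and RSeQC options, but the function name, file names, directory layout and thread count are hypothetical, the BU SOP may use different settings, and the biotype summary and the DESeq2/edgeR step (run in R) are not shown.

```python
def rnaseq_commands(sample, r1, r2, star_index, rsem_ref, gene_bed,
                    outdir="results", threads=8):
    """Build (but do not run) the per-sample RNA-seq commands.

    Hypothetical helper: each command is an argument list that can be passed
    to subprocess.run(cmd, check=True). The output directory must exist, the
    STAR index must have been built with a GTF annotation, and the genome BAM
    must be indexed (samtools index) before tin.py is run.
    """
    out = f"{outdir}/{sample}"
    star_prefix = f"{out}/"  # STAR prepends this prefix to its output files
    genome_bam = f"{star_prefix}Aligned.sortedByCoord.out.bam"
    tx_bam = f"{star_prefix}Aligned.toTranscriptome.out.bam"
    t = str(threads)
    return [
        # 1) Read-level quality control of the raw FASTQ files.
        ["fastqc", "--threads", t, "--outdir", out, r1, r2],
        # 2) Spliced alignment; TranscriptomeSAM also writes a BAM in
        #    transcript coordinates for RSEM.
        ["STAR", "--runThreadN", t, "--genomeDir", star_index,
         "--readFilesIn", r1, r2, "--readFilesCommand", "zcat",
         "--outSAMtype", "BAM", "SortedByCoordinate",
         "--quantMode", "TranscriptomeSAM",
         "--outFileNamePrefix", star_prefix],
        # 3) Gene- and isoform-level quantification
        #    (writes <sample>.genes.results and <sample>.isoforms.results).
        ["rsem-calculate-expression", "--paired-end", "--alignments",
         "--no-bam-output", "-p", t, tx_bam, rsem_ref, f"{out}/{sample}"],
        # 4) RSeQC: splice junction saturation and transcript integrity (TIN).
        ["junction_saturation.py", "-i", genome_bam, "-r", gene_bed,
         "-o", f"{out}/{sample}"],
        ["tin.py", "-i", genome_bam, "-r", gene_bed],
    ]


for cmd in rnaseq_commands("S01", "S01_R1.fastq.gz", "S01_R2.fastq.gz",
                           "star_index_grch38", "rsem_ref/grch38",
                           "grch38_genes.bed"):
    print(" ".join(cmd))
```

Building the commands separately from running them keeps the sketch testable and makes it easy to log exactly what each site executed.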

BU will serve as an RNA-seq Core to assess the reproducibility of FFPE RNA-seq methods across sites. We will perform RNA-seq according to the SOP above on a subset of samples from each organ type (total n ≈ 20).
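The metric the RNA-seq Core will use for reproducibility is not specified here. One simple option, assumed purely for illustration, is to correlate log2 counts-per-million (CPM) between two sites' profiles of the same sample after dropping low-count genes; the function names, the pseudocount of 1 and the min_count filter are hypothetical choices.

```python
import numpy as np


def log_cpm(counts, prior=1.0):
    """log2 counts-per-million for one sample's vector of gene counts."""
    counts = np.asarray(counts, dtype=float)
    return np.log2((counts + prior) / counts.sum() * 1e6)


def cross_site_concordance(counts_site_a, counts_site_b, min_count=10):
    """Pearson correlation of log2-CPM between two sites' profiles of the
    same sample, using only genes with at least `min_count` reads at both sites."""
    a = np.asarray(counts_site_a, dtype=float)
    b = np.asarray(counts_site_b, dtype=float)
    keep = (a >= min_count) & (b >= min_count)
    return float(np.corrcoef(log_cpm(a)[keep], log_cpm(b)[keep])[0, 1])


# Hypothetical gene counts for the same sample processed at two sites.
site_a = [120, 3400, 15, 560, 0, 980, 42]
site_b = [150, 3100, 22, 610, 3, 1100, 35]
print(round(cross_site_concordance(site_a, site_b), 3))
```

Library sizes are taken from all genes before filtering, so the CPM values are not distorted by which genes pass the min_count threshold.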

Bulk DNA sequencing of FFPE tissue: All participating sites will perform targeted or whole exome sequencing (WES) in accordance with SOPs. In brief, DNA from laser-captured material will be isolated using the Qiagen AllPrep DNA/RNA FFPE Kit and will undergo stringent quality control to ensure high-quality input material for genomic profiling. Purified DNA (ideally 100-200 ng) will be used for library preparation and amplification, followed by next-generation sequencing using standard protocols distributed by CDMG. Because exome-seq methods are considered standardized, a DNA-seq Core is not needed to assess reproducibility across sites. We anticipate that local centers will use Illumina paired-end reads and the following general approach. 1) DNA library preparation: paired-end libraries will be prepared following the manufacturers' protocols (Illumina and Agilent), with DNA fragmented to 150-200 bp. 2) Exome capture: whole exome capture will be carried out using the protocol for Agilent's SureSelect Human All Exon kit, and purified capture products will be amplified with the SureSelect GA PCR primers (Agilent) for 12 cycles. 3) Sequencing: captured libraries will be sequenced with at least 100-bp paired-end reads; to achieve high sensitivity and accuracy for detecting mutations across the whole exome, each sample will be sequenced to a mean depth of 200X. 4) Read mapping, alignment and variant analysis: short reads will be aligned to the reference genome (NCBI human genome assembly build 38, GRCh38) using BWA-MEM, and local realignment of aligned reads will be performed with the Genome Analysis Toolkit (GATK).
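Step 3 sets a target mean depth of 200X. The arithmetic linking that target to sequencing output can be sketched as follows. The ~50 Mb capture size, 70% on-target fraction and 20% duplicate rate are hypothetical placeholders (real values depend on the SureSelect kit version and on FFPE library complexity), and the formula ignores overlapping mates and base-quality filtering.

```python
import math


def read_pairs_needed(target_bp, mean_depth, read_length,
                      on_target=0.7, duplicate_rate=0.2):
    """Estimate paired-end read pairs needed to reach a mean on-target depth.

    Mean depth = usable on-target bases / target size. Each pair contributes
    2 * read_length bases, of which a fraction `on_target` falls inside the
    capture region and a fraction `duplicate_rate` is lost to PCR duplicates.
    """
    usable_bp_per_pair = 2 * read_length * on_target * (1 - duplicate_rate)
    return math.ceil(mean_depth * target_bp / usable_bp_per_pair)


# Hypothetical ~50 Mb capture target, 200X mean depth, 2 x 100 bp reads.
pairs = read_pairs_needed(target_bp=50_000_000, mean_depth=200, read_length=100)
print(f"~{pairs / 1e6:.0f} million read pairs")
```

With perfect capture and no duplicates the same call needs exactly 50 million pairs, so the placeholder losses nearly double the required output; this is one reason FFPE libraries with low complexity are costly to sequence deeply.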

Data QC: To ensure scientific rigor and consistency among sites in RNA and DNA processing, we will first perform a preliminary cross-site analysis of the processing and analysis steps. Protocols for extracting high-quality RNA and DNA from formalin-fixed paraffin-embedded (FFPE) tissues, which will be used extensively in these studies, continue to improve and may be implemented differently among the participating sites. To evaluate the consistency of the preliminary processing steps and downstream analyses, we will initially distribute slides from one large FFPE tumor from each organ of origin (prostate, breast, lung and pancreas). Analysis of these samples will allow us to compare DNA and RNA characteristics (yield, purity and strand length) among sites. Downstream analysis of the same samples will also allow us to compare the consistency of variant calls among centers. We will be able to identify whether some types of calls (such as small insertions/deletions) are more variable among centers than others (such as relative gene expression or single base pair substitutions) that we expect to be less variable, and to characterize the reliability of findings across sites. We will also include a 5% blind duplicate analysis of RNA sequencing. Samples will be analyzed by the participating genomics cores without knowledge of the phenotype. RNA-seq and CNA analyses will be normalized for batch effects. Finally, as an additional check on processing accuracy and sample management, we will compare sex inferred from RNA profiles and from exome sequencing of X-chromosome genes with self-reported sex.
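Two of these checks can be sketched in Python under stated assumptions. The per-type concordance uses the Jaccard overlap of exactly matching calls, which assumes both sites normalized their calls the same way (for example with bcftools norm) so that equivalent indels share one representation. The sex check compares log-scale XIST expression with chrY-specific genes such as RPS4Y1 or DDX3Y. The function names, the Jaccard metric and the margin threshold are illustrative assumptions, not the study's SOP, and the exome-based X-chromosome check is not shown.

```python
def variant_type(ref, alt):
    """Classify a call from its REF/ALT alleles."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) != len(alt):
        return "indel"
    return "MNV"  # equal-length multi-base substitution


def concordance_by_type(calls_a, calls_b):
    """Jaccard concordance (shared / union) of two sites' calls, by variant type.

    Each call is a (chrom, pos, ref, alt) tuple for the same sample; types
    with no calls at either site are omitted from the result.
    """
    result = {}
    for vtype in ("SNV", "indel", "MNV"):
        a = {c for c in calls_a if variant_type(c[2], c[3]) == vtype}
        b = {c for c in calls_b if variant_type(c[2], c[3]) == vtype}
        union = a | b
        if union:
            result[vtype] = len(a & b) / len(union)
    return result


def infer_sex_from_rna(xist, y_genes, margin=1.0):
    """Call sex from log-scale expression of XIST and chrY-specific genes.

    Returns "F" when XIST exceeds the mean chrY signal by more than `margin`,
    "M" in the opposite case, and "ambiguous" otherwise.
    """
    y_mean = sum(y_genes) / len(y_genes)
    diff = xist - y_mean
    if diff > margin:
        return "F"
    if diff < -margin:
        return "M"
    return "ambiguous"


# Hypothetical calls for the same tumor from two sites.
calls_site_a = [("chr1", 100, "A", "G"), ("chr2", 200, "C", "T"), ("chr3", 300, "AT", "A")]
calls_site_b = [("chr1", 100, "A", "G"), ("chr2", 200, "C", "T"), ("chr3", 305, "G", "GC")]
print(concordance_by_type(calls_site_a, calls_site_b))  # {'SNV': 1.0, 'indel': 0.0}
print(infer_sex_from_rna(xist=8.0, y_genes=[0.1, 0.2]))  # F
```

Splitting concordance by variant type is what lets the comparison show whether indels are less reproducible across centers than single base substitutions.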

D. References

1. Wacholder, S. Precursors in Cancer Epidemiology: Aligning Definition and Function. Cancer Epidemiol. Prev. Biomark. 22, 521-527 (2013). PMID: 23549395.
2. Berman, J. J. Precancer: The Beginning and the End of Cancer. (Jones & Bartlett Learning, 2011).
3. Nasiell, K., Nasiell, M. & Václavinková, V. Behavior of moderate cervical dysplasia during long-term follow-up. Obstet. Gynecol. 61, 609-614 (1983). PMID: 6835614.
4. Merrick, D. T. et al. Persistence of Bronchial Dysplasia Is Associated with Development of Invasive Squamous Cell Carcinoma. Cancer Prev. Res. (Phila. Pa.) 9, 96-104 (2016). PMID: 26542061.
5. Gurel, B. et al. Nuclear MYC protein overexpression is an early alteration in human prostate carcinogenesis. Mod. Pathol. Off. J. U. S. Can. Acad. Pathol. Inc 21, 1156-1167 (2008). PMID: 18567993.
6. Meeker, A. K. et al. Telomere shortening is an early somatic DNA alteration in human prostate tumorigenesis. Cancer Res. 62, 6405-6409 (2002). PMID: 12438224.
7. Guner, G. et al. Novel Assay to Detect RNA Polymerase I Activity In Vivo. Mol. Cancer Res. MCR 15, 577-584 (2017). PMID: 28119429.