The goal of the study is to confirm and further characterize structural variants and complex genomic rearrangements using adaptive sampling with Oxford Nanopore Sequencing. Individuals with known structural variants were included in the study and the known rearrangement regions were targeted in the sequencing experiments. In addition to long-read coverage over the targeted regions (around 25X coverage), the rest of the genome is covered by short off-target reads (around 5X coverage).
Medulloblastoma (MB) is the most common malignant brain tumor in children. It is a neuroectodermal tumor located in the cerebellum. International consensus recognizes four distinct subgroups of MBs including Wingless (WNT), Sonic Hedgehog (SHH), Group 3 (G3), and Group 4 (G4) with distinct molecular characteristics, prognoses, and mortality rates in patients. Here, we have compiled and evaluated a large international MB cohort for which we generated transcriptomic data (RNAseq) for 208 primary MB patient samples.
MOSAIC is a collaborative initiative founded by Owkin, Lausanne University Hospital (CHUV), Charité Universitätsmedizin Berlin, University Hospital Erlangen (UKER), Gustave Roussy Institute in Paris, and University of Pittsburgh. The goal of MOSAIC is to build the largest collection of spatial omics data in cancer. By integrating comprehensive high quality clinical annotations with advanced deep profiling techniques, MOSAIC aims to uncover novel cancer subtypes and identify key drug targets and biomarkers within them.
Metastatic prostate cancer (mPC) is enriched for homologous recombination repair (HRR) gene alterations, which have prognostic and predictive value. Routine clinical implementation of next-generation sequencing (NGS) is still limited. We investigated the association between genomic and functional loss of HRR, using NGS and a RAD51 immunofluorescence (RAD51-IF) assay in primary or metastatic biopsies from patients with stage IV prostate cancer. Whole-exome sequencing was performed for paired tumor-normal samples.
Acute intermittent porphyria (AIP) is a rare hereditary metabolic disorder with incomplete penetrance that affects the biosynthesis of heme. AIP is an autosomal dominant disorder resulting in a substantial reduction of the activity of hydroxymethylbilane synthase, which is encoded by the HMBS gene. Most HMBS mutation carriers are asymptomatic, and only between 10% and 20% of patients present acute attacks of the associated symptoms. Here, we sequenced whole genomes of 16 AIP patients.
Understanding the cellular origin and differentiation status of glioblastoma is critical to resolve the etiology of the disease. We profiled 18 patient glioblastomas by single cell RNA sequencing (scRNAseq). From this, we uncovered two principal cell-of-origin relations. Each lineage displays unique directional differentiation trajectories and transcriptional cores from the naïve cell populations. Thus, glioblastoma is defined by robust cell lineage features which may provide insights into the cell origin of the disease.
Human lung tissue-resident NK cells (trNK cells) are likely to play important roles in viral infections, inflammation and cancer. However, knowledge about lung trNK cells, which is fundamental for exploiting these cells in therapeutic approaches, is lacking. Here we analysed the transcriptome of CD69+CD49a+CD103+CD16-CD56bright, CD69+CD49a+CD103-CD16-CD56bright, CD69+CD49a-CD103-CD16-CD56bright, CD69-CD49a-CD103-CD16-CD56bright, and CD56dimCD16+NKG2A+CD57- NK cells isolated from human lung.
We investigated an intergenic haplotype on chr21q22, linked to five different inflammatory diseases, and discovered a mechanism that orchestrates macrophage responses during chronic inflammation. We delineated how the risk haplotype increases expression of the causal gene, ETS2, and demonstrated that ETS2 is necessary for inflammatory macrophage effector functions. To establish whether ETS2 is sufficient to drive inflammatory responses, we overexpressed ETS2 in a dose-dependent manner and performed RNA-sequencing to characterise the transcriptional effects.
Policy Documentation

The following policy documentation is required to be prepared and submitted to the EGA, together with your data files and associated metadata.

Data Access Agreement (DAA)

The Data Access Agreement is a contract made between Data User and Data Access Committee. The agreement should be drafted by the DAC and includes, but is not limited to, details of data use, publication embargoes and storage. Completion of a DAA by the applicant/s should form part of the application process to the DAC.

NOTICE

The data access agreement template below is provided for guidance only and should be adapted as you see fit to suit your own purpose. In the interest of promoting data sharing, we suggest that if an agreement cannot be met around clause 19 in this example that both parties should agree to remain silent, and that the clause should be removed from the agreement.

Example DAA template
Alternate (harmonised) DAA template
TransplantLines is designed as a single-center, prospective cohort study and biobank including all different types of solid organ transplant recipients as well as living organ donors. In the TransplantLines gut microbiome study the gut microbiome of solid organ transplant recipients is characterized and linked to clinical phenotypes. This batch contains the cross-sectional data from liver transplant recipients and longitudinal data from renal and liver transplant recipients.
The goal of this study is to characterize gene expression patterns in tumors, normal tissues, and cell lines.
Periventricular nodular heterotopia is a malformation of cortical development characterized by nodules of abnormally migrated neurons.
Acute myeloid leukaemia (AML) is an aggressive and molecularly diverse disease with a poor overall survival of 20-25%. With an annual incidence of 2.9 per 100,000, AML is currently the commonest myeloid malignancy in Europe, yet the two main therapeutic options for this disease, anthracyclines and purine analogues, have remained unchanged for over 20 years. Currently patients are stratified at diagnosis according to a series of clinicopathological parameters (e.g. age, white cell count and presence/absence of previous clonal haematological disease) and molecular markers (e.g. chromosomal translocations/deletions, aneuploidy and mutations in genes such as FLT3 and NPM1). Patients with adverse prognostic features, whose prognosis is particularly poor (e.g. <15% long-term survival) are offered treatment with allogeneic bone marrow transplantation (allo-BMT) if a sibling or unrelated donor is available. This can significantly improve survival (e.g. up to 40% long-term survival in some contexts), albeit at the expense of significant toxicity and transplant-related mortality (TRM). Allo-BMT is thought to work in part by allowing the delivery of large doses of chemotherapy followed by haemopoietic "rescue" with donor haemopoietic stem cells (haemopoietic failure would otherwise ensue). However, potentially the most potent effect of allo-BMT is the cytotoxic effect of donor lymphocytes against AML blasts, a phenomenon known as graft-vs-leukaemia (GVL) effect. Increasingly, transplants using reduced chemotherapy intensity (mini-allografts) are being used that partially circumvent the toxicity from chemotherapy and rely on GVL to effect cure. Nevertheless, AML relapse after allo-BMT still occurs at a significant rate of up to 80% depending on the type of transplant. There is accumulating evidence that genetic events in residual leukaemic cells enable them to evade immunodetection and therefore survive the GVL effect and expand to cause relapse. 
The most striking example of this is the loss of HLA antigens after transplants in which donor and recipient are not fully HLA-matched. In these cases, the leukaemia "deletes" the genomic region containing the disparate HLA antigen which was preferentially targeted as "foreign" by the GVL effect. However, the genetic basis of immune evasion in the majority of transplants, which are fully HLA matched, is not known. One possibility is the loss of genes coding for antigens that lie outside the HLA locus but are also targets of GVL. Alternatively, genetic events that affect processes downstream of immunological cytotoxicity may be responsible. The identification of genetic events that mediate immune evasion would not only facilitate the understanding of this process but could also help plan therapeutic interventions that improve the outcomes of allogeneic transplantation for AML and other disorders. We intend to study this by conducting exome sequencing on 6 cases of AML from patients who attend my clinic at Addenbrooke's hospital and have relapsed after allogeneic transplantation. Samples from AML diagnosis, remission/normal and AML relapse (total n=18) will be studied to identify somatic mutations in the primary AML and those acquired by the relapsed clone. The 18 samples will also be studied by array CGH to detect regions of genomic amplification or deletion.
With the DAC Portal, it is possible to streamline the process of managing access to sensitive data, ensuring that researchers have the necessary resources to advance scientific discovery while maintaining the highest standards of data protection and privacy. We know that, as a new tool, its first use can be complex. For this reason, this article walks through all the elements of interest. Enjoy!

Credentials: who needs to create a new user to access the DAC Portal and who doesn't?

Only new users (since September the 12th) need to create an EGA account to access the DAC Portal. The EGA team went one step further by creating EGA accounts for former DAC members so that they can log in to the DAC Portal. These accounts are linked to the personal email used in the DAC structure. To check the linked email, DAC members can paste the DAC ID right after this link: https://metadata.ega-archive.org/dacs/ (e.g. https://metadata.ega-archive.org/dacs/EGAC00001002543). If you forget your password, it is possible to set a new one; the whole process is handled through the personal email. Some DAC members may be interested in updating the DAC email. This can be done by contacting our Helpdesk team. For the DAC Portal to function correctly, users must fill in any missing information in the DAC Profile. For a deeper dive, check out our documentation.

The first step: knowing your workspace on the Portal

Once logged in to the platform using the EGA User credentials, note the sections available: DACs and Policies. The first one shows all of the user's registered DACs by default. It provides a comprehensive overview of each DAC, including its EGA accession (EGAC), title, status, and the number of data access requests associated with it. The Policies section is where users can manage and view all their registered policies, along with their associated EGA accessions, links, and titles.
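As a small illustration of the lookup link mentioned in the credentials section, the sketch below builds the metadata URL from a DAC ID. The helper name `dac_metadata_url` is hypothetical, and the accession pattern ("EGAC" followed by 11 digits) is only an assumption inferred from the single example ID given above; it is not taken from EGA documentation.

```python
import re

# Base link quoted in the article.
DAC_METADATA_BASE = "https://metadata.ega-archive.org/dacs/"

# Assumption: a DAC ID looks like "EGAC" plus 11 digits,
# inferred only from the example EGAC00001002543.
_DAC_ID_PATTERN = re.compile(r"EGAC\d{11}")


def dac_metadata_url(dac_id: str) -> str:
    """Return the metadata link for a DAC ID (hypothetical helper)."""
    dac_id = dac_id.strip()
    if not _DAC_ID_PATTERN.fullmatch(dac_id):
        raise ValueError(f"unexpected DAC ID format: {dac_id!r}")
    # The article says to paste the DAC ID right after the base link.
    return DAC_METADATA_BASE + dac_id


print(dac_metadata_url("EGAC00001002543"))
```

The format check is there only to catch pasting the wrong kind of accession (for example a dataset ID) into the link; it can be dropped if the assumed pattern turns out to be too strict.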
A drop-down menu with shortcuts to the most useful pages is available in the top right corner. From it, users can check DACs and policies, as well as create new ones. In this menu, users can also contact the Helpdesk team using the “Need Help?” section to make inquiries about DACs and policies.

Comprehending the DACs list & making the most of all the features

Colours are crucial in the DACs section, as they indicate the user's role and the status of every committee, namely:
Yellow: you are the administrator of the DAC and have full control to add other contacts, modify the content, and manage data access requests.
Orange: you are a member of the DAC and have the authority to manage data access requests but cannot make any changes to the content.
Blue: you have been invited to join the DAC and need to either accept or decline the invitation.
Grey: the DAC is currently pending validation by the EGA Helpdesk team.
Red: the DAC has been declined by the EGA Helpdesk team.

Discover more information on how to view the Helpdesk message containing the reason for the decline, or how to accept or decline a DAC invitation, by taking the tour of the DAC Portal.

How to create, register and edit a DAC

Creating a DAC is an easy process that only requires users to complete the necessary information. After that, the EGA Helpdesk team will review and validate the proposal. Once the DAC is validated, authorised users with the admin role can edit it at any time, ensuring that all information and contacts associated with the committee are accurate and up to date. Membership is managed by searching for an EGA user by username, email, or organisation. Please note that for a person's details to appear in this search engine, they must have registered as an EGA user. Data Access Committee administrators manage two types of permissions that apply to contacts: Main contact and Role.
Both can be modified at any time. Main contact refers to the person we contact first for any inquiry about the DAC (please note that we can contact all DAC members; this only designates whom we contact first). Role defines the actions that each contact can take. There are two possible roles, admin and member, details of which can be found in Take the Tour. Keep in mind that, to save the modifications, it is always necessary to click on “Update”. It is also possible to delete any contact from a DAC whenever needed.

The data request process: how to manage access requests

The data request process is now channelled through the EGA website. Once a user sends an access request, it goes directly to the DAC Portal profile of the DAC members linked to the requested dataset. Whenever a request is received, an hourglass symbol is displayed next to the DAC responsible for the dataset involved. By clicking on the DAC, it is possible to see more information about the requester(s). In this section, DAC Portal users can also modify the display of the data access requests: requests can be grouped by dataset accession or username, depending on user preference, and specific requests can be isolated by user or dataset.

To accept a request, users must slide the button to the right. Once the button turns green, the request is accepted and the requester will be granted access to the requested dataset. To make life easier, we have implemented a “Select all” button to accept or revoke multiple requests at once. To decline a request, slide the button to the left. A red button means the requester will not be granted access to the dataset. The DAC Portal will always ask for the reason why the data access was denied. Please note that any decision, grant or denial, must be confirmed by clicking the "Apply" button.
This will update the status of the request and notify the requester accordingly.

Taking control of the activity in the DAC Portal: checking the requests history

The History button is a useful feature that allows users to access a list of all the requests that have been granted. This is especially useful for auditing purposes, as it makes it possible to keep track of who has been granted access to the data and when. To help users with many requests, we have included a filtering option. It allows requests to be filtered by user, email, dataset, EGA ID, and date (these filters can be combined to narrow the search).

Don't forget about policies: creation, management, and update

As in the DACs section, in the Policies section users can manage and view all their registered policies. By clicking on any policy, it is possible to view its details and make any necessary updates or modifications. The sheets symbol and its adjacent number indicate the number of objects using a policy. By clicking on the location symbol, users go directly to the DAC page on the DAC Portal to see more information about that DAC.

DAC Portal users can create new policies. The first step is to link the policy to a registered DAC. After defining the title, users can either add a link to an external URL containing terms & conditions or write the policy content directly on the DAC Portal. Keep in mind that both options can be used together. Data Use Ontology (DUO) codes can be added to new policies. These codes are used to specify the permitted and prohibited uses of the data. Please note that some of them require a modifier to provide additional specificity. Find out more by checking out this specific content on the DAC Portal Tour. Policies can be modified at any time by clicking on a registered policy, which will display the current information.
Metformin is the first-line antidiabetic drug with over 100 million users worldwide, yet its mechanism of action remains unclear. Here the Metformin Genetics (MetGen) Consortium reports a three-stage genome-wide association study (GWAS), consisting of 13,123 participants of different ancestries. The C-allele of rs8192675 in the intron of SLC2A2, which encodes the facilitated glucose transporter GLUT2, was associated with a 0.17% (p=6.6x10^-14) greater metformin-induced HbA1c reduction in 10,577 participants of European ancestry. rs8192675 is the top cis-eQTL for SLC2A2 in 1,226 human liver samples, suggesting a key role for hepatic GLUT2 in regulation of metformin action. In obese individuals, C-allele homozygotes at rs8192675 had a 0.33% (3.6 mmol/mol) greater absolute HbA1c reduction than T-allele homozygotes. This is about half the effect seen with the addition of a DPP-4 inhibitor, and equates to a dose difference of 550 mg of metformin, suggesting rs8192675 as a potential biomarker for stratified medicine.
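The two HbA1c units quoted above (percentage points and mmol/mol) can be related with a short worked example. The sketch below assumes the standard IFCC/NGSP master equation, IFCC (mmol/mol) = 10.929 × (NGSP % − 2.15); that equation is not stated in the abstract, and the function names are illustrative only. For a difference between two HbA1c values the offset cancels, so only the slope matters.

```python
# Assumed IFCC/NGSP master equation (not stated in the abstract):
#   IFCC (mmol/mol) = 10.929 * (NGSP % - 2.15)
SLOPE_MMOL_MOL_PER_PERCENT = 10.929
NGSP_OFFSET_PERCENT = 2.15


def hba1c_percent_to_mmol_mol(percent: float) -> float:
    """Convert an absolute HbA1c value from NGSP % to IFCC mmol/mol."""
    return SLOPE_MMOL_MOL_PER_PERCENT * (percent - NGSP_OFFSET_PERCENT)


def hba1c_difference_to_mmol_mol(delta_percent: float) -> float:
    """Convert a difference in HbA1c (percentage points) to mmol/mol.

    The offset cancels when two values are subtracted, so only the
    slope of the master equation is applied.
    """
    return SLOPE_MMOL_MOL_PER_PERCENT * delta_percent


# Effect sizes reported in the abstract, in percentage points.
print(round(hba1c_difference_to_mmol_mol(0.33), 1))  # 3.6
print(round(hba1c_difference_to_mmol_mol(0.17), 1))  # 1.9
```

Under this assumption the 0.33% difference corresponds to about 3.6 mmol/mol, which agrees with the value given in parentheses in the abstract.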
This study contains methyl-binding domain sequencing and shallow whole genome sequencing from circulating free DNA (cfDNA) for 79 patients with small cell lung cancer (SCLC) and 78 non-cancer controls. We also sequenced genomic DNA (both methyl-binding domain sequencing and shallow whole genome sequencing) from 30 circulating-tumour-cell derived explant models (CDXs, from 23 unique patients with SCLC), 20 patient derived explant models (PDXs, from 10 unique patients with SCLC) and 13 lung tissue samples.
The BEACCON study aimed to address the lack of power of previous studies to identify novel breast cancer (BC) predisposition genes by performing extensive sequencing in 12,000 women (11,511 analysed following exclusions) and further enhancing power by using an ‘extreme phenotype’ design with enrichment of familial non-BRCA1 and BRCA2 cases, compared with a control population of older women with ongoing confirmation of cancer-free status at June 2019. Three-quarters of the 1303 candidate genes screened were selected based on empiric evidence from local (69 multi-case BC families) or international whole exome sequencing studies, and the remainder were included to provide detailed coverage of functional pathways with established associations with BC.
120 individuals from the TEENAGE study (Ntalla et al., 2013) have been genotyped on the Illumina HumanCoreExome-12v1-1_A array. This is a population-based study of adolescents from the Attica region in Greece.
Immune memory is key to effective antimicrobial responses, but the impact of mRNA vaccines on this process is not fully understood. Our research shows that SARS-CoV-2 mRNA vaccines alter the epigenetic profile of human macrophages, specifically enhancing histone acetylation, which is linked to immune training. Significant epigenetic changes, along with increased cytokine release, require two vaccine doses. These effects diminish over time but can be restored with a booster dose six months later, maintaining a strong pro-inflammatory response.
X-linked Dystonia-Parkinsonism (XDP, DYT3) is a long-standing quandary in human disease genetics. XDP is predominantly observed on Panay island in the Philippines. This study is one of the first of its kind to interrogate an unsolved Mendelian disorder by integrating genome and transcriptome assembly methods using Illumina, 10X Genomics, Pacific Biosciences, and Agilent genome targeting technologies. These data provide strong evidence for a pathogenic link between a noncoding SVA retrotransposon and XDP. We demonstrate that this Mendelian disorder is associated with a SINE-VNTR-Alu (SVA) retrotransposon that inserted into the TAF1 gene and is shared by all XDP probands, yet never observed in controls from worldwide populations. Transcriptome assembly in iPSC-derived neural stem cells (NSCs) and neurons revealed that this SVA caused aberrant splicing and significant intron retention, which was negatively correlated with TAF1 expression. Remarkably, CRISPR/Cas9 excision of the SVA rescued the aberrant transcriptional signature and normalized expression of TAF1 in patient-derived NSCs. We have also interrogated iPSC-derived microglia to investigate the contribution of glial components to the cellular and molecular deficit and the effect of TNF treatment on XDP transcriptional signatures. To understand what XDP may have in common with other hereditary dystonias (such as DYT6), we are further comparing these signatures to ones associated with dystonia-specific variants in proteins that, like TAF1, are also involved in regulating transcription, i.e. THAP1.
Spermatogenesis is a complex biological process that requires the coordination of thousands of genes. Perhaps because of the large number of genes involved, spermatogenic failure occurs frequently and affects approximately 1% of men. While environmental and genetic factors likely contribute to this disorder, it is thought that the majority of cases have an underlying genetic basis. The two most common genetic causes are deletions on the Y chromosome and cytogenetic abnormalities (e.g. Klinefelter syndrome, XXY), which each account for roughly 10-15% of cases of complete spermatogenic failure. Point mutations in several other genes have also been linked to spermatogenic failure, but their collective prevalence is very low. Thus, for approximately 70% of men with spermatogenic failure, the genetic cause remains unknown. We hypothesize that a disproportionate number of these remaining genetic variants reside on the sex chromosomes because they are hemizygous in males and contain a disproportionate number of genes expressed in the testis. To identify genes that are required for spermatogenesis, we have performed capture-based targeted sequencing in 301 men with azoospermia (complete absence of sperm) and 300 fertile controls. Our sequencing is focused on coding genes and conserved, non-coding sequences of the X and Y chromosomes. In addition, we have targeted ~497 autosomal genes that display exclusive or predominant expression in the testis. The total sequence space targeted is 21.3 Mb, and we require that >70% of targets reach an average coverage of 20X.
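The coverage requirement stated at the end of this description can be written as a simple check. The sketch below is one possible reading, assuming that a per-target mean coverage is already available for each sample and that "reach 20X" means a mean of at least 20; the function names and the example values are made up for illustration and do not come from the study.

```python
from typing import Sequence


def fraction_of_targets_covered(mean_coverages: Sequence[float],
                                min_coverage: float = 20.0) -> float:
    """Fraction of targets whose mean coverage is at least min_coverage."""
    if len(mean_coverages) == 0:
        raise ValueError("no targets supplied")
    covered = sum(1 for cov in mean_coverages if cov >= min_coverage)
    return covered / len(mean_coverages)


def passes_coverage_requirement(mean_coverages: Sequence[float],
                                min_coverage: float = 20.0,
                                min_fraction: float = 0.70) -> bool:
    """True when more than min_fraction of targets reach min_coverage."""
    fraction = fraction_of_targets_covered(mean_coverages, min_coverage)
    return fraction > min_fraction


# Made-up per-target mean coverages for ten hypothetical targets.
example = [35.2, 22.0, 21.5, 41.0, 9.7, 27.3, 20.0, 30.1, 25.4, 12.0]
print(fraction_of_targets_covered(example))  # 0.8
print(passes_coverage_requirement(example))  # True
```

The comparison is strict (`>`), matching the ">70%" wording; a sample with exactly 70% of targets at 20X would not pass under this reading.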
CD47 is a cell surface molecule that inhibits phagocytosis of cells that express it by binding to its receptor, SIRPα, on macrophages and other immune cells. CD47 is expressed at different levels in normal cells; in cancer cells, however, CD47 transcript and protein expression is aberrantly increased. Here we sought to uncover the regulators of CD47 transcription, including active enhancers that increase its aberrant expression in cancer cells, in order to reveal mechanisms by which different neoplastic cells generate this dominant 'don't eat me' signal. Enhancers are genomic regions, often referred to as "switches", that can turn on or off the transcription of target genes. Recently, the discovery of super-enhancers (SEs) has given more insight into the regulatory architecture of key genes that are highly expressed in a specific cell type, during a particular developmental stage or in disease. By analyzing the CD47 regulatory genomic landscape, we discovered that: i) a distinct super-enhancer (SE) is associated with CD47 upregulation in breast cancer cells; ii) disruption of CD47 SEs by using the BRD4 inhibitor JQ1 robustly reduces CD47 gene expression; and iii) the TNF-NFKB1 signaling pathway is directly involved in the regulation of CD47 by interacting with a distal downstream constituent enhancer located within a CD47-associated SE specific to breast cancer. Our results describe a novel mechanism that cancer cells have evolved to drive CD47 overexpression to escape immune surveillance.
The primary objective of the project is to generate a systems-level view of human milk in the context of healthy mothers and their term infants. The study uses biospecimens and clinical data from the Mothers and Infants LinKed for Healthy Growth Study (MILk Study), a prospective observational cohort of approximately 500 exclusively breastfeeding women and their infants who are followed to 6 months of age. The project focuses on mature milk samples collected at 1 month postpartum and infant gut microbiome samples collected at 1 and 6 months of age. Specific Aim 1 is to identify maternal genetic and clinical factors that shape human milk gene expression. Using bulk and single-cell RNA sequencing of the milk cell pellet and genotypes from maternal DNA, we will identify novel genetic determinants of the milk transcriptome and assess potential modification of these genetic associations by gestational weight gain, diet, and other clinical factors. Specific Aim 2 is to describe key features of the normative human milk biosystem and their interactions with one another. Specifically, machine learning techniques will be used to characterize interaction networks and correlational structures among the following features of human milk: transcriptomics, microbiomes, oligosaccharides, metabolomics, lipidomics, and milk macronutrient composition. Specific Aim 3 is to establish how the milk biosystem is related to variation in infant gut microbiomes and health. dbGaP will house the maternal genotype and milk transcriptomics data. The infant microbiome sequence data are associated with BioProject PRJNA1019702.
The development of molecularly targeted therapies builds upon the identification of tumor clades with shared molecular abnormalities. Hereditary leiomyomatosis and renal cell carcinoma (HLRCC) is a familial syndrome resulting from germline mutation of the two-hit tumor suppressor gene, Fumarate Hydratase (FH). HLRCC renal tumors often present in young adults, exhibit unique histological features and are particularly aggressive. HLRCC is uncommon and poorly understood, and whether there are sporadic renal tumors with similar characteristics is unclear. Notably, FH can also be inactivated somatically in RCC, and such tumors should be included among FH-deficient (FHD) kidney tumors. We identified a tumor clade without FH mutations but with similar morphological and molecular characteristics as well as aggressive clinical behavior, which we name FHDL (FHD-like). We show that FHD/FHDL tumors represent a unique subtype of CIMP (CpG Island Methylator Phenotype) with the highest levels of overall genome methylation and CDKN2A silencing. FHD/FHDL tumors are characterized by convergent activation of the Hippo pathway as determined by frequent and mutually exclusive mutations in NF2 and PTPN14. AKR1B10, an NRF2 target, was upregulated in FHD/FHDL tumors, as was GATA3, which suggests a shared origin in the renal medulla and is consistent with topological studies. There is loss of 5-hydroxymethylcytosine in both FHD and FHDL tumors. These data define a group of particularly aggressive renal cancers with shared molecular features and potential targets for therapeutic intervention.
The Genetics and Epidemiology of Colorectal Cancer Consortium (GECCO) is a collaborative effort comprising a coordinating center and scientific researchers from well-characterized cohort and case-control studies conducted in North America and Europe. This international consortium aims to accelerate the discovery of common and rare genetic risk variants for colorectal cancer by conducting large-scale meta-analyses of existing and newly generated genome-wide association study (GWAS) data, replicating and fine-mapping GWAS discoveries, and investigating how genetic risk variants are modified by environmental risk factors. To expand these efforts, we assembled case-control sets or nested case-control sets from 20 different North American or European studies. Summary descriptions and study participant inclusion/exclusion criteria for each of these studies are detailed below.

The Black Women's Health Study (BWHS): The BWHS is the largest follow-up study of the health of African-American women (Cozier et al., 2004; Rosenberg et al., 1995) [PMID: 15018884; PMID: 7722208]. The purpose is to identify and evaluate causes and preventives of cancers and other serious illnesses in African-American women. Among the diseases being studied are breast cancer, colorectal cancer, type 2 diabetes, uterine fibroids, systemic lupus erythematosus, and cardiovascular disease. The study began in 1995, when 59,000 black women from all parts of the United States enrolled through postal questionnaires. The women provided demographic and health data on the 1995 baseline questionnaire, including information on weight, height, smoking, drinking, contraceptive use, use of other selected medications, illnesses, reproductive history, physical activity, diet, use of health care, and other factors. The participants are followed through biennial questionnaires to determine the occurrence of cancers and other illnesses and to update information on risk factors.
Self-reports of cancer are confirmed through medical records and state cancer registry records. Mouthwash-swish samples, as a source of DNA, were obtained from ~26,000 BWHS participants in 2002-2007. DNA was isolated from the mouthwash-swish samples at the Boston University Molecular Core Genetics Laboratory using the QIAamp DNA Mini Kit (Qiagen). All incident colorectal cancer cases with a DNA sample were included in the present analysis. Two controls per case, selected from among BWHS participants free of colorectal cancer at end of follow-up, were matched to cases on year of birth (+/- 2 years) and geographical region of residence (Northeast, South, Midwest, and West). A total of 209 colorectal cancer cases and 423 controls were sent for genotyping.

Campaign Against Cancer and Heart Disease (CLUE II): The Campaign Against Cancer and Heart Disease is a prospective cohort designed to identify biomarkers and other factors associated with risk of cancer, heart disease, and other conditions (Kakourou et al., 2015) [PMID: 26220152]. A total of 32,894 participants were recruited from May through October 1989 from Washington County, Maryland and surrounding communities. Colorectal cancer cases (n = 297) and matched controls (n = 296) were identified between 1989 and 2000 among participants in the CLUE II cohort of Washington County, Maryland.

Colorectal Cancer Study of Austria (CORSA): In the ongoing colorectal cancer study of Austria (CORSA), more than 13,000 Caucasian participants have been recruited within the province-wide screening project "Burgenland Prevention Trial of Colorectal Disease with Immunological Testing" (B-PREDICT) since 2003 (Hofer et al., 2011) [PMID: 21422235]. All inhabitants of the Austrian province Burgenland aged between 40 and 80 years are annually invited to participate in fecal immunochemical testing, and haemoccult-positive screening participants are invited for colonoscopy.
CORSA includes genomic DNA and plasma of colorectal cancer cases, low-risk and high-risk adenomas, and colonoscopy-negative controls. Controls received a complete colonoscopy and were free of colorectal cancer or polyps. CORSA participants have been recruited in the four KRAGES hospitals in Burgenland, Austria, and additionally at the Medical University of Vienna (Department of Surgery), the Viennese hospitals "Rudolfstiftung" and the "Sozialmedizinisches Zentrum Sud", and at the Medical University of Graz (Department of Internal Medicine). A total of 1403 colorectal cancer and advanced colorectal adenoma cases and 1404 matched controls were selected for the study. The distributions of sex and age (5-year strata) were evenly matched between cases and controls.

Cancer Prevention Study II (CPS II): The CPS II Nutrition cohort is a prospective study of cancer incidence and mortality in the United States, established in 1992 and described in detail elsewhere (Calle et al., 2002; Campbell et al., 2014) [PMID: 12015775; PMID: 25472679]. At enrollment, participants completed a mailed self-administered questionnaire including information on demographic, medical, diet, and lifestyle factors. Follow-up questionnaires to update exposure information and to ascertain newly diagnosed cancers were sent biennially starting in 1997. Reported cancers were verified through medical records, state cancer registry linkage, or death certificates. The Emory University Institutional Review Board approves all aspects of the CPS II Nutrition Cohort. A total of 360 cases and 359 controls were selected for this study.

Czech Republic Colorectal Cancer Study (Czech Republic CCS): Cases with positive colonoscopy results for malignancy, confirmed by histology as colon or rectal carcinomas, were recruited between September 2003 and May 2012 in several oncological departments in the Czech Republic (Prague, Pilsen, Benesov, Brno, Liberec, Ples, Pribram, Usti nad Labem, and Zlin).
Two control groups, sampled during the same period as case recruitment, were included in the study. The first group consisted of hospital-based individuals with a negative colonoscopy result for malignancy or idiopathic bowel diseases. The reasons for the colonoscopy were: i) positive fecal occult blood test, ii) hemorrhoids, iii) abdominal pain of unknown origin, and iv) macroscopic bleeding. The second control group consisted of healthy blood donor volunteers from a blood donor center in Prague. All individuals were subjected to standard examinations to verify the health status for blood donation and were cancer-free at the time of the sampling. Details of CRC cases and controls have been reported previously (Vymetalkova et al., 2014; Naccarati et al., 2016; Vymetalkova et al., 2016) [PMID: 24755277; PMID: 26735576; PMID: 27803053]. All subjects were informed and provided written consent to participate in the study. They approved the use of their biological samples for genetic analyses, according to the Declaration of Helsinki. The design of the study was approved by the Ethics Committee of the Institute of Experimental Medicine, Prague, Czech Republic. All subjects included in the study were Caucasians and comprised 1792 cases and 1764 matched controls. Controls were matched to CRC cases in a 1:1 ratio. Matching was done on age and sex. Age was matched within ±5 years, whereas sex was matched exactly. For the cases without matched controls, matching was done only on sex. Early Detection Research Network (EDRN): The aim of the EDRN initiative is to develop and sustain a biorepository for support of translational research (Amin et al., 2010) [PMID: 21031013]. High-quality biospecimens were accrued and annotated with pertinent clinical, epidemiologic, molecular and genomic information. A user-friendly annotation tool and a query tool were developed for this purpose. 
The components of this annotation tool include common data elements (CDEs), which are developed from the College of American Pathologists (CAP) Cancer Checklists and North American Association of Central Cancer Registries (NAACCR) standards. The CDEs provide semantic and syntactic interoperability of the data sets by describing them in the form of metadata or data descriptors. A total of 352 colorectal case samples and 399 controls were selected for this study. Controls were matched to CRC cases based on age and sex. The EPICOLON Consortium (EPICOLON): The EPICOLON Consortium comprises a prospective, multicentre and population-based epidemiology survey of the incidence and features of CRC in the Spanish population (Fernandez-Rozadilla et al., 2013) [PMID: 23350875]. Cases were selected as patients with de novo histologically confirmed diagnosis of colorectal adenocarcinoma. Patients with familial adenomatous polyposis, Lynch syndrome or inflammatory bowel disease-related CRC, and cases where patients or family refused to participate in the study were excluded. Hospital-based controls were recruited through the blood collection unit of each hospital, together with cases. All of the controls were confirmed to have no history of cancer or other neoplasm and no reported family history of CRC. Controls were randomly selected and matched with cases for hospital, sex and age (±5 years). A total of 370 cases and 370 controls were selected for genotyping. Hawaii Adenoma Study: For this adenoma study, two flexible-sigmoidoscopy screening clinics were first used to recruit participants on Oahu, Hawaii. Adenoma cases were identified either from the baseline examination at the Hawaii site of the Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial during 1996-2000 or at the Kaiser Permanente Hawaii's Gastroenterology Screening Clinic during 1995-2007. 
In addition, starting in 2002 and up to 2007, we also approached for recruitment all eligible patients who underwent a colonoscopy in the Kaiser Permanente Hawaii Gastroenterology Department. Cases were patients with histologically confirmed first-time adenoma(s) of the colorectum and were of Japanese, Caucasian or Hawaiian race/ethnicity. Controls were selected among patients with a normal colorectum and were individually matched to the cases on age at exam, sex, race/ethnicity, screening date (±3 months) and clinic and type of examination (colonoscopy or flexible sigmoidoscopy). We recruited 1016 adenoma cases (67.8% of all eligible) and 1355 controls (69.2% of all eligible); 889 cases and 1169 controls agreed to give a blood sample, and 29 cases and 34 controls a mouthwash sample. A total of 989 cases and 1185 controls were genotyped for this study. Columbus-area HNPCC Study (HNPCC, OSUMC): Patients with colorectal adenocarcinoma diagnosed at six participating hospitals were eligible for this study, regardless of age at diagnosis or family history of cancer. Patients with a clinical diagnosis of familial adenomatous polyposis were not eligible for this study. These six hospitals perform the vast majority of all operations for CRC in the Columbus metropolitan area (population 1.7 million). The institutional review board at all participating hospitals approved the research protocol and consent form in accordance with assurances filed with and approved by the United States Department of Health and Human Services. Briefly, during the period of January 1999 through August 2004, 1,566 eligible patients with CRC were accrued to the study (Hampel et al., 2008) [PMID: 18809606]. A total of 1472 colorectal cancer samples had enough blood DNA remaining to be sent for genotyping. Control samples were provided by the Ohio State University Medical Center's (OSUMC) Human Genetics Sample Bank. 
The Columbus Area Controls Sample Bank is a collection of control samples for use in human genetics research that includes both donors' anonymized biological specimens and linked phenotypic data. The data and samples are collected under the protocol "Collection and Storage of Controls for Genetics Research Studies", which is approved by the Biomedical Sciences Institutional Review Board at OSUMC. Recruitment takes place in OSUMC primary care and internal medicine clinics. If individuals agree to participate, they provide written informed consent, complete a questionnaire that includes demographic, medical and family history information, and donate a blood sample. 4-7 ml of blood is drawn into each of 3 ACD Solution A tubes and is used for genomic DNA extraction and the establishment of an EBV-transformed lymphoblastoid cell culture, cell pellet in Trizol, and plasma. Controls were matched to CRC cases in a 1:1 ratio. Matching was done on age at reference time (age_ref), race, and sex. Age_ref was matched within ±5 years. Sex and race were matched exactly. For the cases without matched controls, matching was done only on sex and race in a 1:1 ratio. Because there are fewer controls than cases, each control is matched to at most two cases. Health Professionals Follow-up Study (HPFS): A parallel prospective study to the NHS (Nurses' Health Study). The HPFS cohort comprised 51,529 men aged 40-75 who, in 1986, responded to a mailed questionnaire (Rimm et al., 1990) [PMID: 2090285]. Participants provided information on health-related exposures, including current and past smoking history, age, weight, height, diet, physical activity, aspirin use, and family history of colorectal cancer. Colorectal cancer and other outcomes were reported by participants or next-of-kin and were followed up through review of the medical and pathology record by physicians. Overall, more than 97% of self-reported colorectal cancers were confirmed by medical record review. 
Information was abstracted on histology and primary location. Incident cases were defined as those occurring after the subject provided the blood sample. Prevalent cases were defined as those occurring after enrollment in the study but before the subject provided the blood sample. Follow-up evaluation has been excellent, with 94% of the men responding to date. Colorectal cancer cases were ascertained through January 1, 2008. In 1993-1995, 18,825 men in the HPFS mailed blood samples by overnight courier, which were aliquoted into buffy coat and stored in liquid nitrogen. In 2001-2004, 13,956 men in the HPFS who had not provided a blood sample previously mailed in a swish-and-spit sample of buccal cells. Incident cases were defined as those occurring after the subject provided a blood or buccal sample. Prevalent cases were defined as those occurring after enrollment in the study in 1986, but before the subject provided either a blood or buccal sample. After excluding participants with histories of cancer (except nonmelanoma skin cancer), ulcerative colitis, or familial polyposis, case-control sets were previously constructed. In addition to colorectal cancer cases and controls, a set of adenoma cases and matched controls with available DNA from buffy coat were selected for genotyping. Over the follow-up period, data were collected on endoscopic screening practices and, if individuals had been diagnosed with a polyp, the polyps were confirmed to be adenomatous by medical record review. Adenoma cases were ascertained through January 1, 2008. A separate case-control set was constructed of participants diagnosed with advanced adenoma matched to control participants who underwent a lower endoscopy in the same time period and did not have an adenoma. Advanced adenoma was defined as an adenoma 1 cm or larger in diameter and/or with tubulovillous, villous, or high-grade dysplasia/carcinoma-in-situ histology. 
Matching criteria included year of birth (within 1 year) and month/year of blood sampling (within 6 months), the reason for their lower endoscopy (screening, family history, or symptoms), and the time period of any prior endoscopy (within 2 years). Controls matched to cases with a distal adenoma either had a negative sigmoidoscopy or colonoscopy examination, and controls matched to cases with proximal adenoma all had a negative colonoscopy. In total, 159 advanced adenoma cases and 109 controls were selected for genotyping. Leeds Colorectal Cancer Study (LCCS): Following local ethical approval, colorectal cancer cases were recruited from 1997 until 2012 in Leeds, UK through surgical clinics. Initially, funding was provided by the UK Ministry of Agriculture, Farming and Fisheries (subsequently the Food Standards Agency) and Imperial Cancer Research Fund (subsequently Cancer Research UK). Recruitment also occurred similarly in Dundee, Perth and York between 1997 and 2001 using the same protocol, and the data and samples were combined. Pathologically confirmed cases were consented at outpatient clinics, providing information on known and postulated risk factors for colorectal cancer (diet, lifestyle and family history) as well as providing a blood sample for DNA. Exclusion criteria included pre-existing diverticular disease and an inability to complete the questionnaire. The General Practitioners (GPs) of cases (all UK residents have a nominated General Practitioner to whom they refer initial medical queries) were asked to send letters to other persons on their patient list of the same gender and born within 5 years of the case. Subsequently, to enhance the number of controls, we systematically invited patients from selected GP practices. Diet was assessed in cases and controls using an extensive dietary and lifestyle questionnaire modified from that produced by the European Prospective Investigation into Cancer and Nutrition (EPIC). 
The frequency with which each specific food item was eaten was recorded, and we also obtained average fruit and vegetable consumption as a cross-check. In total, 1591 cases and 739 controls provided a DNA sample. The North Carolina Colon Cancer Studies (NCCCS I/II): The North Carolina Colon Cancer Studies (NCCCS I-colon and NCCCS II-rectal) were population-based case-control studies conducted in 33 counties of North Carolina. Cases were identified using the rapid case ascertainment system of the North Carolina Central Cancer Registry. Patients with a first diagnosis of histologically confirmed invasive adenocarcinoma of the colon (cecum through sigmoid colon) between October 1996 and September 2000 were classified as potential cases in the NCCCS I. The NCCCS II included patients with a first diagnosis of histologically confirmed invasive adenocarcinoma of the sigmoid colon, rectosigmoid, or rectum (hereafter collectively referred to as rectal cancer) between May 2001 and September 2006. Additional eligibility requirements were: age 40-80 years, residence in one of the 33 counties, ability to give informed consent and complete an interview, possession of a driver's license or identification card issued by the North Carolina Department of Motor Vehicles (if under the age of 65), and no objection from the primary physician to contacting the individual. Controls, identified and sampled during the respective study dates, were selected from two sources. Potential controls under the age of 65 were identified using the North Carolina Department of Motor Vehicles records. For those 65 years and older, records from the Centers for Medicare and Medicaid Services were used. Controls were matched to cases using randomized recruitment strategies. Recruitment probabilities were set using strata of 5-year age, sex, and race groups. 
Dietary information was collected using a modified version of the semiquantitative food frequency questionnaire developed at the National Cancer Institute. In addition, participants were asked about vitamin and mineral supplementation, special diets, restaurant eating, sodium use, and fats used in cooking. In NCCCS I, 515 colorectal cases and 687 matched controls were sent for genotyping. In NCCCS II, 796 colorectal cases and 823 controls were sent for genotyping. Controls were matched to CRC cases in a 1:1 ratio. Matching was done on age, race, and sex. Age was matched within ±5 years. Race and sex were matched exactly. For the cases without matched controls, matching was done only on sex and race. Nurses' Health Study (NHS): The NHS cohort began in 1976 when 121,700 married female registered nurses aged 30-55 years returned the initial questionnaire that ascertained a variety of important health-related exposures (Belanger et al., 1978) [PMID: 248266]. Since 1976, follow-up questionnaires have been mailed every 2 years. Colorectal cancer and other outcomes were reported by participants or next-of-kin and followed up through review of the medical and pathology record by physicians. Overall, more than 97% of self-reported colorectal cancers were confirmed by medical-record review. Information was abstracted on histology and primary location. The rate of follow-up evaluation has been high: as a proportion of the total possible follow-up time, follow-up evaluation has been more than 92%. Colorectal cancer cases were ascertained through June 1, 2008. In 1989-1990, 32,826 women in NHS I mailed blood samples by overnight courier, which were aliquoted into buffy coat and stored in liquid nitrogen. In 2001-2004, 29,684 women in NHS I who did not previously provide a blood sample mailed a swish-and-spit sample of buccal cells. Incident cases were defined as those occurring after the subject provided a blood or buccal sample. 
Prevalent cases were defined as those occurring after enrollment in the study in 1976 but before the subject provided either a blood or buccal sample. After excluding participants with histories of cancer (except nonmelanoma skin cancer), ulcerative colitis, or familial polyposis, case-control sets were previously constructed from which DNA was isolated from either buffy coat or buccal cells for genotyping. In addition to colorectal cancer cases and controls, a set of advanced adenoma cases and matched controls with available DNA from buffy coat were selected for genotyping. Over the follow-up period, data were collected on endoscopic screening practices and, if individuals had been diagnosed with a polyp, the polyps were confirmed to be adenomatous by medical record review. Adenoma cases were ascertained through June 1, 2011. A separate case-control set was constructed of participants diagnosed with advanced adenoma matched to control participants who underwent a lower endoscopy in the same time period and did not have an adenoma. Advanced adenoma was defined as an adenoma more than 1 cm in diameter and/or with tubulovillous, villous, or high-grade dysplasia/carcinoma-in-situ histology. Matching criteria included year of birth (within 1 year) and month/year of blood sampling (within 6 months), the reason for their lower endoscopy (screening, family history, or symptoms), and the time period of any prior endoscopy (within 2 years). Controls matched to cases with a distal adenoma either had a negative sigmoidoscopy or colonoscopy examination, and controls matched to cases with proximal adenoma all had a negative colonoscopy. A total of 272 cases and 236 matched controls were sent to CIDR for the advanced adenoma case-control set. 
Northern Swedish Health and Disease Study (NSHDS): The NSHDS comprises over 110,000 participants, including approximately one third with repeated sampling occasions, from three population-based cohorts (Dahlin et al., 2010; Myte et al., 2016) [PMID: 20197478; PMID: 27367522]. The largest is the ongoing Vasterbotten Intervention Programme, in which all residents of Vasterbotten County are invited to a health examination upon turning 30 (some years), 40, 50 and 60 years of age. Extensive measured and self-reported health and lifestyle data, as well as blood samples for central biobanking in Umea, Sweden, are collected at the health exam. Leucocyte DNA samples for 1:1-matched CRC case-control sets from the NSHDS, of which 878 samples are included in this study, have been selected for genotyping. This is in addition to 354 samples from the NSHDS previously analyzed as part of the multicenter EPIC cohort. Cancer-specific and overall survival data are available for all patients. For at least 425 patients, archival tumor tissue has been analyzed for the BRAF V600E mutation and by sequencing codons 12 and 13 for KRAS mutations, as well as for MSI screening status by immunohistochemistry and for an eight-gene CIMP panel using quantitative real-time PCR (MethyLight). Ohio Colorectal Cancer Prevention Initiative (OCCPI, OSUMC): OCCPI (ClinicalTrials.gov identifier: NCT01850654) is a population-based study of colorectal cancer patients diagnosed in one of 51 hospitals throughout the state of Ohio from January 1, 2013 through December 31, 2016. The OCCPI was created to decrease CRC incidence in Ohio by identifying patients with hereditary predisposition (statewide universal tumor screening for newly diagnosed CRC patients), increase colonoscopy compliance for first-degree relatives of CRC patients, and encourage future research through the creation of a biorepository. 
The 51 Ohio hospitals participating in the OCCPI were selected to represent a cross-section of clinical centers in the state based on high reported volume of CRC patients, affiliation with a high-volume hospital, or interest in participation. Institutional Review Board (IRB) approval was obtained by the individual hospitals, Community Oncology Programs, or by ceding review to the OSU IRB. Written informed consent was obtained. A total of 2139 colorectal cases were genotyped. Patients were considered eligible for this study if they were age 18 or older at the time of enrollment and if they had a surgical resection (or biopsy if unresectable) in the state of Ohio demonstrating an adenocarcinoma of the colorectum from 1/1/13 - 12/31/16. Matched control samples were selected from the Ohio State University Medical Center's (OSUMC) Human Genetics Sample Bank in an identical way to the selection for the Columbus-area HNPCC Study (please refer to the description for the Columbus-area HNPCC Study). Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial (PLCO): PLCO enrolled 154,934 participants (men and women, aged between 55 and 74 years) at ten centers into a large, randomized, two-arm trial to determine the effectiveness of screening to reduce cancer mortality. Sequential blood samples were collected from participants assigned to the screening arm. Participation was 93% at the baseline blood draw. In the observational (control) arm, buccal cells were collected via mail using the "swish-and-spit" protocol and the participation rate was 65%. Details of this study have been previously described (Huang et al., 2016) [PMID: 27673363] and are available online (http://dcp.cancer.gov/plco). For this study, 1651 advanced adenoma cases and 1392 controls were selected for genotyping. 
Selenium and Vitamin E Cancer Prevention Trial (SELECT): The Selenium and Vitamin E Cancer Prevention Trial (SELECT) was a double-blind, placebo-controlled clinical trial which explored using selenium and vitamin E alone and in combination to prevent prostate cancer in healthy men (Lippman et al., 2009) [PMID: 19066370]. Secondary endpoints included the prevention of colorectal and lung cancers. SELECT was conducted at 427 sites and centers in the United States, Canada and Puerto Rico; 35,533 men 55 years and older (50 or older if African American) were randomized beginning August 22, 2001. Supplementation was discontinued on October 23, 2008 due to futility. A total of 308 colorectal cancer cases and 308 matched controls were selected from the SELECT population and sent for genotyping. Screening Markers For Colorectal Disease Study and Colonoscopy and Health Study (SMS-REACH): Details on this study population were previously reported (Burnett-Hartman et al., 2014) [PMID: 24875374]. Participants were enrollees in an integrated health-care delivery system in western Washington State (Group Health Cooperative, Seattle, Washington) aged 24-79 years who underwent an index colonoscopy for any indication between 1998 and 2007 and donated a buccal-cell or blood sample for genotyping analysis. Study recruitment took place in 2 phases, with phase 1 occurring in 1998-2003 and phase 2 occurring in 2004-2007. Persons who had undergone a colonoscopy less than 1 year prior to the index colonoscopy, persons with inadequate bowel preparation for the index colonoscopy, and persons with a prior or new diagnosis of colorectal cancer, a familial colorectal cancer syndrome (such as familial adenomatous polyposis), or another colorectal disease were ineligible. Patients diagnosed with adenomas or serrated polyps and persons who were polyp-free at the index colonoscopy (controls) were systematically recruited during both phases of recruitment. 
Approximately 75% agreed to participate and provided written informed consent. Based on medical records, persons who agreed to participate and those who refused study participation were similar with respect to age, sex, and colorectal polyp status. Study protocols were approved by the institutional review boards of the Group Health Cooperative and the Fred Hutchinson Cancer Research Center (Seattle, Washington). A total of 575 cases and 508 matched controls were selected for the study. Controls were matched to CRC cases in a 1:1 ratio. Matching was done on age_ref, race, and sex. Age_ref was matched within ±5 years. The Women's Health Initiative (WHI): WHI is a long-term national health study that has focused on strategies for preventing heart disease, breast and colorectal cancer, and osteoporotic fractures in postmenopausal women. The original WHI study included 161,808 postmenopausal women enrolled between 1993 and 1998. The Fred Hutchinson Cancer Research Center in Seattle, WA serves as the WHI Clinical Coordinating Center for data collection, management, and analysis of the WHI. The WHI has two major parts: a partial factorial randomized Clinical Trial (CT) and an Observational Study (OS); both were conducted at 40 Clinical Centers nationwide. The CT enrolled 68,132 postmenopausal women between the ages of 50-79 into trials testing three prevention strategies. If eligible, women could choose to enroll in one, two, or all three of the trial components. The components are: Hormone Therapy Trials (HT): This double-blind component examined the effects of combined hormones or estrogen alone on the prevention of coronary heart disease and osteoporotic fractures, and associated risk for breast cancer. Women participating in this component with an intact uterus were randomized to estrogen plus progestin (conjugated equine estrogens [CEE], 0.625 mg/d plus medroxyprogesterone acetate [MPA], 2.5 mg/d) or a matching placebo. 
Women with prior hysterectomy were randomized to CEE or placebo. Both trials were stopped early, in July 2002 and March 2004, respectively, based on adverse effects. All HT participants continued to be followed without intervention until close-out. Dietary Modification Trial (DM): The Dietary Modification component evaluated the effect of a low-fat and high fruit, vegetable and grain diet on the prevention of breast and colorectal cancers and coronary heart disease. Study participants were randomized to either their usual eating pattern or a low-fat dietary pattern. Calcium/Vitamin D Trial (CaD): This double-blind component began 1 to 2 years after a woman joined one or both of the other clinical trial components. It evaluated the effect of calcium and vitamin D supplementation on the prevention of osteoporotic fractures and colorectal cancer. Women in this component were randomized to calcium (1000 mg/d) and vitamin D (400 IU/d) supplements or a matching placebo. The Observational Study (OS) examines the relationship between lifestyle, environmental, medical and molecular risk factors and specific measures of health or disease outcomes. This component involves tracking the medical history and health habits of 93,676 women not participating in the CT. Recruitment for the observational study was completed in 1998 and participants were followed annually for 8 to 12 years. All centrally confirmed cases of invasive colorectal cancer, or deaths from colorectal cancer, were selected as potential cases from the September 30, 2015 database. Controls were participants free of colorectal cancer (invasive or in situ) as of September 30, 2015. Potential cases and controls were excluded if they (1) were non-White; (2) had a history of colorectal cancer at baseline; (3) were lost to follow-up after enrollment; (4) were dbGaP ineligible; (5) had <1.25 µg of DNA; (6) were selected for WHI study M26 Phase I or II; or (7) were selected for WHI study AS224 and also included in the imputation project. 
A total of 578 cases and 104,429 controls met the eligibility criteria. Each case was matched with 1 control (1:1) that exactly met the following matching criteria: age (±5 years), 40 randomization centers (exact), WHI date (±3 years), CaD date (±3 years), OS flag (exact), HRT assignments (exact), DM assignments (exact), and CaD assignments (exact). Control selection was done in a time-forward manner, selecting one control for each case from the risk set at the time of the case's event. The matching algorithm was allowed to select the closest match based on a criterion that minimizes an overall distance measure (Bergstralh EJ, Kosanke JL. Computerized matching of cases to controls. Technical Report #56, Department of Health Sciences Research, Mayo Clinic, Rochester MN. April 1995). Each matching factor was given the same weight. When exact matches could not be found, the matching criteria were gradually relaxed among unmatched cases and controls until all cases had found matched controls. Using the matching criteria specified above, 559 of the 578 eligible cases found exact matches. The matching criteria were then relaxed to: age ±5 years, randomization centers, WHI date ±3 years, CaD date ±3 years, OS flag, HRT flag, DM flag, CaD flag. With these criteria, 17 of the remaining 19 unmatched cases found matched controls. By matching on age ±5 years, randomization centers, WHI date ±3 years, CaD date ±3 years, OS flag, and HRT flag, the remaining 2 unmatched cases found their matches.
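The stepwise relaxation described for the WHI matching can be illustrated with a minimal Python sketch. It is not the study's implementation and not the algorithm from the cited Mayo Clinic technical report: it assumes a simple greedy pass in which each case takes the closest unused eligible control, and cases left unmatched are retried under the next, looser rule. The Person fields, the two example rules, and the age-only distance are hypothetical stand-ins for the real matching factors, and time-forward risk-set sampling is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Person:
    pid: str      # hypothetical participant identifier
    age: float
    center: int   # stand-in for randomization center
    arm: str      # stand-in for the exact-match assignment flags


Rule = Callable[[Person, Person], bool]


def distance(case: Person, control: Person) -> float:
    """Illustrative distance: absolute age difference only."""
    return abs(case.age - control.age)


def match_tiered(
    cases: Sequence[Person],
    controls: Sequence[Person],
    tiers: Sequence[Rule],
) -> Tuple[Dict[str, Tuple[str, int]], List[str]]:
    """Greedy 1:1 matching; tiers are eligibility rules, strictest first.

    Returns {case_id: (control_id, tier_index)} and the ids of cases
    that remain unmatched after the last tier.
    """
    matches: Dict[str, Tuple[str, int]] = {}
    used = set()                      # controls already assigned
    unmatched = list(cases)
    for tier, eligible in enumerate(tiers):
        remaining = []
        for case in unmatched:
            pool = [c for c in controls
                    if c.pid not in used and eligible(case, c)]
            if pool:
                # Closest control wins; ties are broken by identifier.
                best = min(pool, key=lambda c: (distance(case, c), c.pid))
                used.add(best.pid)
                matches[case.pid] = (best.pid, tier)
            else:
                remaining.append(case)
        unmatched = remaining         # retry these under the next tier
    return matches, [c.pid for c in unmatched]


def strict(case: Person, control: Person) -> bool:
    """Hypothetical first tier: age within 5 years, same center and arm."""
    return (abs(case.age - control.age) <= 5
            and case.center == control.center
            and case.arm == control.arm)


def relaxed(case: Person, control: Person) -> bool:
    """Hypothetical second tier: the arm requirement is dropped."""
    return (abs(case.age - control.age) <= 5
            and case.center == control.center)


cases = [Person("c1", 60, 1, "A"), Person("c2", 70, 2, "B")]
controls = [Person("k1", 62, 1, "A"), Person("k2", 58.5, 1, "A"),
            Person("k3", 71, 2, "A")]
matches, unmatched = match_tiered(cases, controls, [strict, relaxed])
```

In this toy data, case c1 is matched under the strict rule and case c2 only after the rule is relaxed, which mirrors how most WHI cases matched on the full criteria and the remainder matched on reduced criteria. A greedy pass depends on the order in which cases are processed; an optimal assignment method would be an alternative design.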