This dataset contains raw FASTQ files from single-cell RNA sequencing of three patient-derived pancreatic ductal adenocarcinoma (PDAC) organoid lines (P28, P40, P47) cultured under standard conditions. For each line, single cells were FACS-sorted into 384-well capture plates, with each well containing a 50 nl droplet of barcoded primers. Plates were processed following an adapted SORT-seq protocol, and cDNA libraries were generated using CEL-Seq2 with TruSeq small RNA primers (Illumina). Sequencing was performed on an Illumina NextSeq 500 platform using paired-end reads (read 1: 26 cycles, index read: 6 cycles, read 2: 60 cycles). Data are controlled-access and intended for single-cell transcriptomic analysis of PDAC organoid heterogeneity and WNT pathway activity.
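As a quick sanity check after download, the stated cycle counts (26 for read 1, 60 for read 2) can be compared with the read lengths actually present in the FASTQ files. The following is a minimal sketch, not part of the dataset documentation: the function names are hypothetical, and it assumes standard four-line FASTQ records, optionally gzip-compressed. It does not assume any particular barcode or UMI layout within read 1, since the description does not state one.

```python
import gzip
from collections import Counter
from itertools import islice

# Cycle counts taken from the dataset description; everything else is illustrative.
EXPECTED_CYCLES = {"read1": 26, "read2": 60}


def read_length_counts(lines, max_reads=10000):
    """Count sequence lengths in FASTQ text, assuming 4 lines per record."""
    counts = Counter()
    # The sequence is the second line of every four-line record.
    for seq in islice(lines, 1, max_reads * 4, 4):
        counts[len(seq.rstrip("\r\n"))] += 1
    return counts


def check_fastq(path, expected_len, max_reads=10000):
    """Return True if the file has reads and none is longer than expected_len."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        counts = read_length_counts(handle, max_reads)
    return bool(counts) and max(counts) <= expected_len
```

For example, `check_fastq("sample_R1.fastq.gz", EXPECTED_CYCLES["read1"])` would flag a read 1 file containing reads longer than 26 bases; the file name here is a placeholder.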
Projects

Jointly managed by the European Bioinformatics Institute (EMBL-EBI) in Cambridge (UK) and the Centre for Genomic Regulation (CRG) in Barcelona, the EGA provides an invaluable service to the worldwide biomedical research community. The teams leading the EGA are involved in several international partnerships and consortia in numerous scientific fields, where they contribute to ambitious projects. In addition to the projects listed below, the EGA is in a long-standing partnership with the Global Alliance for Genomics and Health (GA4GH), as described on the dedicated page.

On-going projects

EASIGEN-DS | The EASIGEN-DS project aims to conduct a design study to establish a new European Research Infrastructure on Advanced Genomics Technologies, EASIGEN. To develop an excellent scientific, technological and operational design, we will conduct landscape studies, stakeholder consultations, and community surveying.
Duration: 2025-2028
Domain: Genomic and health data
Funder: Horizon Europe
Tags: DATA MANAGEMENT, DOCUMENTATION, INFRASTRUCTURE

Go-IMPaCT | Go-IMPaCT will contribute sequenced genomes and provide infrastructure as part of IMPaCT-Cohort, one of the three fundamental pillars of the Precision Medicine Infrastructure associated with Science and Technology (IMPaCT) program in Spain. Along with the Genome of Europe (GoE) project, around 18,000 people will have their genomes sequenced, also contributing to Spain's commitments in 1+MG. Go-IMPaCT will fund the development of an EGA node to manage and share this genomic and phenoclinic data, laying the foundations for regional and ethnic genomic variability in Spain to be available for research purposes. The IMPaCT cohort is created with the spirit of being an open research tool, compatible with the rest of the health research ecosystem and other international initiatives.
Duration: 2025-2027
Domain: Large-scale genomics and health data; personalised medicine
Funder: Instituto de Salud Carlos III
Tags: ACCESS, DISCOVERY, INFRASTRUCTURE, METADATA STANDARDS

FAIR-FEGA | This project seeks to accelerate data depositions into FEGA, significantly increasing the data flow in and from FEGA nodes. It will build capacity within the FEGA nodes and increase awareness in a wide range of stakeholders, thus altogether achieving the ultimate goal of enhancing data reuse. The project will be carried out by a strategic consortium comprising seven ELIXIR nodes and two ELIXIR communities.
Duration: 2025-2026
Domain: Not applicable
Funder: ELIXIR
Tags: ACCESS, DISCOVERY, DOCUMENTATION, INFRASTRUCTURE, METADATA STANDARDS

FEGA-Connect | A consortium of six ELIXIR nodes plus the Polish FEGA node (in-kind contribution) joining forces to build a solid base to develop solutions for effective multi-omic sensitive data integration between FEGA nodes and other infrastructures and specialised data repositories. We aim to promote a more coherent data deposition, discoverability and retrieval of multi-omics datasets, providing FAIRer data and consequently accelerating research.
Duration: 2025-2026
Domain: Multi-omics data
Funder: ELIXIR
Tags: ACCESS, DATA MANAGEMENT, DISCOVERY, INFRASTRUCTURE, METADATA STANDARDS

IMPaCT-Data 2 | IMPaCT-Data 2 will develop a digital platform for the integration and modelling of biomedical data associated with IMPaCT (Precision Medicine Infrastructure associated with Science and Technology) projects in Spain. It will deploy a sustainable infrastructure that facilitates the integration, standardisation, interoperability and analysis of clinical, genomic, molecular and medical imaging data. This platform will be aligned with European projects such as Genome of Europe (GoE), the first project to make use of the European Genomic Data Infrastructure (GDI), and EUCAIM. IMPaCT-Data 2 will benefit from advanced Artificial Intelligence and High Computing Capacity Systems capabilities, offering robust and accessible tools for researchers from the National Health System in Spain.
Duration: 2025-2026
Domain: Large-scale genomics and health data; personalised medicine
Funder: Instituto de Salud Carlos III
Tags: ACCESS, DISCOVERY, INFRASTRUCTURE, METADATA STANDARDS

ERDERA | The European Rare Disease Research Alliance (ERDERA) takes over from EJPRD to deliver concrete health benefits to rare disease patients in the next decade by advancing prevention, diagnosis and treatment research. To leave no one behind, over 170 organisations championed by the European Union and member states are working hand in hand to make Europe a world leader in rare diseases research and innovation.
Duration: 2024-2034
Domain: Rare diseases
Funder: Horizon Europe; "La Caixa" Foundation cofunds CRG's contribution
Tags: ACCESS, DATA ANALYSIS, DISCOVERY, INFRASTRUCTURE

GoE | The Genome of Europe initiative aims to build a European network of national genomic reference cohorts of at least 500,000 citizens. These reference cohorts will be selected to be representative of the European population.
Duration: 2024-2028
Domain: Large-scale genomic and health data
Funder: Horizon Europe
Tags: ACCESS, DISCOVERY, INFRASTRUCTURE, METADATA STANDARDS

HEREDITARY | HEREDITARY aims to transform the way we approach disease detection, prepare treatment response, and explore medical knowledge by building a robust, interoperable, trustworthy, and secure framework that integrates multimodal health data (including genetic data) while ensuring compliance with cross-national privacy-preserving policies.
Duration: 2024-2027
Domain: Neurodegenerative disorders, gut-brain interplay
Funder: Horizon Europe
Tags: DATA MANAGEMENT, DATA ANALYSIS

EOSC-ENTRUST | The mission of EOSC-ENTRUST is to create a European network of trusted research environments for sensitive data and to drive European interoperability by joint development of a common blueprint for federated data access and analysis.
Duration: 2024-2026
Domain: Trusted Research Environment
Funder: Horizon Europe
Tags: INFRASTRUCTURE

EBV-MS | "Targeting Epstein-Barr Virus Infection for Treatment and Prevention of Multiple Sclerosis". The ambitious goals of the project are to answer the question of why only a few EBV-infected persons develop MS, to define the underlying mechanism of this process, and to clarify whether targeting the EBV infection can prevent MS or improve the disease course.
Duration: 2023-2028
Domain: Viral-host genetics; immune response; disease modelling; disease prevention
Funder: Horizon Europe
Tags: DATA MANAGEMENT, DATA ANALYSIS

WISDOM | WISDOM (Well-being Improvement through the Integration of Healthcare and Research Data and Models without Border for Chronic Immune-mediated Diseases) aims to deploy novel approaches for data processing, harmonisation, management, and secure data sharing and federated access for diseases like multiple sclerosis. Using an end-user guided approach, it will facilitate responsible and critical assessment of the use of AI in healthcare.
Duration: 2023-2028
Domain: Chronic immune-mediated diseases
Funder: Horizon Europe
Tags: DATA MANAGEMENT, INFRASTRUCTURE

EUCAIM | EUropean Federation for CAncer IMages is a project that will build a highly secure, federated and large-scale European cancer imaging platform, with capabilities that will greatly enhance the potential of Artificial Intelligence in oncology.
Duration: 2023-2027
Domain: Cancer
Funder: Digital Europe Programme (DIGITAL)
Tags: DISCOVERY

CONTAGIO | CONTAGIO (COhorts Network To be Activated Globally In Outbreaks) aims to create coordination mechanisms to rapidly react to infectious disease (re-)emergence in low- and middle-income countries (LMICs).
Duration: 2023-2026
Domain: Infectious Diseases
Funder: European Commission - Horizon Europe
Tags: ACCESS, DATA MANAGEMENT, DISCOVERY

Youth-GEMs | Youth-GEMs (Gene Environment Interactions in Mental Health TrajectorieS of Youth) will conduct research into the genetic and environmental factors of mental health in young European people.
Duration: 2022-2027
Domain: Mental health
Funder: European Commission - Horizon Europe
Tags: DATA MANAGEMENT, DISCOVERY

GDI | The European Genomics Data Infrastructure project is enabling access to genomic and related phenotypic and clinical data across Europe. It is doing this by establishing a federated, sustainable and secure infrastructure to access the data.
Duration: 2022-2026
Domain: Genomic and health data
Funder: European Commission - Horizon Europe; "La Caixa" Foundation cofunds CRG's contribution
Tags: DISCOVERY, DOCUMENTATION, INFRASTRUCTURE

EOSC4Cancer | EOSC4Cancer builds on existing projects, research outcomes and established community solutions to create the federated FAIR data, analysis and services infrastructure needed for European Cancer research programmes.
Duration: 2022-2025
Domain: Cancer
Funder: European Commission - Horizon Europe
Tags: DISCOVERY

IMPaCT-T2D | The IMPaCT-T2D project aims at studying the complete genomes of a large cohort of patients with Type 2 Diabetes mellitus (T2D), using modern sequencing technologies and artificial intelligence (AI) in order to improve the stratification and pharmacological treatment in the context of precision medicine.
Duration: 2022-2025
Domain: Cardiovascular and Complex Diseases
Funder: Spanish Ministry of Science and Innovation; Instituto de Salud Carlos III
Tags: ACCESS, DATA MANAGEMENT, DISCOVERY, INFRASTRUCTURE

EuCanImage | A European Cancer Image Platform Linked to Biological and Health Data for Next-Generation Artificial Intelligence and Precision Medicine in Oncology.
Duration: 2020-2025
Domain: AI Solutions in Oncology
Funder: European Commission - H2020 Programme; "La Caixa" Foundation cofunds CRG's contribution
Tags: DATA MANAGEMENT, METADATA STANDARDS

GenoMed4ALL | A consortium built to empower personalised medicine in the field of haematological diseases through the use of AI and the pooling of genomic and clinical data.
Duration: 2020-2025
Domain: Hematological diseases
Funder: European Commission - H2020 Programme
Tags: DISCOVERY, METADATA STANDARDS

Completed projects

BY-COVID | The BeYond-COVID project aims to make COVID-19 data accessible to scientists in laboratories but also to anyone who can use it, such as medical staff in hospitals or government officials. Going beyond SARS-CoV-2 data, the project will provide a framework for making data from other infectious diseases open and accessible to everyone.
Duration: 2021-2024
Domain: Infectious diseases
Funder: European Commission - H2020 Programme
Tags: ACCESS, DATA MANAGEMENT, DISCOVERY, INFRASTRUCTURE

IMPaCT-Data | IMPaCT-Data aims to create the infrastructure for secondary use of data from Spanish healthcare systems (electronic health records, medical imaging and genomic repositories) and to contribute the knowledge and methodology produced to the healthcare system.
Duration: 2021-2024
Domain: Large-scale genomics and health data
Funder: Spanish Ministry of Science and Innovation; Instituto de Salud Carlos III
Tags: ACCESS, DATA MANAGEMENT, DISCOVERY, INFRASTRUCTURE

LaMarató | A project aimed at creating and developing a Catalan inter-hospital network to interrogate genetic variants from thousands of genetic tests carried out in patients with rare diseases from the main Catalan hospitals.
Duration: 2021-2024
Domain: Genomic and health data
Funder: Fundació La Marató de TV3 (Catalan foundation)
Tags: DISCOVERY

HealthyCloud | This consortium will contribute a Strategic Agenda towards the European Health Research and Innovation Cloud. The project will work in collaboration with a broad range of stakeholders to ensure that all voices are included and that the results are technically and ethically sound.
Duration: 2021-2023
Domain: Not applicable
Funder: European Commission - H2020 Programme
Tags: DOCUMENTATION

B1MG | Beyond 1 Million Genomes aims to create a network of genetic and clinical data across Europe. The project provides coordination and support to the 1+ Million Genomes Initiative (1+MG). This initiative is a commitment of 24 EU countries, the UK and Norway to give cross-border access to one million sequenced genomes by 2022.
Duration: 2020-2023
Domain: Not applicable
Funder: European Commission - Horizon Europe
Tags: DATA MANAGEMENT, INFRASTRUCTURE, METADATA STANDARDS

ELIXIR-CONVERGE | An alliance with the goal of connecting and aligning ELIXIR Nodes to deliver sustainable FAIR life-science data management services.
Duration: 2020-2023
Domain: Data Management and Infectious Diseases
Funder: European Commission - H2020 Programme
Tags: DATA MANAGEMENT, INFRASTRUCTURE, METADATA STANDARDS

IHCC | The International HundredK+ Cohorts Consortium aims to create a global platform for translational research, informing the biological and genetic basis for disease and improving clinical care and population health.
Duration: 2020-2022
Domain: Translational research
Funder: NIH; The Wellcome Trust; CZI
Tags: INFRASTRUCTURE, METADATA STANDARDS

PPCG | The Pan Prostate Cancer Group aims to harmonise and interrogate Whole Genome DNA Sequence data generated around the world from over 2000 men with prostate cancer, with associated transcriptome and methylome data, to include men from different clinical categories and ethnicities. This project is about providing breakthrough advances through analysis of a very large series of Whole Genome DNA data from prostate cancer contributed by many of the leading scientists and clinicians working in prostate cancer genomics.
Duration: 2019-2024
Domain: Cancer
Funder: Cancer Research UK
Tags: DATA MANAGEMENT

CINECA | Consortium providing a federated solution that makes population-scale genomic and biomolecular data accessible across international borders, accelerating research and improving the health of individuals resident across continents.
Duration: 2019-2023
Domain: Large-scale Genomics and Health Data
Funder: European Commission - H2020 Programme
Tags: ACCESS, DATA MANAGEMENT, DISCOVERY, INFRASTRUCTURE

EASI-Genomics | A project designed to provide easy access to cutting-edge DNA sequencing technologies to researchers from academia and industry, within a framework that ensures compliance with ethical and legal requirements, as well as FAIR and secure data management.
Duration: 2019-2023
Domain: Next Generation Sequencing
Funder: European Commission - H2020 Programme
Tags: ACCESS

EJP-RD | A European consortium built to create a comprehensive, sustainable ecosystem allowing a virtuous circle between research, care, and medical innovation.
Duration: 2019-2023
Domain: Rare diseases
Funder: European Commission - H2020 Programme
Tags: ACCESS, DATA MANAGEMENT, DOCUMENTATION, METADATA STANDARDS

EOSC-Life | EOSC-Life brings together the 13 Life Science research infrastructures (LS RIs) to create an open, digital and collaborative space for biological and medical research. The project will publish 'FAIR' data and a catalogue of services provided by participating RIs for the management, storage and reuse of data in the European Open Science Cloud (EOSC).
Duration: 2019-2023
Domain: Not applicable
Funder: European Commission - H2020 Programme
Tags: DOCUMENTATION

EUCANCan | A federated network aiming to implement an integrated cultural, technological and legal framework across Europe and Canada, to enable and facilitate the efficient sharing of cancer genomic data.
Duration: 2019-2023
Domain: Cancer
Funder: European Commission - H2020 Programme
Tags: DATA MANAGEMENT, METADATA STANDARDS

The Federated EGA framework: supporting sensitive data management across the ELIXIR Nodes | This project is a direct continuation of the FHD IS with the goal of positioning the FEGA framework as the core infrastructure driver to support human data sharing for research.
Duration: 2019-2023
Domain: Human genomic data
Funder: ELIXIR
Tags: INFRASTRUCTURE

UK Biobank | UK Biobank is a large-scale biomedical database and research resource, containing in-depth genetic and health information from half a million UK participants. This project is to archive whole genome sequencing and other genetic data for UK Biobank participants.
Duration: 2019-2023
Domain: Large-scale Genomics and Health Data
Funder: The Wellcome Trust; UKRI; Amgen; AstraZeneca; GSK; Johnson & Johnson
Tags: DATA MANAGEMENT, INFRASTRUCTURE

VEIS | The core mission of VEIS is to create an open ecosystem of technologies that will address and adapt to the requirements of the systems used to analyse and interpret -omics and clinical data in research and application environments in biomedicine. The aim of the project is to leverage the value of the EGA for both industry and society.
Duration: 2019-2022
Domain: Oncology and Rare diseases
Funder: Generalitat de Catalunya and European Regional Development Fund (ERDF)
Tags: ACCESS, DISCOVERY

ELIXIR BEACON IS | This study follows on from a number of earlier activities that have established the ELIXIR Beacon Project. The main aim is to extend the Beacon protocol, developed at EGA, to become the reference ELIXIR Data Discovery product.
Duration: 2019-2021
Domain: Not applicable
Funder: ELIXIR
Tags: DISCOVERY

ELIXIR FHD IS | This project coordinates the delivery of FAIR-compliant metadata standards, interfaces, and reference implementation to support the federated ELIXIR network of human data resources.
Duration: 2019-2021
Domain: Human genomic data
Funder: ELIXIR
Tags: INFRASTRUCTURE

ELIXIR Rare Disease | The Rare Disease Community extends and generalises the system of access authorisation and high volume secure data transfer developed within the EGA. The goal of the Community is to create a federated infrastructure that will enable researchers to discover, access and analyse different rare disease repositories across Europe. It is doing this in partnership with other European infrastructure projects, namely RD-CONNECT, BBMRI-ERIC and E-Rare.
Duration: 2019-2021
Domain: Rare diseases
Funder: ELIXIR
Tags: INFRASTRUCTURE

Solve-RD | Solve-RD - solving the unsolved rare diseases - is a research project funded by the European Commission. It echoes the ambitious goals set out by the International Rare Diseases Research Consortium (IRDiRC) to deliver diagnostic tests for most rare diseases by 2020. The current diagnostic and subsequent therapeutic management of rare diseases is still highly unsatisfactory for a large proportion of rare disease patients - the unsolved RD cases. For these unsolved rare diseases, we are unable to explain the etiology responsible for the disease phenotype, predict the individual disease risk and/or rate of disease progression, and/or quantitate the risk of relatives to develop the same disorder.
Duration: 2018-2024
Domain: Rare diseases
Funder: European Commission - H2020 Programme
Tags: ACCESS, DATA MANAGEMENT, METADATA STANDARDS

EuCanShare | An EU-Canada joint infrastructure for next-generation multi-Study Heart research.
Duration: 2018-2022
Domain: Cardiovascular Diseases
Funder: European Commission - H2020 Programme
Tags: ACCESS, METADATA STANDARDS
This study is a multi-omics study of an Asian longitudinal metastatic breast cancer (MBC) cohort treated with palbociclib plus endocrine therapy. It contains NGS data from baseline (BL) and progressive disease (PD) biopsies from 70 patients, consisting of 79 tumor/normal matched whole exome sequencing (WES) samples from 62 patients and 90 tumor whole transcriptome sequencing (WTS) samples from 70 patients. There were 56 BL biopsies profiled by WES and 64 by WTS; 23 PD biopsies were profiled by WES and 26 by WTS. Twenty and 23 patients had paired BL and PD biopsies profiled by WES and WTS, respectively.
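The sample-level counts can be cross-checked with a short tally. This is only an illustrative sketch using the numbers stated in the description; the variable names are hypothetical, and the patient-level totals (62 and 70) are restated from the text rather than derived.

```python
# Biopsy counts per assay and time point, as stated in the study description.
biopsies = {
    "WES": {"BL": 56, "PD": 23},
    "WTS": {"BL": 64, "PD": 26},
}

# Sample totals per assay: baseline plus progressive-disease biopsies.
sample_totals = {assay: sum(by_time.values()) for assay, by_time in biopsies.items()}

for assay, total in sample_totals.items():
    print(f"{assay}: {total} samples")
```

The tally gives 79 WES and 90 WTS samples, matching the totals quoted in the description.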
The ELLIPSE Consortium is an international effort to discover risk loci for prostate cancer. It includes the meta-analysis of existing GWAS data as well as novel GWAS, exome, and iCOGS genotyping.

The GWAS meta-analysis includes the following cases and controls from studies of European ancestry:
- UK GWAS stage 1 (Illumina Infinium HumanHap 550 Array): 1854 cases and 1894 controls
- UK GWAS stage 2 (Illumina iSELECT): 3706 cases and 3884 controls
- CAPS1 (Affymetrix GeneChip 500K): 474 cases and 482 controls
- CAPS2 (Affymetrix GeneChip 5.0K): 1458 cases and 512 controls
- BPC3 (Illumina Human610): 2068 cases and 3011 controls
- PEGASUS (HumanOmni2.5): 4600 cases and 2941 controls

The OMNI 2.5M genotyping was conducted for 977 prostate cancer cases from UKGPCS. The Exome SNP array genotyping was conducted for 4741 subjects from UKGPCS. The iCOGS genotyping was conducted for 10366 subjects, which includes the Multiethnic Cohort (n=1648) and UKGPCS (n=8718).

Below is a description of each study that contributed to the meta-analysis of men of European ancestry. Information about the studies that contributed to the multiethnic meta-analysis can be found on the associated study page and also in Conti et al (Nature Genetics, PMID:33398198).

UK GWAS Stage 1 (UK1) and Stage 2 (UK2): The UK Genetic Prostate Cancer Study (UKGPCS) was first established in 1993 and is the largest prostate cancer study of its kind in the UK, involving nearly 189 hospitals. We are based at The Institute of Cancer Research in Sutton, Surrey, and collaborate with the Royal Marsden NHS Foundation Trust. Our aim is to find genetic changes which are associated with prostate cancer risk. Our target is to recruit 26,000 gentlemen into the UKGPCS by 2017. Men are eligible to take part if they fit into at least one of the following groups:
- They have been diagnosed with prostate cancer at 60 years of age or under (up to their 61st birthday).
- They have been diagnosed with prostate cancer and have a first-, second- or third-degree relative with prostate cancer, where at least one of these men was diagnosed at 65 years of age or under.
- They are affected and have 3 or more cases of prostate cancer on one side of their family.
- They are a prostate cancer patient at the Royal Marsden NHS Foundation Trust.

We have to date recruited around 16,000 men on whom we have germline DNA and clinical data at diagnosis. The UK GWAS is based on genotyping of 541,129 SNPs in 1,854 individuals with clinically detected (non-PSA-screened) prostate cancer (cases) and 1,894 controls. In stage 2, 43,671 SNPs showing strong evidence of association in stage 1 were followed up by genotyping a further 3,268 cases and 3,366 controls from the UK and Melbourne.

CAPS1 and CAPS2: The CAPS (Cancer of the Prostate in Sweden) study is a large Swedish population-based cancer study, comprising 3,161 cases and 2,149 controls, recruited between 2001 and 2003. Biopsy-confirmed prostate cancer cases, diagnosed between July 2001 and October 2003, were identified and recruited from four out of six regional cancer registries in Sweden. Clinical data including TNM stage, Gleason grade and PSA levels at the time of diagnosis were retrieved through record linkage to the National Prostate Cancer Registry. Control subjects, who were recruited concurrently with case subjects, were randomly selected from the Swedish Population Registry and matched according to the expected age distribution of cases (groups of 5-year intervals) and geographic region. Whole blood was collected from all individuals for extraction of genomic DNA. A GWAS was conducted in two parts. In the first phase (CAPS1), 498 cases and 502 controls were genotyped; in the second phase (CAPS2), 1,483 cases and 519 controls were genotyped. Genotyping was performed using the GeneChip Human Mapping 500K (CAPS1) and 5.0K (CAPS2) Array Set from Affymetrix (Santa Clara, CA).
The National Cancer Institute Breast and Prostate Cancer Cohort Consortium, BPC3: BPC3 was a consortium of prospective cohort studies investigating genetic and gene-environmental risk factors for breast and prostate cancer. Each study selected cases and controls for this study as described below. The clinical criteria defining advanced prostate cancer (Gleason ≥ 8 or stage C/D) were obtained from either medical records or cancer registries. The Gleason score source was either surgical specimens (radical prostatectomy or autopsy) or the diagnostic biopsy (needle biopsy or TURP). When multiple Gleason scores were available, the surgical value was used. PLCO was removed from the analysis as the samples were included in the PEGASUS GWAS described below. In total, 2,473 advanced prostate cancer cases and 3,534 controls were included in the analysis following QC.

ATBC, Alpha-Tocopherol, Beta-Carotene Cancer Prevention Study: ATBC was a randomized, placebo-controlled primary prevention trial to investigate whether α-tocopherol or β-carotene supplementation reduced the incidence of lung or other cancers in male smokers. Between 1985 and 1988, 29,133 men ages 50 to 69 years were enrolled in the trial from Finland and randomized to supplementation (50 mg α-tocopherol, 20 mg β-carotene, or both) or placebo. Men with a prior history of cancer, other than non-melanoma skin cancer or carcinoma in situ, were excluded from participating. Incident cancer cases are identified through linkage with the Finnish Cancer Registry, which has ~100% ascertainment of cancer cases nationwide. Cases included 249 men diagnosed with advanced prostate cancer (Gleason ≥ 8 or stage C/D) from 1985 to 2003 with DNA available. Controls were 1,271 men without a diagnosis of prostate cancer who had been selected previously for a GWAS of lung cancer in ATBC.
CPSII, Cancer Prevention Study II: CPSII is a cohort study started in 1982 to investigate the relationship between dietary, lifestyle and other etiologic factors and cancer mortality. Approximately 1.2 million men and women enrolled in the study from 50 states in the U.S. In 1992, a subset of these participants (n = ~184,000) were enrolled in the CPSII Nutrition Cohort to examine the relationship between dietary and other exposures and cancer incidence. Blood samples were drawn from approximately 39,376 members of the Nutrition Cohort from 1998 to 2001, and buccal cells were collected from 69,467 members from 2001 to 2002. Cancer cases are identified by self-report through follow-up questionnaires followed by verification through medical records and/or linkage to state cancer registries as well as death certificates. A total of 660 advanced prostate cancer cases (Gleason ≥ 8 or stage III/IV) with a source of DNA were identified for this study. Controls were 660 men matched on ethnicity, date of birth, sample collection date and DNA type.

EPIC, European Prospective Investigation into Cancer and Nutrition: EPIC is a prospective study designed to investigate both genetic and non-genetic risk factors for different forms of cancer. Study participants were almost all white Europeans. Approximately 500,000 individuals (150,000 men) in EPIC were recruited between 1992 and 2000, from 23 centers in 10 European countries. Overall, approximately 400,000 subjects also provided a blood sample at recruitment. The methods of recruitment and details of the study design are described in detail elsewhere. In brief, study participants completed an extensive questionnaire on both dietary and nondietary data at recruitment. The present study includes advanced prostate cancer cases (Gleason ≥ 8 or stage III/IV) matched to controls based on study center, length of follow-up, age at enrollment (± 6 months), fasting status and time of day of blood collection (± 1 hour). The advanced prostate cancer subjects were from 8 of the 10 participating countries: Denmark, Germany, Greece, Italy, the Netherlands, Spain, Sweden and the United Kingdom (UK). France and Norway were not included in the current study because these cohorts only included female subjects. All participants gave written consent for the research, and approval for the study was obtained from the ethical review boards of all local institutions in the regions where participants had been recruited for the EPIC study.

HPFS, Health Professionals Follow-up Study: HPFS began in 1986 and is an ongoing prospective cohort study of 51,529 United States male dentists, optometrists, osteopaths, podiatrists, pharmacists, and veterinarians 40 to 75 years of age. The baseline questionnaire provided information on age, marital status, height and weight, ancestry, medications, smoking history, disease history, physical activity, and diet. At baseline the cohort was 97% white, 2% Asian American, and 1% African American. The median follow-up through 2005 was 10.5 years (range 2-19 years). Self-reported prostate cancer diagnoses were confirmed by obtaining medical and/or pathology records. Prostate cancer deaths are reported by family members in response to follow-up questionnaires, or discovered through the postal system or the National Death Index. Questionnaires are sent every two years to surviving men to update exposure and medical history. In 1993 and 1994, a blood specimen was collected from 18,018 men without a prior diagnosis of cancer. Prostate cancer cases are matched to controls on birth year (± 1 year) and ethnicity. Controls are selected from those who are cancer-free at the time of the case’s diagnosis and who had a prostate-specific antigen test after the date of blood draw.
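The HPFS control-selection rules described above (birth year within one year, same ethnicity, cancer-free at the time of the case's diagnosis, PSA test after blood draw) can be illustrated with a small filter. This is a hedged sketch only: the `Participant` record, its field names and the `eligible_controls` function are hypothetical and are not taken from the HPFS protocol, which does not specify how candidate controls were sampled from the eligible pool.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Participant:
    # Hypothetical record layout, for illustration only.
    pid: str
    birth_year: int
    ethnicity: str
    blood_draw: date
    psa_test: Optional[date] = None   # date of a PSA test, if any
    cancer_dx: Optional[date] = None  # date of first cancer diagnosis, if any


def eligible_controls(case: Participant, cohort: List[Participant]) -> List[Participant]:
    """Return cohort members who satisfy the matching rules for this case."""
    if case.cancer_dx is None:
        raise ValueError("case must have a diagnosis date")
    eligible = []
    for p in cohort:
        if p.pid == case.pid:
            continue
        # Matching factors: birth year within one year and same ethnicity.
        same_stratum = (abs(p.birth_year - case.birth_year) <= 1
                        and p.ethnicity == case.ethnicity)
        # Cancer-free at the time of the case's diagnosis.
        cancer_free = p.cancer_dx is None or p.cancer_dx > case.cancer_dx
        # Had a PSA test after their own blood draw.
        psa_after_draw = p.psa_test is not None and p.psa_test > p.blood_draw
        if same_stratum and cancer_free and psa_after_draw:
            eligible.append(p)
    return eligible
```

Note that a man diagnosed with cancer after the case's diagnosis date still counts as cancer-free at that time, which is how the sketch reads the rule; a stricter reading would exclude him.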
MEC, Multiethnic Cohort: The Multiethnic Cohort Study is a population-based prospective cohort study that was initiated between 1993 and 1996 and includes subjects from various ethnic groups - African Americans and Latinos primarily from Californian (great Los Angeles area) and Native Hawaiians, Japanese-Americans, and European Americans primarily from Hawaii. State drivers’ license files were the primary sources used to identify study subjects in Hawaii and California. Additionally, in Hawaii, state voter’s registration files were used, and, in California, Health Care Financing Administration (HCFA) files were used to identify additional African American men. All participants (n=215,251) returned a 26-page self-administered baseline questionnaire that obtained general demographic, medical and risk factor information. In the cohort, incident cancer cases are identified annually through cohort linkage to population-based cancer Surveillance, Epidemiology, and End Results (SEER) registries in Hawaii and Los Angeles County as well as to the California State cancer registry. Information on stage and grade of disease are also obtained through the SEER registries. Blood sample collection in the MEC began in 1994 and targeted incident prostate cancer cases and a random sample of study participants to serve as controls for genetic analyses. PHS, Physicians Health Study:PHS was a randomized trial of aspirin and ß carotene for cardiovascular disease and cancer among 22,071 U.S. male physicians ages 40-84 years at randomization; none had a cancer diagnosis at baseline. The original trial ended, but the men are followed. From 1982 to 1984, blood samples were collected from 14,916 physicians before randomization. Participants are sent yearly questionnaires to ascertain endpoints. Whenever a physician reports cancer, we request permission to obtain the medical records, and cancers are confirmed by pathology report. 
We obtain death certificates and pertinent medical records for all deaths. Follow-up for nonfatal outcomes in PHS is over 97% complete, and for mortality, over 99%. PLCO, Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial:PLCO is a multicenter, randomized trial to evaluate screening methods for the early detection of prostate, lung, colorectal and ovarian cancer. Between 1993 and 2001, over 150,000 men and women ages 55-74 years were recruited from ten centers in the United States (Birmingham, AL; Denver, CO; Detroit, MI; Honolulu, HI; Marshfield, WI; Minneapolis, MN; Pittsburgh, PA; Salt Lake City, UT; St. Louis, MO; and Washington, D.C.). Men randomized to the screening arm underwent prostate cancer screening with prostate-specific antigen (PSA) annually for six years and digital rectal exam annually for four years. Blood specimens were collected from participants randomized to the screening arm of the trial, and buccal cell specimens were obtained from participants randomized to the control arm. Cases included 754 men diagnosed with advanced prostate cancer (Gleason = 8 or stage III/IV) from either arm of the trial. Of these cases, 317 were genotyped previously as part of Cancer Genetic Markers of Susceptibility (CGEMS), a GWAS for prostate cancer. Controls included 1,491 men without a diagnosis of prostate cancer from the screening arm of the PLCO trial. All subjects provided informed consent to participate in genetic etiology studies of cancer and other traits. This study was approved by the institutional review boards at the ten centers and the National Cancer Institute. PLCO was removed from the meta-analysis of the BPC3 studies as a consequence of PEGASUS below. PEGASUS, Prostate cancer Genome-wide Association Study of Uncommon Susceptibility loci: Pegasus is a genome-wide association nested within the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial. 
PLCO is a multicenter, randomized trial to evaluate screening methods for the early detection of prostate, lung, colorectal and ovarian cancer. Between 1993 and 2001, over 150,000 men and women ages 55-74 years were recruited from ten centers in the United States (Birmingham, AL; Denver, CO; Detroit, MI; Honolulu, HI; Marshfield, WI; Minneapolis, MN; Pittsburgh, PA; Salt Lake City, UT; St. Louis, MO; and Washington, D.C.). Men randomized to the screening arm underwent prostate cancer screening with prostate-specific antigen annually for six years and digital rectal exam annually for four years. Blood specimens were collected from participants randomized to the screening arm of the trial, and buccal cell specimens were obtained from participants randomized to the control arm. Cases included 4,598 men of European ancestry diagnosed with prostate cancer from either arm of the trial, and controls included 2,941 men of European ancestry without a diagnosis of cancer from the screening arm, matched on age and year of randomization. All subjects provided informed consent, and the study was approved by the institutional review board at the National Cancer Institute. Funding: This work was supported by the GAME-ON U19 initiative for prostate cancer (ELLIPSE): U19 CA148537. The BPC3 was supported by the U.S. National Institutes of Health, National Cancer Institute (cooperative agreements U01-CA98233, U01-CA98710, U01-CA98216, and U01-CA98758, and Intramural Research Program of NIH/National Cancer Institute, Division of Cancer Epidemiology and Genetics). The ATBC study and PEGASUS were supported in part by the Intramural Research Program of the NIH and the National Cancer Institute. Additionally, this research was supported by U.S. Public Health Service contracts N01-CN-45165, N01-RC-45035, N01-RC-37004 and HHSN261201000006C from the National Cancer Institute, Department of Health and Human Services. 
CAPS: The Department of Medical Epidemiology and Biostatistics, Karolinska Institute, Stockholm, Sweden was supported by the Cancer Risk Prediction Center (CRisP; www.crispcenter.org), a Linnaeus Centre (Contract ID 70867902) financed by the Swedish Research Council, the Swedish Research Council (grant: K2010-70X-20430-04-3), the Swedish Cancer Foundation (grant: 09-0677), the Hedlund Foundation, the Söderberg Foundation, the Enqvist Foundation, ALF funds from the Stockholm County Council, Stiftelsen Johanna Hagstrand och Sigfrid Linnér’s Minne, and Karlsson’s Fund for urological and surgical research. We thank and acknowledge all of the participants in the Stockholm-1 study. We thank Carin Cavalli-Björkman and Ami Rönnberg Karlsson for their dedicated work in the collection of data. Michael Broms is acknowledged for his skillful work with the databases. KI Biobank is acknowledged for handling the samples and for DNA extraction. Hans Wallinder at Aleris Medilab and Sven Gustafsson at Karolinska University Laboratory are thanked for their good cooperation in providing historical laboratory results. UKGPCS would like to acknowledge the NCRN nurses and consultants for their work in the UKGPCS study. We thank all the patients who took part in this study. This work was supported by Cancer Research UK (grants: C5047/A7357, C1287/A10118, C1287/A5260, C5047/A3354, C5047/A10692, C16913/A6135 and C16913/A6835). We would also like to thank the following for funding support: Prostate Research Campaign UK (now Prostate Cancer UK), The Institute of Cancer Research and The Everyman Campaign, The National Cancer Research Network UK, and The National Cancer Research Institute (NCRI) UK. We are grateful for the support of NIHR funding to the NIHR Biomedical Research Centre at The Institute of Cancer Research and The Royal Marsden NHS Foundation Trust. The MEC was supported by NIH grants CA63464, CA54281 and CA098758.
Human populations in Africa are the most genetically diverse in the world, having a rich history, and yet are severely understudied. The purpose of this study is to understand the population history of four ethnolinguistic groups in northern Kenya - the Turkana, Samburu, Rendille, and Borana. This study aims to measure the extent of variation among several genetic systems in these populations, with the goal to better understand the broader population history in northern Kenya and to add to the current knowledge of population genetics in Africa.
Broadly, the objectives of the study are to characterize genetic variation in any gene or genomic region in South African populations, primarily in the context of HIV. The purpose is to apply this knowledge to establish assays and gene signatures for testing in disease-association studies, and to study the underlying mechanisms of disease causation and severity.
This dataset contains a gene-cell matrix derived from single-cell RNA sequencing (scRNA-seq) data of ileal tissue from Crohn's disease (CD) patients and colorectal cancer (CRC) patients.

It includes:
  Crohn's Disease Patients: A trio of transmural lesions (stenotic, inflamed, and non-inflamed) from each patient.
  Colorectal Cancer Patients: Unaffected ileal tissue used as external non-inflamed control.
  Cell Level Metadata: The dataset includes relevant cell-level metadata such as cell type annotations used in the study.

Experimental Details:
  Platform: 10x Genomics Chromium Single Cell 3' GEX
  Sequencing: Illumina NovaSeq
  Processing: Data processed with Cell Ranger software. Resulting count matrices were merged for downstream analysis, including integration and dimensionality reduction.

Dataset Composition:
  Crohn's Disease Patients: 10 patients with 3 samples each (non-inflamed, inflamed, stenotic), totaling 30 samples.
  Colorectal Cancer Patients: 5 patients with 1 sample each of unaffected tissue, totaling 5 samples.

Data Provided:
  Merged Raw Count Matrix: The final merged raw count matrix used for downstream analysis.
  Cell Metadata File: Contains details of sample, tissue, and patient for each cell in the count matrix.
  Barcodes File: Indicates each cell barcode, which also encodes the sample, tissue, and patient details for each cell. The sample-type labels are:
    CD.S_Infl: Stenotic Crohn's disease patient, inflamed sample
    CD.S_Sten: Stenotic CD patient, stenosis sample
    CD.S_Prox: Stenotic CD patient, proximal non-inflamed sample
    CC.C_Prox: CRC patient, proximal unaffected sample
  Example: the barcode 'CC.C_1_Prox_AAGTCGTAGACCCTTA' indicates the unaffected proximal sample from CRC patient no. 1, and the nucleic acid sequence identifies a unique cell from this sample.
Total Samples: Crohn's Disease (CD) Patients: 30 samples; Colorectal Cancer (CRC) Patients: 5 samples

Row  Patient_no  Sample         Sample_type
1    CC.C_1      CC.C_1_Prox    CC.C_Prox
2    CD.S_1      CD.S_1_Prox    CD.S_Prox
3    CD.S_1      CD.S_1_Infl    CD.S_Infl
4    CD.S_1      CD.S_1_Sten    CD.S_Sten
5    CC.C_2      CC.C_2_Prox    CC.C_Prox
6    CD.S_2      CD.S_2_Prox    CD.S_Prox
7    CD.S_2      CD.S_2_Infl    CD.S_Infl
8    CD.S_2      CD.S_2_Sten    CD.S_Sten
9    CC.C_3      CC.C_3_Prox    CC.C_Prox
10   CC.C_4      CC.C_4_Prox    CC.C_Prox
11   CD.S_3      CD.S_3_Prox    CD.S_Prox
12   CD.S_3      CD.S_3_Infl    CD.S_Infl
13   CD.S_3      CD.S_3_Sten    CD.S_Sten
14   CD.S_4      CD.S_4_Prox    CD.S_Prox
15   CD.S_4      CD.S_4_Infl    CD.S_Infl
16   CD.S_4      CD.S_4_Sten    CD.S_Sten
17   CC.C_5      CC.C_5_Prox    CC.C_Prox
18   CD.S_5      CD.S_5_Prox    CD.S_Prox
19   CD.S_5      CD.S_5_Infl    CD.S_Infl
20   CD.S_5      CD.S_5_Sten    CD.S_Sten
21   CD.S_6      CD.S_6_Prox    CD.S_Prox
22   CD.S_6      CD.S_6_Infl    CD.S_Infl
23   CD.S_6      CD.S_6_Sten    CD.S_Sten
24   CD.S_7      CD.S_7_Prox    CD.S_Prox
25   CD.S_7      CD.S_7_Infl    CD.S_Infl
26   CD.S_7      CD.S_7_Sten    CD.S_Sten
27   CD.S_8      CD.S_8_Prox    CD.S_Prox
28   CD.S_8      CD.S_8_Infl    CD.S_Infl
29   CD.S_8      CD.S_8_Sten    CD.S_Sten
30   CD.S_9      CD.S_9_Prox    CD.S_Prox
31   CD.S_9      CD.S_9_Infl    CD.S_Infl
32   CD.S_9      CD.S_9_Sten    CD.S_Sten
33   CD.S_10     CD.S_10_Prox   CD.S_Prox
34   CD.S_10     CD.S_10_Infl   CD.S_Infl
35   CD.S_10     CD.S_10_Sten   CD.S_Sten
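
The barcode convention described for this dataset can be unpacked with a few lines of Python. The following is only a sketch, assuming every barcode has exactly four underscore-separated fields as in the single example given ('CC.C_1_Prox_AAGTCGTAGACCCTTA'); the names CellBarcode and parse_barcode are hypothetical and not part of the dataset or of any library.

```python
from typing import NamedTuple

# Tissue suffixes that appear in the sample table (Prox, Infl, Sten).
KNOWN_TISSUES = {"Prox", "Infl", "Sten"}


class CellBarcode(NamedTuple):
    """Fields recovered from one barcode string (hypothetical helper type)."""
    patient: str      # e.g. "CC.C_1"
    sample: str       # e.g. "CC.C_1_Prox"
    sample_type: str  # e.g. "CC.C_Prox"
    sequence: str     # nucleotide cell barcode, e.g. "AAGTCGTAGACCCTTA"


def parse_barcode(barcode: str) -> CellBarcode:
    """Split '<group>_<patient no.>_<tissue>_<sequence>' into its parts.

    Assumes exactly four underscore-separated fields; anything else
    raises ValueError (from the unpacking or from the tissue check).
    """
    group, number, tissue, sequence = barcode.split("_")
    if tissue not in KNOWN_TISSUES:
        raise ValueError(f"unexpected tissue label: {tissue!r}")
    return CellBarcode(
        patient=f"{group}_{number}",
        sample=f"{group}_{number}_{tissue}",
        sample_type=f"{group}_{tissue}",
        sequence=sequence,
    )


# The example barcode quoted in the dataset description.
example = parse_barcode("CC.C_1_Prox_AAGTCGTAGACCCTTA")
```

The derived fields correspond to the Patient_no, Sample, and Sample_type columns of the sample table, so parsed barcodes can be checked against the cell metadata file. If the real barcodes carry extra suffixes (for instance a trailing "-1" appended by some pipelines), the split would need adjusting.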
The purpose of this study is to provide a reference profile of small extracellular RNAs in body fluids. These samples were originally obtained in a study that had a different purpose. The purpose of the original study was to collect information on the changes in cerebrospinal fluid (CSF) related to HIV infection, including the viral burden (amount of virus) and the body responses of the infected individual, such as the number and types of lymphocytes in the CSF. In addition to looking at the number and types of lymphocytes in the CSF, this original study also examined the presence of markers of inflammation and other chemical changes, and the relationship of those changes to nervous system dysfunction in AIDS.
In addition to whole-exome sequencing, we used RNA sequencing to characterize resistance to lirafugratinib in patients with FGFR2-driven cancers treated in the phase 1/2 ReFocus trial (NCT04526106) and enrolled in the UNLOCK program at Gustave Roussy.
Human populations in Africa are the most genetically diverse in the world, having a rich history, and yet are severely understudied. The purpose of this study is to understand the population history of the Turkana. This study aims to measure the extent of genetic variation using whole genome and whole exome sequencing in this population, with the goal to better understand the broader evolutionary history of these populations and to add to the current knowledge of population genetics in Africa.