The study was conducted under the auspices of the Transdisciplinary Research In Cancer of the Lung (TRICL) Research Team, which is part of the Genetic Associations and MEchanisms in ONcology (GAME-ON) consortium and is associated with the International Lung Cancer Consortium (ILCCO).

Ethics: All participants provided written informed consent. All studies were reviewed and approved by the institutional ethics review committees of the participating institutions.

Sequencing data are derived from four sub-studies: Harvard, Liverpool, Toronto, and IARC. Each is described below.

Liverpool Lung Project: The Liverpool Lung Project (LLP) [1] is a case-control and cohort study of over 11,500 individuals, with detailed epidemiological, clinical, and outcome data and associated specimens (i.e. tumour tissue, blood, plasma, sputum, bronchial lavage, EBUS, and oral brushings). Participants have completed a detailed lifestyle questionnaire, and updated data on clinical outcomes and hospital events are collected through the Office for National Statistics, the Cancer Registry, and Hospital Episode Statistics. The project is registered on the UK National Institute for Health Research (NIHR) lung cancer portfolio and has all required ethical approvals and sponsorship arrangements in place. The LLP has detailed standard operating procedures (SOPs) covering all aspects of recruitment, data and specimen collection, and data storage. The LLP cohort study has 8,224 participants with blood samples and 7,761 with plasma samples.
The LLP case-control samples have been incorporated into a large number of international GWAS and molecular studies [2,3], including methylation [4-7], microRNA [8], and next-generation sequencing [9-11] studies, resulting in high-impact publications. They also form the basis of the LLP risk prediction model [12-14], which has been used in the UK Lung Cancer Screening trial (UKLS) [15-17]. Patient and control DNAs were derived from EDTA venous blood samples.

Harvard Samples: David Christiani at the Harvard University School of Public Health has directed research studies investigating etiological factors in lung cancer development since 1983 and has amassed a collection of 2,000 controls and 5,055 lung cancer cases. His group has been collecting and storing snap-frozen tumor samples since 1992. Around 1,500 tumor samples have been collected, with an average wet tumor yield of about 30 grams; 631 of these cases have complete clinical and survival annotation. Pathology is confirmed by two pathologists. At the time of surgery, a minimum of 30 grams of wet lung tumor tissue and 30 grams of non-involved tissue from the same lobe are sectioned, flash-frozen, and sent to Dr. Christiani's lab for logging and storage. A blood sample for DNA and serum is also collected. Trained research staff conduct a structured interview with each case, and clinical outcomes and treatments are extracted and entered into the molecular epidemiology database at Harvard. Fresh-frozen samples from 1,451 lung cancer patients are available for study. Samples from this collaborative study have played key roles in major studies, including the initial description of EGFR mutations in lung cancer [22]. Participants in this study are patients over 18 years of age with newly diagnosed, histologically confirmed lung cancer.
Samples included in the analysis have the following histologies (ICD-O-3 codes): adenocarcinoma: 8140/3, 8250/3, 8260/3, 8310/3, 8480/3, 8560/3; large cell carcinoma (LCC): 8012/3, 8031/3; squamous carcinoma: 8070/3, 8071/3, 8072/3, 8074/3; and other NSCLC: 8010/3, 8020/3, 8021/3, 8032/3, 8230/3.

The Toronto Study: The Toronto study was conducted in the Greater Toronto Area between 1997 and 2014. Cases were recruited at hospitals in the network of the University of Toronto and the Lunenfeld-Tanenbaum Research Institute. At recruitment in the clinical setting, provisional diagnoses of lung carcinoma were first assigned based on clinical criteria. All included diagnoses were then histologically confirmed by a reference pathologist specializing in pulmonary pathology, based on review of pathology reports from surgical, biopsy, or cytology samples (100% of cases). Diagnostic classification was initially done according to ICD-9, ICD-10, and ICD-O-2, and subsequently converted to ICD-O-3. Tumors were grouped into the major categories included in this analysis according to primary cancer type, based on the ICD-O-3 definitions. Controls were randomly selected from individuals visiting family medicine clinics and from the Ministry of Finance Municipal Tax Tapes. All subjects were interviewed using a standard questionnaire, and information on lifestyle risk factors, occupational history, and medical and family history was collected. Blood samples were collected from more than 85% of subjects.

IARC: The IARC data are derived from case-control studies conducted in Russia and include subjects with available tissue samples. Patient and control DNAs were derived from EDTA venous blood samples. Lung cancer cases were classified according to ICD-O-3 as squamous cell carcinoma (SQ): 8070/3, 8071/3, 8072/3, 8074/3; or adenocarcinoma (AD): 8140/3, 8250/3, 8260/3, 8310/3, 8480/3, 8560/3, 8251/3, 8490/3, 8570/3, 8574/3. Tumours with overlapping histologies were classified as mixed.
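The histology inclusion criteria for the Harvard samples amount to a lookup from ICD-O-3 morphology code to histology group. The following is a minimal Python sketch, assuming codes are stored as strings in the "8140/3" form and using only the Harvard list as given (the IARC adenocarcinoma list adds 8251/3, 8490/3, 8570/3, and 8574/3 and would need its own table). The names `HARVARD_HISTOLOGY`, `CODE_TO_GROUP`, and `classify_histology` are hypothetical and not part of any study pipeline.

```python
from typing import Dict, List, Optional

# Hypothetical table transcribed from the Harvard histology list:
# histology group -> ICD-O-3 morphology codes. Not an official mapping.
HARVARD_HISTOLOGY: Dict[str, List[str]] = {
    "Adenocarcinoma": ["8140/3", "8250/3", "8260/3", "8310/3", "8480/3", "8560/3"],
    "LCC": ["8012/3", "8031/3"],
    "Squamous carcinoma": ["8070/3", "8071/3", "8072/3", "8074/3"],
    "Other NSCLC": ["8010/3", "8020/3", "8021/3", "8032/3", "8230/3"],
}

# Invert to code -> group so each lookup is a single dict access.
CODE_TO_GROUP: Dict[str, str] = {
    code: group
    for group, codes in HARVARD_HISTOLOGY.items()
    for code in codes
}

# Sanity check: no code may be assigned to more than one group.
assert len(CODE_TO_GROUP) == sum(len(c) for c in HARVARD_HISTOLOGY.values())


def classify_histology(code: str) -> Optional[str]:
    """Return the histology group for an ICD-O-3 code, or None when the
    code is not on the inclusion list (the sample would be excluded)."""
    return CODE_TO_GROUP.get(code.strip())
```

For example, `classify_histology("8140/3")` gives `"Adenocarcinoma"`, while a code absent from the list gives `None`. Inverting the table once keeps the human-readable group-to-codes layout that mirrors the text, while still making per-sample lookups cheap.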
The Lung Cancer Transdisciplinary Research Cohort is used in the following dbGaP sub-studies. Genotypes, other molecular data, and derived variables collected in these sub-studies can be accessed through the sub-studies listed below or through the "Sub-studies" section of the top-level study page, phs000876 (Lung Cancer Transdisciplinary Research Cohort).
phs000877: Meta Analysis
phs000878: CIDR Lung Cancer
phs001681: Affy Axiom Array
The dataset contains reconstructed VDJ sequences (FASTA files) and accompanying per-cell metadata (CSV file) from scRNA-seq data generated with the Smart-seq2 protocol. The VDJ sequences were reconstructed with the computational tool BraCeR using raw FASTQ files as input. The dataset contains sequences from 355 IgA+ peripheral blood B-lineage cells from two untreated celiac disease patients, comprising both transglutaminase 2-specific IgA+ B cells and other IgA+ B cells.
Sixty-nine tissue samples from various regions of the developing human embryonic brain were dissociated, and single cells were collected and processed in an unbiased manner for mRNA-seq using the 10x Chromium 3' protocol. Libraries were sequenced on an Illumina NovaSeq, and reads were aligned to the human GRCh38 genome. This dataset supplements EGAD00001006049 (Human development single cell sequencing); it contains samples added after the original submission to EGA but included in the final paper.
scRNA-seq data from circulating gamma delta T cells of healthy donors and of patients with stage IV melanoma who either responded or did not respond to anti-PD-1 monotherapy or to combined anti-PD-1 and anti-CTLA-4 therapy. The dataset includes paired samples from these patients collected before and 3 months after the start of immunotherapy.
These files contain the Patch-seq data presented in Bouwen et al., Nat Commun 2025. Human brain samples were obtained during surgery, and neurons were isolated using a patch-clamp glass pipette. Isolated cells were processed using the Smart-seq2 protocol and sequenced on an Illumina 2500. Files are in FASTQ format. There are 12 patient samples in total, each with two replicate runs.
Samples were sequenced on a NovaSeq 6000, and the sequencing files were processed using the RSSnextflow workflow (https://gitlab.com/b8307038/rssnextflow). Raw read counts from all batches were combined, and analysis-ready batch-corrected read counts were generated using ComBat-seq. For differential gene expression (DGE) analysis, read counts were normalised to log2 counts per million (log2 CPM) using the limma voom function, as implemented in the pipeline.
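The log2 CPM transformation that limma's voom applies, log2((count + 0.5) / (library size + 1) × 10^6), can be sketched in plain Python as follows. This is a minimal illustration, assuming a genes × samples count matrix and raw column sums as library sizes (voom can additionally scale library sizes by normalisation factors, which are omitted here). It does not reproduce the ComBat-seq step or the observation-level precision weights that voom also estimates, and the function name `voom_log2_cpm` is hypothetical, not taken from the RSSnextflow code.

```python
import math
from typing import List


def voom_log2_cpm(counts: List[List[int]]) -> List[List[float]]:
    """Return log2 counts per million for a genes x samples count matrix.

    Uses the offsets of limma's voom: 0.5 is added to every count and 1 to
    every library size, so zero counts still give a finite value.
    Library size is the plain column sum (no normalisation factors).
    """
    n_samples = len(counts[0])
    # Library size per sample = total reads across all genes in that column.
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    return [
        [math.log2((row[j] + 0.5) / (lib_sizes[j] + 1) * 1e6) for j in range(n_samples)]
        for row in counts
    ]
```

For a toy matrix `[[10, 0], [90, 50], [0, 50]]`, both samples have a library size of 100, so the first gene in the first sample becomes log2(10.5 / 101 × 10^6). The 0.5 and 1 offsets are what keep zero counts from producing log2(0).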
scRNA-seq analysis of human iPSC-derived microglia in a novel human in vitro 3D cortical tissue model. Samples include three conditions, each as biological duplicates: wild-type microglia extracted from cultures after 1 month (WT_1mo) and 3 months (WT_3mo) of culture, and microglia extracted after 3 months of culture (KI_3mo) from knock-in cultures carrying three familial Alzheimer's disease (AD) mutations in the APP gene (Swedish, Iberian, Arctic), introduced by CRISPR/Cas9-mediated knock-in.
Complete reference epigenome (as defined by IHEC) of a lung epithelial cell line derived from non-small cell lung adenocarcinoma.
FASTQ files of ATAC-seq data from induced pluripotent stem cells (iPSC), definitive endoderm (DE), hepatocyte-like cells (HLC), and primary human hepatocytes (PHH). The dataset comprises data from two in vitro differentiation protocols, Cellartis (Takara Bio; "CEL", n = 4) and the protocol described by Wang et al. (PMID: 28287600; "HAY", n = 1), as well as from three PHH donors.
Single-cell profiling of seronegative and seropositive individuals inoculated with SARS-CoV-2. The cellular response to SARS-CoV-2 is profiled using single-cell transcriptomics, CITE-seq, and single-cell immune profiling of PBMCs and nasal swabs sampled before and at multiple time points during infection. This cellular map provides high temporal resolution of how nasal and immune cells respond to SARS-CoV-2 exposure and infection.