This dataset comprises 124 human faecal samples collected across Indonesia between 2015 and 2021, together with 6 controls. It was collected to study patterns of human microbiome variation across Indonesia’s diverse geography and lifestyles, and to generate the Indonesian Microbiome Ecology and Evolution (IndoMEE) metagenome-assembled genome (MAG) reference database. The dataset consists of 250 bp paired-end metagenomic reads from 124 faecal samples acquired from 116 individuals, at ca. 10.6 Gb / sample, together with 2 positive controls (ZymoBIOMICS Microbial Community DNA Standard by Zymo Research Cat# D6305) and 4 negative controls (for sampling: Blank; for extraction: Buffer EB by Qiagen Cat# 19086; for library prep: Nuclease Free Water by Ambion-Invitrogen Cat# AM9937). Library preparation was carried out using Illumina's TruSeq DNA Nano kit, and sequencing was performed on the Illumina NovaSeq 6000 platform.
Whole genome and transcriptome sequencing of cancer of unknown primary tumours was used to determine the yield of clinical biomarkers for a molecularly guided trial or for resolving the cancer type of origin. This study includes profiling of germline DNA and tumour DNA and RNA by whole genome and transcriptome sequencing. All samples are in BAM format, aligned to the GRCh38 reference genome. This dataset includes: 1. Whole genome sequencing (WGS) of 78 cancer of unknown primary tumour samples and 73 matched germline DNA samples. 2. Whole transcriptome sequencing (WTS) of 69 cancer of unknown primary tumour samples (matched to WGS cases). 3. Whole genome sequencing of 22 cell-free DNA samples from cancer of unknown primary patients and matched germline DNA (8 samples matched to tumour WGS).
Blood samples from eight lung cancer patients and one benign lung tumor patient were collected for this dataset. Blood samples were first centrifuged at 1,600 × g for 10 minutes; the plasma was then transferred into new microtubes and centrifuged at 16,000 × g for another 10 minutes. The plasma was collected and stored at -80 °C. cfDNA was extracted from 5 ml plasma using the Qiagen QIAamp Circulating Nucleic Acid Kit and quantified with a Qubit 3.0 Fluorometer (Thermo Fisher Scientific). Bisulfite conversion of cfDNA was performed using the EZ DNA Methylation-Gold kit (Zymo Research). The Accel-NGS Methyl-Seq DNA library kit (Swift Biosciences) was then used to prepare the sequencing libraries. The DNA libraries were sequenced with 150 bp paired-end reads.
The IYDP dataset includes BAM files of 126 Y chromosomes extracted from whole genome sequences. These are from individuals from a broad range of Indonesian islands - communities close to mainland Asia through to New Guinea. The original whole genome sequencing libraries were prepared using TruSeq DNA PCR-Free and TruSeq Nano DNA HT kits depending on DNA quantity. 150 bp paired-end sequencing was performed on the Illumina HiSeq X sequencer. Individuals were sequenced to expected mean depth of 30x, with an achieved median depth of raw reads across samples of 43x.
This dataset contains: 1.) Whole-genome sequencing (WGS) data (~6x) of 259 cfDNA samples obtained from 50 colorectal cancer (CRC) patients and 61 healthy controls. Paired-end sequencing was performed with 2x101 bp reads on the NovaSeq 6000 system. Data is provided as mapped .bam files (aligned to GRCh38/hg38). 2.) WGS data (~1x) of 50 tumor biopsy and 45 saliva samples from CRC patients. Paired-end sequencing was performed with 2x101 bp reads on the NovaSeq 6000 system. Data is provided as mapped .bam files (aligned to GRCh38/hg38).
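The stated depths (~6x and ~1x) and the 2x101 bp read layout can be related through the usual approximation that mean depth equals sequenced bases divided by genome size. The following is a minimal sketch of that arithmetic, not part of the dataset's processing; the genome size of 3.1 Gb and the function names are assumptions introduced here for illustration.

```python
# Rough depth arithmetic: mean depth ~= (read pairs * 2 * read length) / genome size.
# ASSUMPTION: ~3.1e9 bp for the human genome; the dataset description does not state a value.
GENOME_SIZE_BP = 3.1e9


def read_pairs_for_depth(depth, read_length, genome_size=GENOME_SIZE_BP):
    """Approximate number of read pairs needed to reach a mean depth."""
    bases_needed = depth * genome_size
    return bases_needed / (2 * read_length)


def depth_from_pairs(pairs, read_length, genome_size=GENOME_SIZE_BP):
    """Approximate mean depth obtained from a number of read pairs."""
    return pairs * 2 * read_length / genome_size


if __name__ == "__main__":
    # Depth labels come from the description; read length is 101 bp per mate.
    for label, depth in [("cfDNA WGS (~6x)", 6.0), ("biopsy/saliva WGS (~1x)", 1.0)]:
        pairs = read_pairs_for_depth(depth, read_length=101)
        print(f"{label}: roughly {pairs / 1e6:.0f} million read pairs")
```

The estimate ignores duplicates, unmapped reads, and uneven coverage, so real files may contain more reads than this figure for the same effective depth.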
Key objective: Are the prognostic transcriptomic G1/G2 gene expression signature, MYC overexpression, and MYC amplification replicable stratifying biomarkers for future clinical trials in high-grade osteosarcoma? Knowledge gathered: In an unselected cohort, the G2 gene expression signature and MYC overexpression, but not MYC amplification, were independently associated with poor event-free and overall survival. Relevance: Transcriptomic biomarkers may serve as stratifying factors that guide the management of patients with high-grade osteosarcoma. Current data underline the importance of prospective validation of the G1/G2 signature and MYC overexpression in an international, multicenter study.
NOTICE OF CHANGE IN LOCATION FOR ALZHEIMER'S DISEASE SEQUENCING PROJECT (ADSP) GENETIC AND PHENOTYPIC DATA: ADSP whole exome and whole genome sequence data that are shared through dbGaP were mapped to the Genome Reference Consortium human genome GRCh37 (build 37). These data are from the Discovery Phase of the project (described below) and will continue to be available at this site. Please see the ADSP Design page for the complete study description. All data that are mapped to GRCh38 (hg38) are being shared through the NIA Genetics of Alzheimer's Disease Data Storage Site (NIAGADS) Data Sharing Service (DSS). For instructions on how to access the ADSP Build 38 data that are shared through NIAGADS DSS, visit the Application Instructions page. STUDY DESCRIPTION FOR dbGaP BUILD 37 ADSP DATA: The overarching goals of the Alzheimer's Disease Sequencing Project (ADSP) are to: (1) identify new genomic variants contributing to increased risk of developing Alzheimer's Disease (AD), (2) identify new genomic variants contributing to protection against developing AD, and (3) provide insight as to why individuals with known risk factor variants escape from developing AD. These factors will be studied in multi-ethnic populations in order to identify new pathways for disease prevention. Such a study of human genomic variation and its relationship to health and disease requires examination of a large number of study participants and needs to capture information about common and rare variants (both single nucleotide and copy number) in well-phenotyped individuals. Using existing samples from NIH-funded and other studies, three NHGRI-funded Large Scale Sequencing and Analysis Centers (LSAC) - Broad, Baylor, and Washington University - produced the DNA sequence data. Variant call data are being made available to the scientific community through NIH-approved data repositories. Statistical analysis of the sequence data is anticipated to identify new genetic risk and protective factors.
The ADSP will conduct and facilitate analysis of sequence data to extend previous discoveries that may ultimately result in new directions for AD therapeutics. Analysis of ADSP data will be done in two phases. The Discovery Phase analysis (2014-2018) is funded under PAR-12-183. The entire Discovery dataset contains whole-genome sequencing data on 584 subjects from 113 families, and pedigree data for > 4000 subjects; whole exome sequencing data on 5096 cases and 4965 controls; and whole exome sequence data on an additional 853 subjects (682 cases [510 non-Hispanic, 172 Hispanic] and 171 Hispanic controls) from families that are multiply affected with AD. The Replication Phase (2016-2021) analysis will be funded under RFA-AG-16-001 and RFA-AG-16-002 and is expected to include a combination of genotyping and sequencing approaches on at least 30,000 subjects. Targeted sequencing will be done by the LSACs. GRCh37 Data Releases: The first ADSP data release occurred on November 25, 2013. It included the whole-genome sequencing data in BAM file format on 410 individuals. The second ADSP data release occurred on March 31, 2014, and included the whole-genome sequencing data in BAM file format for an additional 168 individuals. The third ADSP data release occurred on November 3, 2014, and included whole-exome sequencing data in BAM file format for 10,939 individuals. The fourth ADSP data release occurred on February 13, 2015, and included revised ethnic data for subjects with whole-exome sequencing data. The fifth ADSP data release occurred on July 13, 2015, and included whole-genome genotypes and updated phenotypes as well as changes to pedigree structures and sample IDs. The sixth ADSP data release occurred on December 8, 2015, and included whole-exome genotypes and updated phenotypes as well as changes to subject IDs.
This seventh ADSP data release on April 12, 2016 includes: (1) WES and WGS SNV VCF files and (2) WES and WGS Indel PLINK files.
ADSP Data Available through dbGaP (for each data type: count for ADSP Whole Genome Sequencing [WGS], count for ADSP Whole Exome Sequencing [WES], and comments):
DNA-Seq (BAM): WGS n=578; WES n=10913. Sequence data available (plus n=38 replications w/out genotype data).
Concordant SNV Genotypes (PLINK format): WGS N/A; WES n=10913. QC'ed genotypes that are concordant between the Atlas (Baylor's) and GATK (Broad's) calling pipelines (a subset of the consensus genotype set).
Consensus Genotypes (PLINK and VCF format): WGS n=578; WES n=10913. QC'ed genotypes that are concordant between Atlas and GATK pipelines as well as those that were called uniquely by Atlas or GATK.
Concordant Indel Genotypes (PLINK format): WGS n=578; WES n=10913. QC'ed genotypes that are concordant between the Atlas and GATK calling pipelines.
Phenotype Data: WGS n=4735; WES n=10913. Data of n=53 phenotype variables available (plus administrative data), including APOE genotype. WGS phenotypes include data of connecting family members.
Please use the release notes provided by dbGaP to obtain detailed information about study release updates. The ADSP data portal provides a customized interface for users to quickly identify and retrieve files by covariates, phenotypes, and data properties such as sequencing facility or coverage. For more information about the ADSP study and the data portal, please visit https://www.niagads.org/adsp/.
The Genetics and Epidemiology of Colorectal Cancer Consortium (GECCO) is a collaborative effort comprised of a coordinating center and scientific researchers from well-characterized cohort and case-control studies. This international consortium aims to accelerate the discovery of common and rare genetic risk variants for colorectal cancer by conducting large-scale meta-analyses of existing and newly generated genome-wide association study (GWAS) data, whole genome sequencing, replicating and fine-mapping of genetic discoveries, and investigating how genetic risk variants are modified by environmental risk factors. To expand these efforts, we assembled case-control sets or nested case-control sets from 6 different North American or European studies. Summary descriptions and study participant inclusion/exclusion criteria for each of these studies are detailed below. Cancer Prevention Study II (CPS II): The CPS II Nutrition cohort is a prospective study of cancer incidence and mortality in the United States, established in 1992 and described in detail elsewhere (Calle et al., 2002 PMID:12015775; Campbell et al., 2014 PMID:25472679). At enrollment, participants completed a mailed self-administered questionnaire including information on demographic, medical, diet, and lifestyle factors. Follow-up questionnaires to update exposure information and to ascertain newly diagnosed cancers were sent biennially starting in 1997. Reported cancers were verified through medical records, state cancer registry linkage, or death certificates. The Emory University Institutional Review Board approves all aspects of the CPS II Nutrition Cohort. We restricted selection to samples with blood as the DNA source. Controls were matched to cases in a case/control ratio of 2:1 on reference year and sex.
Darmkrebs: Chancen der Verhütung durch Screening (DACHS): This German study was initiated as a large population-based case-control study in 2003 in the Rhine-Neckar-Odenwald region (southwest region of Germany) to assess the potential of endoscopic screening for reduction of colorectal cancer risk and to investigate etiologic determinants of disease, particularly lifestyle/environmental factors and genetic factors. Cases with a first diagnosis of invasive colorectal cancer (International Classification of Diseases 10 codes C18-C20) who were at least 30 years of age (no upper age limit), German speaking, a resident in the study region, and mentally and physically able to participate in a one-hour interview, were recruited by their treating physicians either in the hospital a few days after surgery, or by mail after discharge from the hospital. Cases were confirmed based on histologic reports and hospital discharge letters following diagnosis of colorectal cancer. All hospitals treating colorectal cancer patients in the study region participated. Based on estimates from population-based cancer registries, more than 50% of all potentially eligible patients with incident colorectal cancer in the study region were included. Community-based controls were randomly selected from population registries, employing frequency matching with respect to age (5-year groups), sex, and county of residence. Controls with a history of colorectal cancer were excluded. Controls were contacted by mail and follow-up calls. The participation rate was 51%. During an in-person interview, data were collected on demographics, medical history, family history of CRC, and various life-style factors, as were blood and mouthwash samples. Routine formalin-fixed, paraffin-embedded (FFPE) tumor samples from the patients enrolled were requested from the pathology institutes and used for tumor tissue analyses. 
This analysis includes participants with blood source DNA that were recruited up to 2010 in this ongoing study. Controls were matched to cases on reference age and sex in a case/control ratio of 2:1. Health Professionals Follow-up Study (HPFS): A parallel prospective study to the NHS (Nurses' Health Study). The HPFS cohort comprised 51,529 men aged 40-75 who, in 1986, responded to a mailed questionnaire (Rimm et al., 1990 PMID:2090285). Participants provided information on health related exposures, including current and past smoking history, age, weight, height, diet, physical activity, aspirin use, and family history of colorectal cancer. Colorectal cancer and other outcomes were reported by participants or next-of-kin and were followed up through review of the medical and pathology record by physicians. Overall, more than 97% of self-reported colorectal cancers were confirmed by medical record review. Information was abstracted on histology and primary location. Incident cases were defined as those occurring after the subject provided the blood sample. Prevalent cases were defined as those occurring after enrollment in the study but before the subject provided the blood sample. Follow-up evaluation has been excellent, with 94% of the men responding to date. Colorectal cancer cases were ascertained through January 1, 2008. In 1993-1995, 18,825 men in the HPFS mailed blood samples by overnight courier, which were aliquoted into buffy coat and stored in liquid nitrogen. In 2001-2004, 13,956 men in the HPFS who had not provided a blood sample previously mailed in a swish-and-spit sample of buccal cells. Incident cases were defined as those occurring after the subject provided a blood or buccal sample. Prevalent cases were defined as those occurring after enrollment in the study in 1986, but before the subject provided either a blood or buccal sample. 
Case-control sets that included participants with histories of cancer (except nonmelanoma skin cancer), ulcerative colitis, or familial polyposis were excluded. Control participants were required to be free of invasive colorectal cancer and non-invasive (stage 0 in situ) colorectal cancer. For this study, only European ancestry participants with blood source DNA and incident colorectal cancer cases were eligible for selection. Since enrollment year and sex matched exactly, controls were randomly selected in a case/control ratio of 2:1. Nurses' Health Study (NHS): The NHS cohort began in 1976 when 121,700 married female registered nurses aged 30-55 years returned the initial questionnaire that ascertained a variety of important health-related exposures (Belanger et al., 1978 PMID:248266). Since 1976, follow-up questionnaires have been mailed every 2 years. Colorectal cancer and other outcomes were reported by participants or next-of-kin and followed up through review of the medical and pathology record by physicians. Overall, more than 97% of self-reported colorectal cancers were confirmed by medical-record review. Information was abstracted on histology and primary location. The rate of follow-up evaluation has been high: as a proportion of the total possible follow-up time, follow-up evaluation has been more than 92%. Colorectal cancer cases were ascertained through June 1, 2008. In 1989-1990, 32,826 women in NHS I mailed blood samples by overnight courier, which were aliquoted into buffy coat and stored in liquid nitrogen. In 2001-2004, 29,684 women in NHS I who did not previously provide a blood sample mailed a swish-and-spit sample of buccal cells. Incident cases were defined as those occurring after the subject provided a blood or buccal sample. Prevalent cases were defined as those occurring after enrollment in the study in 1976 but before the subject provided either a blood or buccal sample.
Case-control sets that included participants with histories of cancer (except nonmelanoma skin cancer), ulcerative colitis, or familial polyposis were excluded. For this study, only European ancestry participants with blood source DNA and incident colorectal cancer cases were eligible for selection. Since enrollment year and sex matched exactly, controls were randomly selected in a case/control ratio of 2:1. Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial (PLCO): PLCO enrolled 154,934 participants (men and women, aged between 55 and 74 years) at ten centers into a large, randomized, two-arm trial to determine the effectiveness of screening to reduce cancer mortality. Sequential blood samples were collected from participants assigned to the screening arm. Participation was 93% at the baseline blood draw. White colorectal cancer cases with a family history of colorectal cancer (no history of ulcerative colitis, Crohn's Disease, diverticulitis, Gardner's syndrome, Familial Polyposis) and successful genotyping from previous Peters GWAS were selected for this project. Controls were matched to cases on reference age and sex in a case/control ratio of 2:1. Women's Health Initiative (WHI): WHI is a long-term national health study that has focused on strategies for preventing heart disease, breast and colorectal cancer, and osteoporotic fractures in postmenopausal women. The original WHI study included 161,808 postmenopausal women enrolled between 1993 and 1998. The Fred Hutchinson Cancer Research Center in Seattle, WA serves as the WHI Clinical Coordinating Center for data collection, management, and analysis of the WHI. The WHI has two major parts: a partial factorial randomized Clinical Trial (CT) and an Observational Study (OS); both were conducted at 40 Clinical Centers nationwide. The CT enrolled 68,132 postmenopausal women between the ages of 50-79 into trials testing three prevention strategies.
If eligible, women could choose to enroll in one, two, or all three of the trial components. The components are: Hormone Therapy Trials (HT): This double-blind component examined the effects of combined hormones or estrogen alone on the prevention of coronary heart disease and osteoporotic fractures, and associated risk for breast cancer. Women participating in this component with an intact uterus were randomized to estrogen plus progestin (conjugated equine estrogens [CEE], 0.625 mg/d plus medroxyprogesterone acetate [MPA] 2.5 mg/d) or a matching placebo. Women with prior hysterectomy were randomized to CEE or placebo. Both trials were stopped early, in July 2002 and March 2004, respectively, based on adverse effects. All HT participants continued to be followed without intervention until close-out. Dietary Modification Trial (DM): The Dietary Modification component evaluated the effect of a low-fat and high fruit, vegetable and grain diet on the prevention of breast and colorectal cancers and coronary heart disease. Study participants were randomized to either their usual eating pattern or a low-fat dietary pattern. Calcium/Vitamin D Trial (CaD): This double-blind component began 1 to 2 years after a woman joined one or both of the other clinical trial components. It evaluated the effect of calcium and vitamin D supplementation on the prevention of osteoporotic fractures and colorectal cancer. Women in this component were randomized to calcium (1000 mg/d) and vitamin D (400 IU/d) supplements or a matching placebo. The Observational Study (OS) examines the relationship between lifestyle, environmental, medical and molecular risk factors and specific measures of health or disease outcomes. This component involves tracking the medical history and health habits of 93,676 women not participating in the CT. Recruitment for the observational study was completed in 1998 and participants were followed annually for 8 to 12 years. 
All centrally confirmed White cases of invasive colorectal cancer, or death from colorectal cancer, were selected as potential cases from the March 2011 database. Cases were prioritized as follows: 1) cases with a positive family history of colorectal cancer; 2) randomly selected cases until a total of n=800 cases was reached. Control participants were required to be White, free of invasive colorectal cancer and non-invasive (stage 0 in situ) colorectal cancer. Centrally denied cases of colorectal cancer were not allowed into the control pool. Case and control participants were subject to the following exclusion criteria: (1) had prior history of colorectal cancer at baseline; (2) had no available DNA (DNA search as of Nov 15, 2012); (3) cannot be deposited to dbGaP; (4) lost to follow-up after enrollment; (5) selected for WHI study M26 Phase II. Controls were matched to cases in a case/control ratio of 2:1. In order to get 2 cases with 1 control, cases were grouped by enrollment year (a total of 5 groups). For each year group, around 50% of cases were selected to match controls. In total, 401 cases were selected to match controls. Matching was done on enrollment year, which was matched exactly. For additional information, see dbGaP: phs000200 and ClinicalTrials: NCT00000611.
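One way to read the WHI matching description is: group cases by enrollment-year group, select about half of the cases in each group at random, and pair each selected case with one control from the same group. The sketch below illustrates that reading only; it is not the consortium's code, and the function name, the tuple layout, the fixed seed, and the toy identifiers are all hypothetical.

```python
import random
from collections import defaultdict


def match_by_enrollment_year(cases, controls, case_fraction=0.5, seed=0):
    """Pair a random subset of cases with controls from the same year group.

    cases, controls: lists of (participant_id, enrollment_year_group) tuples.
    Returns a list of (case_id, control_id, year_group) tuples.
    ASSUMPTION: one control per selected case, each control used at most once.
    """
    rng = random.Random(seed)
    cases_by_year = defaultdict(list)
    controls_by_year = defaultdict(list)
    for pid, year in cases:
        cases_by_year[year].append(pid)
    for pid, year in controls:
        controls_by_year[year].append(pid)

    pairs = []
    for year in sorted(cases_by_year):
        group = cases_by_year[year]
        # Select roughly the requested fraction of cases in this year group.
        n_select = int(len(group) * case_fraction)
        selected = rng.sample(group, n_select)
        # Shuffle a copy of the control pool so each control is used once.
        pool = list(controls_by_year.get(year, []))
        rng.shuffle(pool)
        # zip stops at the shorter list, so no control is reused.
        for case_id, control_id in zip(selected, pool):
            pairs.append((case_id, control_id, year))
    return pairs


if __name__ == "__main__":
    # Toy data: 5 hypothetical year groups, 4 cases and 10 controls in each.
    toy_cases = [(f"case{i}", 1993 + i % 5) for i in range(20)]
    toy_controls = [(f"ctrl{i}", 1993 + i % 5) for i in range(50)]
    for row in match_by_enrollment_year(toy_cases, toy_controls):
        print(row)
```

With half of the cases matched one-to-one, the overall set contains about two cases per control, which is consistent with the stated 2:1 case/control ratio.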
This prospective study investigated whether circulating RNA (C-RNA; cell-free RNA present in the bloodstream) measurements could non-invasively monitor pregnancy status, with emphasis on detecting molecular markers associated with preterm preeclampsia. Whole blood samples were collected from pregnant women at the time of diagnosis with early-onset preeclampsia with severe features. Blood samples were also obtained from normotypical control pregnancies and from patients diagnosed with late-onset preeclampsia for comparison. For all samples, C-RNA was extracted from 4 mL blood and whole transcriptome enrichment was performed on cDNA libraries prior to sequencing to a target depth of 50M reads on an Illumina HiSeq 2000 platform. Throughout the course of this work, whole transcriptome RNA sequencing data was obtained from 289 C-RNA samples collected from 178 subjects. The complete dataset consists of samples from the following cohorts: (1) Illumina Preeclampsia Cohort (iPEC): 40 pregnancies at the time of early-onset preeclampsia diagnosis and 73 gestational age-matched controls; (2) PEARL preeclampsia cohort: 12 early-onset and 12 late-onset preeclampsia patients; (3) PEARL healthy control cohort: 152 longitudinal samples collected from 41 healthy pregnancies.
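The cohort breakdown can be checked against the stated totals of 289 samples and 178 subjects. The short tally below is a sanity check only; it assumes that each iPEC and PEARL preeclampsia pregnancy contributed exactly one sample, which the description implies but does not state, and the dictionary keys are labels chosen here.

```python
# (samples, subjects) per cohort arm, taken from the description.
# ASSUMPTION: one sample per subject except in the longitudinal healthy cohort.
cohorts = {
    "iPEC early-onset preeclampsia": (40, 40),
    "iPEC gestational age-matched controls": (73, 73),
    "PEARL early-onset preeclampsia": (12, 12),
    "PEARL late-onset preeclampsia": (12, 12),
    "PEARL healthy controls (longitudinal)": (152, 41),
}

total_samples = sum(samples for samples, _ in cohorts.values())
total_subjects = sum(subjects for _, subjects in cohorts.values())

print(total_samples)   # 289, matching the stated number of C-RNA samples
print(total_subjects)  # 178, matching the stated number of subjects
```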
Amyotrophic lateral sclerosis (ALS), also known as Lou Gehrig's disease, is a fatal and devastating neurodegenerative disorder that causes the progressive death of upper and lower motor neurons. Although many efforts have been made to elucidate the molecular factors involved in the onset and progression of the disorder, the causes of ALS remain unknown. Transcriptome studies, based mostly on microarrays, have revealed multiple perturbations of motor neuron function, supporting the current idea that several cellular events contribute to the pathobiology of the disease, including mitochondrial dysfunction, enhanced apoptosis, glutamate-mediated excitotoxicity, free radical injury, protein misfolding, abnormal calcium metabolism and altered axonal transport. In the present study, we have deeply sequenced the whole transcriptome of ventral horns of the human lumbar spinal cord from matched control and ALS post-mortem donors. Whole exome sequencing of the same donors has also been performed to exclude known genetic variants associated with the familial form of ALS. In addition, to characterize the ALS transcriptome, we have sequenced the low-molecular-weight RNA fraction in the same tissues and individuals. Genomic and transcriptomic reads were generated using the Illumina HiSeq 2000 sequencer.