CADD/GADD centers on Antisocial Drug Dependence
CADD (Center for Antisocial Drug Dependence): Funded through NIDA 011015 to study genetic influences on, and treatment of, antisocial drug dependence. The center studies clinical probands and their families, as well as community samples of matched controls, twins, and participants in an ongoing longitudinal adoption study. It is a collaboration between three organizations at two campuses of the University of Colorado. Longitudinal, with three waves of data collection completed. GADD (Genetics of Adolescent Antisocial Drug Dependence): Funded originally through NIDA 012845, a multisite collaboration including adolescent subjects at high risk for antisocial drug dependence and their siblings, recruited in Denver, CO and San Diego, CA. Longitudinal, with two waves of data collection completed and one in progress as of May 2018.
Study
phs001841
Public Access Defibrillation Community Trial (PAD)(PAD-BioLINCC)
Data Access NOTE: Please refer to the “Authorized Access” section below for information about how access to the data from this accession differs from many other dbGaP accessions. Objectives The Public Access Defibrillation (PAD) Community Trial sought to evaluate broad implementation of Public Access Defibrillators (PAD) in urban community units. Survival to hospital discharge of participants with out-of-hospital cardiac arrest was the main outcome measure. Survival was compared in community units (e.g., apartment or office buildings, gated communities, sports venues, senior centers, shopping malls) served by non-medical responders trained in CPR and use of automated external defibrillators (AEDs), to units receiving the traditional optimum community standard of care (i.e., rescuers trained to recognize a cardiac emergency, call 911, and initiate CPR). Background Sudden out-of-hospital cardiac arrest (OOH-CA) remains a significant cause of death, in spite of recent declines in overall mortality from cardiovascular disease. Existing methods of emergency resuscitation are inadequate due to time delays inherent in transporting trained responders with defibrillation capabilities to the side of the OOH-CA victim. Existing Emergency Medical Services (EMS) systems typically combine paramedic Emergency Medical Technician (EMT) services with some level of community involvement, such as bystander cardiopulmonary resuscitation (CPR) training. Some communities include automated external defibrillators (AEDs) at isolated sites or in mobile police or fire vehicles. Such an approach typically varies in effectiveness, with an incremental improvement in effectiveness seen in communities that organize and integrate services with the existing EMS system. However, optimal improvement in survival from sudden OOH-CA may require a program that utilizes volunteer non-medical responders (who may not have a traditional duty to respond to an emergency) trained to use AEDs. 
Participants The PAD trial was a prospective, randomized, community-based trial. More than 19,000 volunteer responders from 993 community units in 24 North American regions participated. The two study arms had similar unit and volunteer characteristics. Participants with treated out-of-hospital cardiac arrest in the two groups were similar in age (mean: 69.8 years), proportion of men (67 percent), rate of cardiac arrest in a public location (70 percent), and rate of witnessed cardiac arrest (72 percent). Conclusions Community units with volunteers trained in CPR and AEDs had significantly more participants surviving to hospital discharge than units with volunteers trained to use CPR only. There were 30 survivors among 128 definite cardiac arrests in the CPR+AED units and 15 survivors among 107 definite cardiac arrests in the CPR only units (p = 0.03). Serious adverse effects were rarely reported. No volunteers received inadvertent shocks, and no participants were shocked unnecessarily. AED maintenance problems were infrequent. A few participating volunteers reported severe stress related to responding to emergency situations. Although residential complexes represented 16% of the units and 29% of the treatable cardiac arrests, only 5% of the survivors were from residential complexes. Such information should be helpful for individual facilities that are considering implementing PAD programs. (NEJM 2004; 351:637-46).
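The survival counts reported in the Conclusions can be turned into proportions with a few lines of arithmetic. The sketch below is illustrative only: the function and variable names are hypothetical, and it computes crude survival proportions and their ratio, not the trial's published analysis or its p-value.

```python
# Crude survival-to-discharge proportions from the counts quoted above.
# Names are hypothetical; this is not the trial's statistical analysis.

def survival_proportion(survivors: int, arrests: int) -> float:
    """Return the fraction of definite cardiac arrests that survived to discharge."""
    if arrests <= 0:
        raise ValueError("arrests must be positive")
    return survivors / arrests

# Counts taken from the study description.
cpr_aed = survival_proportion(30, 128)   # CPR+AED units
cpr_only = survival_proportion(15, 107)  # CPR-only units

# Ratio of the two crude proportions (unadjusted).
crude_ratio = cpr_aed / cpr_only

print(f"CPR+AED:  {cpr_aed:.1%}")
print(f"CPR only: {cpr_only:.1%}")
print(f"Crude ratio: {crude_ratio:.2f}")
```

The ratio here is unadjusted; it ignores the community-unit randomization that the trial's own analysis would have accounted for.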
Study
phs003858
The Thrifty Microbiome: The Role of the Gut Microbiota in Obesity in the Amish
Emerging evidence that the gut microbiota may contribute in important ways to human health and disease has led us and others to hypothesize that both symbiotic and pathological relationships between gut microbes and their host may be key contributors to obesity and the metabolic complications of obesity. Our "Thrifty Microbiome Hypothesis" poses that gut microbiota play a key role in human energy homeostasis. Specifically, constituents of the gut microbial community may introduce a survival advantage to its host in times of nutrient scarcity, promoting positive energy balance by increasing efficiency of nutrient absorption and improving metabolic efficiency and energy storage. However, in the presence of excess nutrients, fat accretion and obesity may result, and in genetically predisposed individuals, increased fat mass may result in preferential abdominal obesity, ectopic fat deposition (liver, muscle), and metabolic complications of obesity (insulin resistance, hypertension, hyperlipidemia). Furthermore, in the presence of excess nutrients, a pathological transition of the gut microbial community may occur, causing leakage of bacterial products into the intestinal lymphatics and portal circulation, thereby inducing an inflammatory state, further aggravating metabolic syndrome traits and accelerating atherosclerosis. This pathological transition and the extent to which antimicrobial leakage occurs and causes inflammatory and other maladaptive sequelae of obesity may also be influenced by host factors, including genetics. In the proposed study, we will directly test the Thrifty Microbiome Hypothesis by performing detailed genomic and functional assessment of gut microbial communities in intensively phenotyped and genotyped human subjects before and after intentional manipulation of the gut microbiome. 
To address these hypotheses, five specific aims are proposed: (1) enroll three age- and sex-matched groups from the Old Order Amish: (i) 50 obese subjects (BMI > 30 kg/m2) with metabolic syndrome, (ii) 50 obese subjects (BMI > 30 kg/m2) without metabolic syndrome, and (iii) 50 non-obese subjects (BMI < 25 kg/m2) without metabolic syndrome and characterize the architecture of the gut microbiota from the subjects enrolled in this study by high-throughput sequencing of 16S rRNA genes; (2) characterize the gene content (metagenome) to assess the metabolic potential of the gut microbiota in 75 subjects to determine whether particular genes or pathways are correlated with disease phenotype; (3) characterize the transcriptome in 75 subjects to determine whether differences in gene expression in the gut microbiota are correlated with disease phenotype; (4) determine the effect of manipulation of the gut microbiota with antibiotics on energy homeostasis, inflammation markers, and metabolic syndrome traits in 50 obese subjects with metabolic syndrome; and (5) study the relationship between gut microbiota and metabolic and cardiovascular disease traits, weight change, and host genomics in 1,000 Amish already characterized for these traits and in whom 500K Affymetrix SNP chips have already been completed. These studies will provide our deepest understanding to date of the role of gut microbes in terms of 'who's there?', 'what are they doing?', and 'how are they influencing host energy homeostasis, obesity and its metabolic complications?' PUBLIC HEALTH RELEVANCE: This study aims to unravel the contribution of the bacteria that normally inhabit the human gastrointestinal tract to the development of obesity, and its more severe metabolic consequences including cardiovascular disease, insulin resistance and Type II diabetes.
We will take a multidisciplinary approach to study changes in the structure and function of gut microbial communities in three sets of Old Order Amish patients from Lancaster, Pennsylvania: obese patients, obese patients with metabolic syndrome and non-obese individuals. The Old Order Amish are a genetically closed homogeneous Caucasian population of Central European ancestry ideal for genetic studies. These works have the potential to provide new mechanistic insights into the role of gut microflora in obesity and metabolic syndrome, a disease that is responsible for significant morbidity in the adult population, and may ultimately lead to novel approaches for prevention and treatment of this disorder.
Study
phs000258
Whole-Genome sequencing of hepatocellular carcinomas
The French ICGC project on liver tumors is coordinated by Pr Jessica Zucman-Rossi and funded by INCa (French Institute for Cancer). The aim of the present project is to identify the catalog of somatic and germline mutations in liver tumors using whole genome and whole exome sequencing together with CGH-SNP, methylome and transcriptomic profiling. For this purpose, a series of 500 liver tumors is being collected through the French National Liver Collection, and these samples will be analyzed using the different omics technologies. The data will be deposited in the ICGC and EGA databases to be publicly available to the scientific community. Hepatocellular carcinoma (HCC) accounts for more than 90% of liver cancers and is a major health problem. It is the third leading cause of cancer-related mortality. Advances in genomic analyses have produced a comprehensive understanding of the different pathobiological layers underlying hepatocarcinogenesis. In particular, the development of next-generation sequencing technologies has made it possible to generate more comprehensive catalogues of somatic alteration events (single nucleotide substitutions, structural variations, and epigenetic changes) in the liver cancer genome than ever before. The dataset will include 50 whole genome sequencing tumor/germline pairs, of which 6 were deposited in February 2014.
Study
EGAS00001000706
Characterizing microbiome-directed fibre snacks in gnotobiotic mice and humans
Knowledge of the interrelationships between what we eat and the configurations of our gut microbial communities is providing important insights into how food components that are not directly metabolized by human enzymes are linked to our physiology and health status. Changing food preferences brought about by Westernization that have deleterious health effects, plus rapid population expansion, ongoing challenges to sustainable agriculture, and other forces contributing to increased food insecurity, are catalyzing efforts to identify more nutritious and affordable foods. The gut microbial community is complex, dynamic, and exhibits considerable intra- and interpersonal variation in its composition and functions. The massive number of potential interactions between its components makes it challenging to define the mechanisms by which food ingredients affect community properties. There is also a paucity of information about the ‘bioactive’ ingredients of foods that influence the fitness and expressed functions of community members. Here, plant fibres, from different sustainable sources and targeting distinct features of obese human gut microbiomes in gnotobiotic mice, were formulated into snack prototypes and used to supplement controlled diets consumed by overweight and obese adults; the results revealed fibre-specific changes in their microbiomes that were linked to changes in their plasma proteomes indicative of altered physiologic state.
Study
EGAS00001005268
DNA methylation repeatability in the Lothian Birth Cohorts of 1921 and 1936.
The repeatability of longitudinal measures of whole blood DNA methylation (obtained using the Illumina 450k chip) was assessed in two cohorts of ageing: the Lothian Birth Cohort of 1921 and the Lothian Birth Cohort of 1936. Data were collected at ages 70, 73, and 76 (LBC1936) and 79, 87, and 90 (LBC1921), with 478 participants having two or more measures of methylation.
Study
EGAS00001000910
Atherosclerosis Risk in Communities Study (ARIC-BioLINCC)
Data Access NOTE: Please refer to the “Authorized Access” section below for information about how access to the data from this accession differs from many other dbGaP accessions. At present, the ARIC sIRB has determined that ARIC data cannot be shared with for-profit entities. Related Studies Other ARIC data available include: Imaging studies (ARIC-Imaging), Genetics and genomics (ARIC, phs000280.v8.p2), Collaborative Cohort of Cohorts for COVID-19 Research (C4R): ARIC (phs002988.v1.p1), and as a component of the Sleep Heart Health Study (SHHS-BioLINCC, phs003637.v1.p1). Available Data Data available for request include ARIC v1-v8 examination cycles, collated annual follow-up communication data for contact years 2-32, and follow-up for mortality, heart disease, and stroke events through 2019. Also included are data from ancillary studies. Objectives The objectives of ARIC are to: 1) investigate associations of factors, including those not previously measured in cohort studies, with prevalence of atherosclerosis and incidence of CHD, clinical stroke and other cardiovascular diseases; and 2) measure cardiovascular disease occurrence and trends and relate these to community levels of, and changes in, risk factors, medical care and atherosclerosis. Background At the time of project initiation, the NHLBI had long recognized the need for longitudinal studies to identify the biochemical and physiological markers and specific environmental factors which place individuals at high risk for the major atherosclerosis diseases. The development of reliable ultrasound examination of peripheral arteries enhanced the expected benefit of such studies. Community surveillance planning began for ARIC in response to recommendations of the 1978 NHLBI Workshop on the Decline in CHD Mortality and has been extended in its purpose to evaluate the large geographic differences in U.S. mortality. Participants Black and white men and women, age 45-64 at entry; sample size: 15,792.
Design ARIC is a large-scale, long-term prospective study that measures associations of established and suspected coronary heart disease risk factors with both atherosclerosis and new CHD events in men and women from four geographically diverse communities. The project has two components: community surveillance of morbidity and mortality; and repeated examinations of a representative cohort of men and women in each community. The community surveillance involves abstracting hospital records and death certificates and investigating out-of-hospital deaths. The representative cohorts include approximately 4,000 persons from each community. Community surveillance data include detailed hospital record abstraction, ECG tracings, and event adjudication. Data from out-of-hospital events in the community include physician, informant, and coroner questionnaires as well as death certificate data and event adjudication. Community surveillance ended in 2014. Cohort participants were examined four times at three-year intervals between 1987 and 1998, and have been contacted annually since then to update their medical histories. Atherosclerosis was measured by carotid ultrasonography. Risk factors studied include: blood lipids, lipoprotein cholesterols, and apolipoproteins; plasma hemostatic factors; blood chemistries and hematology; sitting, supine and standing blood pressures; anthropometry; fasting blood glucose and insulin levels; ECG findings; cigarette and alcohol use; physical activity levels; dietary aspects; and family history. Clinic visits were restarted in 2011 with ARIC visit 5.
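The versioned accessions cited under Related Studies (for example phs000280.v8.p2) follow the dbGaP pattern of a study identifier followed by version and participant-set suffixes. The following is a minimal sketch of splitting such a string, assuming that pattern; the class and function names are hypothetical and are not part of any dbGaP tool.

```python
import re
from typing import NamedTuple, Optional

class DbGapAccession(NamedTuple):
    """Parts of a dbGaP study accession (hypothetical helper type)."""
    study_id: str                   # e.g. "phs000280"
    version: Optional[int]          # number after ".v", if present
    participant_set: Optional[int]  # number after ".p", if present

# Assumed pattern: "phs" + six digits, optionally followed by ".vN.pN".
_ACCESSION = re.compile(r"^(phs\d{6})(?:\.v(\d+)\.p(\d+))?$")

def parse_accession(text: str) -> DbGapAccession:
    """Split an accession string; raise ValueError if it does not match."""
    match = _ACCESSION.match(text.strip())
    if match is None:
        raise ValueError(f"not a dbGaP study accession: {text!r}")
    study_id, version, participant_set = match.groups()
    return DbGapAccession(
        study_id,
        int(version) if version is not None else None,
        int(participant_set) if participant_set is not None else None,
    )

print(parse_accession("phs000280.v8.p2"))
print(parse_accession("phs003738"))
```

EGA identifiers such as EGAS00001000910 use a different scheme and are deliberately rejected by this sketch.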
Study
phs003738
NHLBI TOPMed: The Vanderbilt Atrial Fibrillation Registry (VU_AF)
The Vanderbilt Atrial Fibrillation (AF) Registry was founded in 2001. Patients with AF and family members are prospectively enrolled. At enrollment a detailed past medical history is obtained along with an AF symptom severity assessment. Blood samples are obtained for DNA extraction. Patients are followed longitudinally along with serial collection of AF symptom severity assessments.
Study
phs001032
Atherosclerosis Risk in Communities (ARIC) Cohort
The Atherosclerosis Risk in Communities (ARIC) Study, sponsored by the National Heart, Lung and Blood Institute (NHLBI), is a prospective epidemiologic study conducted in four U.S. communities. The four communities are Forsyth County, NC; Jackson, MS; the northwest suburbs of Minneapolis, MN; and Washington County, MD. ARIC is designed to investigate the etiology and natural history of atherosclerosis, the etiology of clinical atherosclerotic diseases, and variation in cardiovascular risk factors, medical care and disease by race, gender, location, and date. ARIC includes two parts: the Cohort Component and the Community Surveillance Component. The Cohort Component began in 1987, and each ARIC field center randomly selected and recruited a cohort sample of approximately 4,000 individuals aged 45-64 from a defined population in their community. A total of 15,792 participants received an extensive examination, including medical, social, and demographic data. These participants were examined with the baseline visit occurring in 1987-89, the second visit in 1990-92, the third visit in 1993-95, the fourth visit in 1996-98, the fifth visit in 2011-13, the sixth visit in 2016-17, and the seventh visit in 2018-19. Follow-up occurs yearly by telephone to maintain contact with participants and to assess health status of the cohort. In the Community Surveillance Component, these four communities were investigated to determine the community-wide occurrence of hospitalized myocardial infarction and coronary heart disease deaths in men and women aged 35-84 years. Hospitalized stroke is investigated in cohort participants only. Starting in 2006, the study conducted community surveillance of inpatient (ages 55 years and older) and outpatient (ages 65 years and older) heart failure, covering events from 2005 onward. Community Surveillance for non-cohorts ended in event year 2014. ARIC is currently funded through 2028.
The ARIC Cohort is utilized in the following dbGaP sub-studies. To view genotypes, other molecular data, and derived variables collected in these sub-studies, please click on the sub-studies below or in the "Sub-studies" section of this top-level study page, phs000280 ARIC Cohort.
phs000557 ARIC_CARe
phs000090 GENEVA_ARIC
phs000223 PAGE_CALiCo_ARIC
phs000398 GO-ESP: HeartGo_ARIC
phs000668 CHARGE_ARIC
phs000860 MICORTEX
phs001536 CCDG_ARIC
Study
phs000280
This dataset contains fastq and BAM data from female adipose tissue.
The dataset includes the corresponding fastq and BAM files for 64 samples.
The study group consisted of 17 obese women with normal glucose tolerance and 15 obese women with T2DM classified according to WHO standards. The groups were matched for age, BMI and waist circumference. All the women had been morbidly obese (BMI>40 kg/m2) for at least five years.
Dataset
EGAD00001002202
The epigenetic landscape controlled by p63 in epidermal development
Transcription factor p63 is a key regulator of epidermal keratinocyte proliferation and differentiation. Heterozygous mutations of TP63, which encodes p63, cause a spectrum of developmental disorders. EEC syndrome is caused by point mutations in the p63 DNA-binding domain, and manifests as ectodermal dysplasia with defects in the epidermis and epidermal-related appendages, limb malformation and cleft lip/palate. Five hotspot mutations, affecting amino acids R204, R227, R279, R280 and R304, have been found in approximately 90% of the EEC population. Although the role of p63 in normal epidermal development and differentiation has been demonstrated, the molecular mechanism by which p63 mutations cause the epidermal phenotype in diseases is not yet understood. In the two related studies, we characterize p63 mutant keratinocytes (R204W, R279H and R304W) and p63 mutant iPSCs (R204W and R304W) and the molecular mechanisms underlying their differentiation defects.
Study
phs001737
Variation and transmission of the human gut microbiota across generations - shotgun data
Although composition and functional potential of the human gut microbiota evolve over lifespan, kinship has been identified as a key covariate of microbial community diversification. To date, sharing of microbiota features within families has however mostly been assessed between parents and their direct offspring. Here, we investigate potential transmission and persistence of familial microbiome patterns and microbial genotypes in a family cohort (N=102) spanning three to five generations over the same female bloodline. We observe microbiome community composition to be associated with kinship, with seven (low-abundant) genera displaying familial distribution patterns. While kinship and current cohabitation emerged as closely entangled variables, our explorative analyses of microbial genotype distribution and transmission estimates point at the latter as a key covariate of strain dissemination. Highest potential transmission rates are estimated between sisters and mother-daughter pairs, decreasing with increasing daughter’s age, and being higher among cohabiting pairs than those living apart. Although rare, we do detect potential transmission events spanning three and four generations, primarily involving species of the genera Alistipes and Bacteroides. Overall, while our analyses confirm the existence of family-bound microbiome community profiles, transmission or co-acquisition of bacterial strains appears to be strongly linked to cohabitation.
Study
EGAS00001005649
Variation and transmission of the human gut microbiota across generations - 16S data
Although composition and functional potential of the human gut microbiota evolve over lifespan, kinship has been identified as a key covariate of microbial community diversification. To date, sharing of microbiota features within families has however mostly been assessed between parents and their direct offspring. Here, we investigate potential transmission and persistence of familial microbiome patterns and microbial genotypes in a family cohort (N=102) spanning three to five generations over the same female bloodline. We observe microbiome community composition to be associated with kinship, with seven (low-abundant) genera displaying familial distribution patterns. While kinship and current cohabitation emerged as closely entangled variables, our explorative analyses of microbial genotype distribution and transmission estimates point at the latter as a key covariate of strain dissemination. Highest potential transmission rates are estimated between sisters and mother-daughter pairs, decreasing with increasing daughter’s age, and being higher among cohabiting pairs than those living apart. Although rare, we do detect potential transmission events spanning three and four generations, primarily involving species of the genera Alistipes and Bacteroides. Overall, while our analyses confirm the existence of family-bound microbiome community profiles, transmission or co-acquisition of bacterial strains appears to be strongly linked to cohabitation.
Study
EGAS00001005651
MOSAIC - Multi-Omics Spatial Atlas In Cancer
MOSAIC is a collaborative initiative founded by Owkin, Lausanne University Hospital (CHUV), Charité Universitätsmedizin Berlin, University Hospital Erlangen (UKER), Gustave Roussy Institute in Paris, and University of Pittsburgh. The goal of MOSAIC is to build the largest collection of spatial omics data in cancer. By integrating comprehensive high quality clinical annotations with advanced deep profiling techniques, MOSAIC aims to uncover novel cancer subtypes and identify key drug targets and biomarkers within them.
Study
EGAS50000000689
WTCCC2 Visceral Leishmaniasis (VL) samples
A WTCCC2 project genome-wide association study for visceral leishmaniasis (VL) in individuals from India, Brazil and Sudan, genotyped on the custom Illumina 670k array. The WTCCC2 analysis of the Brazilian and Indian samples is described in Fakiola et al. [Nat Genet. 2013 Feb;45(2):208-13]. It should be noted that, due to expected family structure in the data, normal analyses of these data should include an estimation of the relatedness between the samples. For more details about sample collection for the project please refer to the Methods section of the paper above. The samples from India were all collected from Bihar state in northeastern India. The Brazilian samples were collected from a wide area of northern Brazil, and managed at two laboratories in Natal and Belem. For more details on the geographic distribution of the Brazil samples please see Jamieson et al. [Genes Immun. 2007 Jan;8(1):84-90]. We have not generated qc_passed or info files for the Sudanese data, because a preliminary scan of the data indicated that there may be DNA quality issues with these samples.
Study
EGAS00001000773
A Combined Omics and Tissue Biobank for Paediatric Cancers
The Swedish Childhood Tumour Biobank (BTB) is a nonprofit national infrastructure for samples and genomic data collection from pediatric patients diagnosed with central nervous system and other solid tumours in Sweden. Fresh frozen tumour tissue and patient’s matched blood are collected from affected patients and genomic data is continuously generated and made available for approved medical research projects. We expect that the accessibility of biological samples and/or data to the research community will contribute to further advances in the understanding of tumour biology, which in turn will impact many aspects of clinical care of children with cancer, by refining diagnosis and/or identifying oncogenic drivers and therapeutic targets.
Study
EGAS50000000209
Heterogeneity in Lysosomal Storage Disorders
Gaucher disease (GD) is an inherited lysosomal storage disorder caused by biallelic pathogenic variants in GBA1, which encodes glucocerebrosidase. Additionally, GBA1 variants are a well-established genetic risk factor for Parkinson's disease (PD), leading some PD research centers to incorporate GBA1 screening to identify patients who may benefit from targeted therapeutics. However, detecting GBA1 variants remains challenging due to the gene's high sequence homology with its pseudogene, which can give rise to complex recombinant alleles. This study evaluates the performance of Gauchian, a recently developed software tool for identifying GBA1 variants from whole genome sequencing (WGS) data. The assessment was conducted in a cohort of 90 individuals with GD and five GBA1 heterozygotes, all previously characterized through Sanger sequencing. While Gauchian successfully identified the most common genotypes, it demonstrated limitations in detecting rare or de novo variants due to its restricted internal database and heavy reliance on intergenic structural variants. These limitations led to misclassified homozygosity, incomplete genotyping, and undetected recombination events, ultimately reducing the tool's utility for comprehensive variant screening and precluding its application in diagnostic settings. The study dataset includes WGS from the 90 individuals used in the Gauchian pipeline. This data will be available through dbGaP to facilitate further research on GBA1 variant detection, genotype-phenotype correlations in GD and PD, and the development of improved bioinformatics tools for complex variant identification.
Study
phs003459
eMERGE Geisinger eGenomic Medicine (GeM) - MyCode Project Controls
A research cohort of adult Geisinger Clinic patients was enrolled from community-based primary care clinics of the Geisinger Health System. Patients were eligible for enrollment if they were a primary care patient of a Geisinger Clinic physician and were scheduled for a non-emergent clinic visit. All participants provided written informed consent and HIPAA authorization. Consenting patients agreed to provide blood samples for broad biomedical research use, and gave permission to access data in their Geisinger electronic medical record for research. The enrollment rate was 90% of patients approached. The demographics of the cohort approximate those of the Geisinger Clinic outpatient population. Research blood samples were collected during an outpatient clinical phlebotomy encounter. Research blood samples are coded and stored in a central biorepository. Samples are linkable to clinical data in a de-identified manner for research via an IRB-approved data broker process. For genomic analysis, DNA is extracted from EDTA-anticoagulated whole blood. For the initial eMERGE Geisinger eGenomic Medicine (GeM) genotyping project, a subset of 1,232 unique samples was genotyped using Illumina HumanOmniExpress-12 v1.0 arrays, and used as population controls for other Geisinger Clinic case cohorts, including abdominal aortic aneurysm and gastric bypass surgery cases. These samples were selected from a larger subset of approximately 6,000 MyCode DNA samples using a partial matching algorithm that included age, sex, and body mass index as variables.
Study
phs000381
Rapid Early Action for Coronary Treatment (REACT-BioLINCC)
Data Access NOTE: Please refer to the “Authorized Access” section below for information about how access to the data from this accession differs from many other dbGaP accessions. Objectives: This multicenter controlled community study developed and evaluated the impact of a community educational intervention program on participant delay time from onset of symptoms of an acute myocardial infarction (AMI) to arrival at a hospital emergency department. Background: Although early reperfusion or thrombolytic therapy can reduce morbidity and mortality following an AMI, delayed access to medical care in participants is relatively common. Mean delay times from symptom onset to hospital arrival range from more than 4 hours to 24 hours, and the largest component of prolonged delay is participant recognition and action. Participants: A total of 20 communities from 5 field centers in the U.S. were pair-matched (10 pairs) according to geographic proximity and demographic characteristics. After initiation of a 4-month baseline surveillance period, one community in each pair was randomly selected to receive the intervention. The baseline surveillance period was followed by an 18-month community intervention and surveillance period. The community surveillance captured a total of 59,944 adults aged 30 years or older presenting to hospital emergency departments with chest pain, of whom 20,364 met study criteria for suspected acute coronary heart disease (CHD) at admission and were discharged with a CHD diagnosis. Conclusions: Delay times decreased in both the intervention and reference communities. The results showed that the multicomponent community intervention program did not differentially reduce delay time from onset of AMI symptoms to arrival at a hospital, but did significantly increase the use of Emergency Medical Services by these participants in the intervention communities. (PMID:10872014).
Study
phs003885
National Cancer Institute (NCI) Study of Lung Cancer and Smoking Phenotypes in African-American Cases and Controls
This is a two-stage case-control study designed to evaluate the association between common genetic variants and the risk of lung cancer. The stage 1 studies included 1737 cases and 3602 controls from the following studies: MD Anderson Lung Cancer Epidemiology Study, The Multiethnic Cohort Study (MEC), NCI-MD Lung Cancer Case-Control Study, Northern California Lung Cancer Study, Project CHURCH (Creating a Higher Understanding of Cancer Research & Community Health), Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial (PLCO), Southern Community Cohort Study (SCCS), and the Karmanos Cancer Institute at Wayne State University (KCI/WSU). The stage 2 studies included an independent set of 866 cases and 796 controls from the following studies: The Black Women's Health Study (BWHS), The Harvard-MGH Lung Cancer Susceptibility Study (HLCS), MD Anderson Lung Cancer Epidemiology Study, MD Anderson/LBJ Hospital Biorepository, NCI-MD Lung Cancer Case-Control Study, Northern California Lung Cancer Study, Philadelphia Lung Cancer Study on Gene Environment Interactions (Plus-Gene), Southern Community Cohort Study (SCCS), and KCI/WSU.
Study
phs001210
NHLBI TOPMed - NHGRI CCDG: Hispanic Community Health Study/Study of Latinos (HCHS/SOL)
This study contains whole genome sequence data. A case-control sample of individuals from the Hispanic Community Health Study/Study of Latinos (HCHS/SOL), a multicenter prospective cohort study of 16,415 persons of Cuban, Dominican, Mexican, Puerto Rican, Central American, and South American background (phs000810), was selected for whole genome sequencing, including participants with a history of physician-diagnosed asthma and asthma-free participants.
Study
phs001395
The Atherosclerosis Risk in Communities (ARIC) Study
The Atherosclerosis Risk in Communities (ARIC) Study, sponsored by the National Heart, Lung and Blood Institute (NHLBI), is a prospective epidemiologic study conducted in four U.S. communities. The four communities are Forsyth County, NC; Jackson, MS; the northwest suburbs of Minneapolis, MN; and Washington County, MD. ARIC is designed to investigate the etiology and natural history of atherosclerosis, the etiology of clinical atherosclerotic diseases, and variation in cardiovascular risk factors, medical care and disease by race, gender, location, and date. ARIC includes two parts: the Cohort Component and the Community Surveillance Component. The Cohort Component began in 1987, and each ARIC field center randomly selected and recruited a cohort sample of approximately 4,000 individuals aged 45-64 from a defined population in their community. A total of 15,792 participants received an extensive examination, including medical, social, and demographic data. These participants were reexamined every three years with the first screen (baseline) occurring in 1987-89, the second in 1990-92, the third in 1993-95, and the fourth and last exam was in 1996-98. Follow-up occurs yearly by telephone to maintain contact with participants and to assess health status of the cohort. In the Community Surveillance Component, currently ongoing, these four communities are investigated to determine the community-wide occurrence of hospitalized myocardial infarction and coronary heart disease deaths in men and women aged 35-84 years. Hospitalized stroke is investigated in cohort participants only. Starting in 2006, the study conducts community surveillance of inpatient (ages 55 years and older) and outpatient heart failure (ages 65 years and older) for heart failure events beginning in 2005. ARIC is currently funded through January 31, 2012. 
This study is part of the Gene Environment Association Studies initiative (GENEVA, http://www.genevastudy.org) funded by the trans-NIH Genes, Environment, and Health Initiative (GEI). The overarching goal is to identify novel genetic factors that contribute to atherosclerosis and cardiovascular disease through large-scale genome-wide association studies of well-characterized cohorts of adults in four defined populations. Genotyping was performed at the Broad Institute of MIT and Harvard, a GENEVA genotyping center. Data cleaning and harmonization were done at the GEI-funded GENEVA Coordinating Center at the University of Washington.
Study
phs000090
National Heart, Lung, and Blood Institute (NHLBI) Heart Healthy Lenoir (HHL) Genomics Study
The HHL genomics study uses a systems approach to develop models integrating clinical and genomic data. Previously we developed and tested an approach known as the SAMARA (Supporting A Multidisciplinary Approach to Research in Atherosclerosis) project that applied recent advances in biomedical and computational sciences at The University of North Carolina at Chapel Hill to develop a deeper understanding of human cardiovascular disease (CVD). The Heart-Healthy Lenoir Project expands these studies into the community, using this methodology to: 1) determine the prevalence of genomic risk signatures in high-risk community populations using genome-wide Single Nucleotide Polymorphism (SNP) analysis; 2) develop novel genomic models incorporating high-risk features in this population; and 3) determine whether genomic signatures can be used to predict responsiveness to interventions that underlie CVD disparities. DNA was obtained from participants enrolled in two of the HHL clinical trials, 1) Improving Care for Patients With High Blood Pressure (NCT01425515) or 2) Heart-Healthy Lenoir Lifestyle Study (NCT01433484). Participants could enroll in both trials concurrently.
Study
phs001471
NHLBI TOPMed: Chicago Initiative to Raise Asthma Health Equity (CHIRAH)
The CHIRAH project was a community-based study of the factors associated with asthma morbidity in the African American population. CHIRAH evaluated the role of various variables (biologic/environmental, psychologic/behavioral, and socioeconomic) on asthma morbidity and the function of changes in these variables on asthma morbidity in a longitudinal fashion. This involved assembling a cohort, based on school screening, that was sampled to include similar numbers of underprivileged and non-underprivileged subjects and roughly equal representation of self-reported African Americans and self-reported non-African Americans. Subjects in this cohort were followed up every 3 months over the course of 2 years.
Study
phs001605
The genomic VCF data of the Integrative proteogenomic characterization of early esophageal cancer project
The genomic VCF data of the Integrative proteogenomic characterization of early esophageal cancer project. This dataset contains 90 VCF files.
Dataset
EGAD00001008672
Induced pluripotent stem cell lines (iPSC lines) produced from skin and blood cells.
The IMBA stem cell bank, iPSC Biobank, is a collection of induced pluripotent stem cell lines (iPSC lines) produced from skin and blood cells. The iPSC Biobank is integrated into the IMBA Stem Cell Core Facility. It stores and provides high-quality reference control panels of iPSC clones as a tool to the research community. The resource is available to all scientists and commercial institutions. The ethical guidelines of the iPSC Biobank are always based on prevailing laws and internal regulations.
Study
EGAS00001006262
Singapore Adult Metabolism Study - Phase 2 (SAMS2)
Singapore Adult Metabolism Study - Phase 2 (SAMS2) was an interventional study where hundreds of donors aged 21-45 were recruited to participate in a 16-week weight loss program. Study individuals selected (see below for more detailed selection standards) were sedentary (exercise 1 or fewer times a week), obese or overweight with a body fat mass greater than 24% and a BMI between 23-35 kg/m2. For this study, we adjusted BMI definition for Asian population, based on the WHO Consultation 2002, and the BMI cut-off is 23 kg/m2 for overweight and 27.5 kg/m2 for obese. The weight loss program included a combination of dietary interventions, structured exercise sessions, and additional physical activity performed in participants' own time. Energy and protein requirements were calculated based on each participant's weight, height, and physical activity level, with the goal of achieving a 40% calorie deficit. Participants' calorie intake was tracked using food recalls and questionnaires. Additionally, subjects attended structured exercise sessions at least three times per week, supervised by a coach. Each session consisted of 90 minutes of aerobic and strength training exercises, designed to burn approximately 500 kcal per session. To monitor daily physical activity, participants wore pedometers throughout the study. In total, the exercise sessions (500 kcal per session) and daily physical activity (targeting an additional 500 kcal) were aimed at achieving a total caloric expenditure of 2000 kcal per week. We collected clinical data and skeletal muscle biopsies from 54 overweight/obese Asian individuals before and after a 16-week lifestyle intervention, which resulted in an average ~10% weight loss, accompanied by a ~30% increase in insulin-stimulated glucose uptake. Improvements were observed in 118 of 252 clinical traits and six blood lipids. 
Transcriptomic analysis of paired skeletal muscle biopsies identified 505 differentially expressed genes enriched in mitochondrial function and insulin sensitivity. Thousands of muscle-specific e/sQTLs were detected pre- and post- intervention, including hundreds of lifestyle-responsive e/sQTLs. Notably, approximately 4.2% of eQTLs and 7.3% of sQTLs showed Asian specificity. Joint analysis with GWAS identified 16 putative metabolic risk genes. Our study reveals gene-by-lifestyle interactions and how lifestyle modulates gene regulation in skeletal muscle.
Study
phs004078
Population Architecture using Genomics and Epidemiology (PAGE): Causal Variants Across the Life Course (CALiCo): Atherosclerosis Risk in Communities (ARIC)
CALiCo ARIC The Atherosclerosis Risk in Communities Study (ARIC), sponsored by the National Heart, Lung and Blood Institute (NHLBI), is a prospective epidemiologic study conducted in four U.S. communities. ARIC is designed to investigate the etiology and natural history of atherosclerosis, the etiology of clinical atherosclerotic diseases, and variation in cardiovascular risk factors, medical care and disease by race, gender, location, and date. ARIC includes a Cohort Component and a Community Surveillance Component. Cohort enrollment began in 1987. Each ARIC field center randomly selected and recruited a sample of approximately 4,000 individuals aged 45-64 from a defined population in their community. A total of 15,792 participants received an extensive examination, including medical, social, and demographic data. These participants were reexamined every three years with the first screen (baseline) occurring in 1987-89, the second in 1990-92, the third in 1993-95, and the fourth and last exam took place in 1996-98. A fifth cohort examination is underway (2011-2013). Yearly telephone interviews maintain contact with participants and assess health status of the cohort. In the Community Surveillance Component, currently ongoing, these four communities are investigated to determine the community-wide occurrence of hospitalized myocardial infarction and coronary heart disease deaths in men and women aged 35-84 years. Hospitalized stroke is investigated in cohort participants only. The study conducts community surveillance of inpatient heart failure (ages 55 years and older) and cohort surveillance of outpatient heart failure events beginning in 2005.
Study
phs000223
Whole-genome sequencing reveals genomic signatures associated with the inflammatory microenvironments in Chinese NSCLC patients
We integrated genomic (whole-genome sequencing, WGS) and transcriptomic (polyA-enriched RNA-Seq) sequencing data from 90 NSCLC cases and comprehensively identified the distinct genomic features of Chinese NSCLC patients.
Dataset
EGAD00001004071
A Prospective Study of the Oral Microbiome and Pancreatic Cancer
Two prospective cohort studies - the Black Women's Health Study and the Southern Community Cohort Study - were used to investigate the oral microbiome in relation to pancreatic cancer risk. Using a nested case-control study design, 148 cases and 510 matched controls were included (122 Black American pancreatic cancer cases and 409 matched controls). DNA was extracted from oral wash samples, using the PowerSoil Pro kit. Paired-end whole metagenomic shotgun sequencing was performed, using the Illumina HiSeq2000 platform with a read length of 100 bp (insert size 350 bp).
Study
phs002454
National Institute on Aging - Late Onset Alzheimer's Disease Family Study: Genome-Wide Association Study for Susceptibility Loci
Alzheimer disease is the most common neurodegenerative disorder of the elderly affecting an estimated five million Americans. Genetic factors contribute to the risk for disease with heritability estimates ranging from 57% to 79%. More than a decade ago, the ε4 variant of APOE was identified and remains the most consistently replicated genetic variant influencing the risk of late onset Alzheimer disease. A segregation analysis suggests there may be four additional genes influencing the age-at-onset of Alzheimer disease. In 2007 there were 968 association studies in 398 candidate genes reported, but none replicated consistently. There are many reasons for the lack of consistency, but one important reason for the lack of progress is the paucity of a sufficient number of well characterized families and patients available to the entire scientific community. The extensive effort and expense required to ascertain such a population has been addressed by the NIA-LOAD Family Study. Its goal is to identify and recruit families with two or more siblings with the late-onset form of Alzheimer's disease and a cohort of unrelated, non-demented controls similar in age and ethnic background, and to make the samples, the clinical and genotyping data and preliminary analyses available to qualified investigators world-wide. Genotyping by the Center for Inherited Disease Research (CIDR) was performed using the Illumina Infinium II assay protocol with hybridization to Illumina Human 610Quadv1_B Beadchips. This genotyping represents the largest collection of families ever assembled with Alzheimer's disease combining the NIA-LOAD Genetics Initiative Multiplex Family Study, the National Cell Repository for Alzheimer's Disease (NCRAD) with additional controls from the University of Kentucky. These genotyping results will serve as a focal point for future research that will identify all of the remaining genetic variants in Alzheimer's disease.
Study
phs000168
Collaborative Cohort of Cohorts for COVID-19 Research (C4R): Hispanic Community Health Study/Study of Latinos (HCHS/SOL)
Cohort Description: Hispanic Community Health Study/Study of Latinos (HCHS/SOL) is an ongoing population-based prospective cohort study of 16,415 community dwelling Hispanic/Latino adults aged 18-74 years at baseline, recruited from four urban field centers with large populations of Hispanics/Latinos (Bronx, NY; Chicago, IL; Miami, FL; and San Diego, CA). The primary goals of the HCHS/SOL are to describe: (1) the prevalence and incidence of cardiovascular, pulmonary, and other major chronic conditions; (2) the risk and/or protective factors associated with these conditions; and (3) the relationships between the initial sociodemographic and health profiles and future health events in the target population. Data Being Submitted: Wave 1 questionnaire data includes 397 variables for up to 9817 HCHS/SOL participants in C4R. Wave 2 questionnaire data includes 448 variables for up to 7536 HCHS/SOL participants in C4R. Derived data includes 43 variables for up to 11182 HCHS/SOL participants in C4R. Phenotype data includes 113 variables for up to 11182 HCHS/SOL participants in C4R.
Study
phs002908
Liquid biopsy for molecular characterization of diffuse large B-cell lymphoma and early assessment of minimal residual disease
Circulating tumor DNA (ctDNA) allows genotyping and minimal residual disease (MRD) detection in lymphomas. Using a NGS approach (Euroclonality-NDC), we evaluated clinical and prognostic value of ctDNA in a series of R-CHOP-treated DLBCL patients at baseline (n=68) and after 2-cycles (n=59), monitored by metabolic imaging (PET/CT).
A molecular marker was identified in 61/68 (90%) ctDNA samples at diagnosis. Pre-treatment high ctDNA levels significantly correlated with elevated LDH, advanced stage, high risk IPI and a trend to shorter 2-year PFS. Valuable NGS data after 2-cycles of treatment were obtained in 44 cases, and 38 achieved major molecular response (MMR; 2.5-log drop in ctDNA). PFS curves displayed statistically significant differences among those achieving MMR vs. those not achieving MMR (2yr PFS of 76% vs. 0%, p<0.001). Similarly, more than 66% reduction in SUVmax by PET/CT identified two subgroups with different prognosis (2yr PFS of 83% vs. 38%; p<0.001). Combining both approaches MMR and SUVmax reduction, a better stratification was observed (2yr PFS of 84% vs. 17% vs. 0%, p<0.001).
Euroclonality-NDC panel allows the detection of a molecular marker in the ctDNA in 90% of DLBCL. ctDNA reduction at 2 cycles and its combination with interim PET results improves patient prognosis stratification.
Study
EGAS50000000215
Collection: Hispanic Community Health Study/Study of Latinos (HCHS/SOL)
The Hispanic Community Health Study / Study of Latinos (HCHS/SOL) is a multi-center epidemiologic study in Hispanic/Latino populations to determine the role of acculturation in the prevalence and development of disease, and to identify risk factors playing a protective or harmful role in Hispanics/Latinos. The study is sponsored by the National Heart, Lung, and Blood Institute (NHLBI) and six other institutes, centers, and offices of the National Institutes of Health (NIH). The goals of the HCHS/SOL include studying the prevalence and development of disease in Hispanics/Latinos, including the role of acculturation, and identifying disease risk factors that play protective or harmful roles in Hispanics/Latinos. A total of 16,415 persons of Cuban, Dominican, Mexican, Puerto Rican, Central American, and South American backgrounds were recruited through four Field Centers affiliated with San Diego State University, Northwestern University in Chicago, Albert Einstein College of Medicine in the Bronx area of New York, and the University of Miami. Seven additional academic centers serve as scientific and logistical support centers. Study participants aged 18-74 years took part in an extensive clinic exam and assessments to ascertain socio-demographic, cultural, environmental and biomedical characteristics. Annual follow-up interviews are conducted to determine a range of health outcomes. To request access to this collection, select phs003650 in dbGaP when submitting a data access request.
Study
phs003650
41 fibroblast RNAseq samples of pediatric patients with childhood epilepsy and developmental delay
We used RNA sequencing of patient fibroblasts to find a genetic diagnosis in patients with exome-negative childhood epilepsy. RNA extracted from 41 individuals' cultured fibroblasts (passage 3-6) was collected and sequenced at a depth of 90-110 paired reads per sample.
Study
EGAS50000000792
Cancer Moonshot Biobank
The National Cancer Institute initiated the Cancer Moonshot Biobank to accelerate research through the collection and distribution of biospecimens for key research efforts. The Biobank will engage with over one thousand cancer patients, being treated with standard-of-care therapies, for longitudinal biospecimen and data collection. Cancer patients will be enrolled into the study from a diverse participant population engaged at medical institutions throughout the U.S. The biospecimens will be distributed to qualified scientists to accelerate research progress. The Biobank program, working closely with Leidos Biomedical, Inc. (LBR), will partner with community hospitals and other medical institutions, designated as Biospecimen Source Sites (BSS), to engage and consent eligible participants and collect biospecimens and associated clinical data; work with a central Biospecimen Core Resource (BCR) to support biospecimen collection, processing and storage activities and perform pathology quality control; work with LBR's Molecular Characterization (MoCha) lab to perform clinical tumor molecular characterization assays and return results to patients and their physicians; and make the biospecimens and associated data available to researchers to accelerate cancer research progress. Return of clinical results, a dedicated Patient and Provider Engagement (PPE) website, electronic informed consent, local participant engagement projects, and an embedded ethical, legal and social issues sub-study will together support patient and provider engagement in the program. Cancer Moonshot Biobank imaging data is being made available in The Cancer Imaging Archive (TCIA) on a release schedule that is coordinated with the program releases of data to dbGaP. The imaging data can be found at the following link: Cancer Moonshot Biobank Imaging
Study
phs002192
National Sleep Research Resource (NSRR): Hispanic Community Health Study/Study of Latinos
The Hispanic Community Health Study / Study of Latinos (HCHS/SOL) is a multi-center epidemiologic study in Hispanic/Latino populations to determine the role of acculturation in the prevalence and development of disease, and to identify risk factors playing a protective or harmful role in Hispanics/Latinos. The study is sponsored by the National Heart, Lung, and Blood Institute (NHLBI); six other institutes, centers, and offices of the National Institutes of Health (NIH) contributed to the first phase of the project. Raw polysomnography data are available from the HCHS/SOL Baseline visit and raw actigraphy data are available from the Sueño Ancillary visit. Primary HCHS/SOL data can be requested through dbGaP phs000810 Hispanic Community Health Study/Study of Latinos (HCHS/SOL).
Study
phs003543
CRU-Ukrainian National Research Center for Radiation Medicine Trio Study
Data on transgenerational effects following nuclear accidents are important for understanding fully the consequences of parental exposure to ionizing radiation. Few studies to date have had adequate statistical power to detect effects of the magnitude expected based on animal data, and most have not been of low-dose, protracted exposures associated with nuclear accidents and their aftermath. Although, to date, scant use has been made of the new genomic technologies, in Chernobyl-exposed areas of Ukraine and Belarus, excess minisatellite mutations have been seen in children born after the accident. We propose a study of parent-child trios in which at least one parent was exposed to Chernobyl radiation as a clean-up worker (mean dose >=100 mGy) and/or evacuee from a contaminated area (mean >=50 mGy). The specific aims are to investigate the transgenerational and de novo mutation rates of the spectrum of genetic variants in trios, in particular looking at effects in children and mapping them to possible parental origin of the chromosome. Together with long-term collaborators at the Research Center for Radiation Medicine (RCRM) in Kiev, epidemiologic data will be collected for up to 450 trios of parents with preconceptional doses and their unexposed offspring. We will use state-of-the-art genomic technologies to characterize the landscape of the genomes of the trios to determine whether parental radiation exposure is associated with genetic mutations transmitted to the offspring, by examining de novo mutation rates, minisatellite mutations, copy number alterations, and variations in telomere length. The analysis will be conducted in peripheral blood and/or buccal samples (when blood is not available) from complete father-mother-child trios. Doses to the gonads from the time of the accident to the time of conception will be reconstructed for all parents using existing records supplemented by interview data. 
Trio subjects will be selected from representative populations exposed to radiation from Chernobyl who are under active follow-up in the Clinico-Epidemiologic Registry at RCRM. To help identify specific effects of paternal and maternal radiation exposure, we will initially select sets of trio subjects in five categories: (1) exposed father, unexposed mother; (2) unexposed father, exposed mother; (3) both parents exposed; (4) both parents unexposed; and (5) a group of high dose "emergency workers" with acute radiation syndrome. All trio members will be invited to the RCRM outpatient clinic for collection of a 20 ml blood sample (or buccal cells for those who refuse phlebotomy). Both parents will be asked to complete a general questionnaire to obtain demographic and lifestyle data. Then one or both will complete detailed dosimetry questionnaires, based on forms used in previous collaborations with RCRM and administered by specially trained interviewers. Once 50 trios have been recruited (10 from each of the 5 exposure categories), we will conduct an interim evaluation of participation rates, sample collection and quality, and dose reconstruction in order to modify the protocol as needed. The analytical approach will be to correlate the extent of genetic alterations in the offspring, especially de novo events, with parental pre-conceptional radiation dose overall and by parental origin. The statistical power in relation to de novo mutations is very high, in excess of 90%, but somewhat lower for trends in minisatellite mutations. Study findings will contribute importantly to knowledge of the heritable effects of moderate- and low-dose radiation exposure in humans and to radiation risk projection. Eventually data from the Trio Study may be shared with the international community through dbGaP.
Study
phs001163
DAC-2023-07-05-Ritz (DAC-007) - Diagnosis of tuberculosis infection in children with a novel skin test and the traditional tuberculin skin test: an observational study
Diagnosis of tuberculosis infection in children with a novel skin test and the traditional tuberculin skin test: an observational study
This is only the location for the raw data. To check whether the dataset may be relevant, please refer to the MF-DAC Zenodo community (https://zenodo.org/records/11237107).
Study
EGAS50000000780
UK Biobank whole cohort directly genotyped and imputed data (~500,000 participants)
Genotype data is available for all 500,000 participants in the UK Biobank cohort. Genotyping has been performed using the Affymetrix UK BiLEVE Axiom array on an initial 50,000 participants and the remaining 450,000 participants have been genotyped using the Affymetrix UK Biobank Axiom® array. The two arrays are extremely similar (with over 95% common content). Quality control and imputation (to over 90 million SNPs, indels and large structural variants) has been performed by a collaborative group headed by the Wellcome Trust Centre for Human Genetics. The following data are available: a clean set of QC’ed genotype calls; confidence values that a genotype call is correct; intensity data to generate cluster plots; extensive QC information regarding SNPs and samples including SNP metrics, batch effects, population structure and relatedness; and imputed data. For further information, please refer to the UK Biobank website.
Study
EGAS00001002399
Richter Syndrome Methylation dataset
Raw idat files for 90 RS + DLBCL + CLL samples.
Dataset
EGAD00010002194
A Dose Escalation Study of Efmarodocokin Alfa (UTTR1147A) in Healthy Volunteers and Patients with Ulcerative Colitis
Background: The interleukin-22 cytokine (IL-22) has demonstrated efficacy in nonclinical colitis models with a non-immunosuppressive mechanism-of-action. Efmarodocokin alfa (UTTR1147A) is a fusion protein agonist that links IL-22 to the crystallizable fragment (Fc) of human immunoglobulin (IgG4) for improved pharmacokinetic characteristics, but with a mutation to minimize Fc effector functions.
Methods: This randomized, phase 1b study evaluated the safety, tolerability, pharmacokinetics, and pharmacodynamics of repeat intravenous dosing of efmarodocokin alfa in healthy volunteers (HVs; n=32) and patients with ulcerative colitis (UC; n=24) at 30–90 µg/kg doses given biweekly (Q2W) or monthly (Q4W) (6:2 active:placebo per cohort).
Results: The most common adverse events (AEs) were on-target dermatological effects (dry skin, erythema, and pruritus) that were reversible. Dose-limiting non-serious dermatological AEs (severe dry skin, erythema, exfoliation, and discomfort) were seen in two HVs and one patient at 90 µg/kg Q2W. Pharmacokinetics were generally dose-proportional across the dose levels tested, but patients demonstrated lower drug exposures relative to HVs at the same dose. IL-22 serum biomarkers and IL-22 responsive genes in colon biopsies were induced with active treatment. Patients demonstrated changes in microbiota composition following active treatment, thereby reversing baseline dysbiosis. Clinical response was observed in 7/18 active- and 1/6 placebo-treated patients; clinical remission was observed in 5/18 active- and 0/6 placebo-treated patients.
Study
EGAS00001006172
Comprehensive Deep Sequencing Atlas in HCC tumors
The atlas aims to unravel the intricate genetic landscape of HCC, providing a detailed characterization of genomic alterations, transcriptomic profiles, and key mutations associated with liver tumors. The integration of WGS, RNAseq, and WES data offers a holistic perspective, facilitating a deeper understanding of the molecular mechanisms driving HCC pathogenesis. The resulting atlas serves as a valuable resource for researchers, clinicians, and the broader scientific community, contributing to advancements in HCC diagnostics, prognostics, and therapeutic interventions.
Dataset
EGAD00001015343
EOSC4Cancer Synthetic Colorectal Cancer Genomic data
This study stores the synthetic datasets produced by partners involved in EOSC4Cancer. It contains independent datasets available to the research community. The synthetic genomes were created to mimic real cancer data.
Study
EGAS50000000190
Is the Gut Important in Multiple Joint Osteoarthritis? A Multimodal Investigation in Humans and Pet Dogs
The Gut Health in Multiple Joint Osteoarthritis (MJOA) Study leverages data from parallel community-based cohorts in humans and in pet dogs to elucidate the role of altered microbiota in MJOA. One hundred Johnston County Health Study human participants were 35 to 70 years of age at enrollment (2022-2023), self-identified as Hispanic, White, or Black, and lived in Johnston County, North Carolina. Demographic, clinical information, multiple joint radiographs, and stool samples for microbiome profiling by 16S rRNA gene sequencing were obtained from all participants. Similar data were collected from an independent group of pet dogs (N=115) from the local community, at the North Carolina State University (NCSU) College of Veterinary Medicine. The central hypothesis of the study is that intestinal permeability, with or without dysbiosis, is a major driver in the development and worsening of MJOA.
Study
phs003980
Integrative molecular analysis of pediatric Anaplastic large cell lymphoma reveals subtypes with distinct immune suppression signatures.
Anaplastic large cell lymphoma (ALCL) is a peripheral T-cell lymphoma accounting for 10–15% of all childhood lymphomas. While more than 90% of ALCL cases contain an ALK rearrangement, these tumors possess significant inter-tumor molecular heterogeneity that contributes to distinct morphologic differences and clinical impact. To gain insight into the molecular heterogeneity within ALK+ ALCL, we performed whole-exome sequencing, RNA-sequencing, and methylome analysis of 42 primary pediatric ALK+ ALCL patients. Our data showed that ALK+ ALCLs could be subclassified into two subtypes based on ALK gene expression, methylation profiles, and somatic mutation patterns. ALK-low samples had more highly methylated gene regions and were enriched for immune infiltration. ALK-high samples were enriched for somatic copy number alterations and high expression of MYC and PD-L1. These data indicate that ALK expression, somatic mutations, and aneuploidy status are negatively associated with immune infiltration, suggesting that ALK expression status might be associated with resistance to immune checkpoint inhibitors.
Study
EGAS00001004189
AACR Project Genomics, Evidence, Neoplasia, Information, Exchange (GENIE)
AACR Project Genomics Evidence Neoplasia Information Exchange (GENIE) is an international pancancer registry of real-world data assembled through data sharing between 19 leading international cancer centers with the goal of improving clinical decision-making. The registry leverages ongoing clinical sequencing efforts (CLIA/ISO-certified) at participating cancer centers by pooling their data to create a novel, open-access registry to serve as an evidence base for the entire cancer community. Genomic and baseline clinical data from more than 70,000 tumors is accessible through the efforts of our strategic and technical partners, Sage Bionetworks and cBioPortal. The consortium and its activities are driven by openness, transparency, and inclusion to ensure that the project output remains accessible to the global cancer research community and ultimately benefits patients.
Study
phs001337
Constructing database of magnetic resonance imaging of the brain and the ancillary clinical information for multisite collaboration studies: a dataset from Schizophrenia individuals
Previously, no neuroimaging database available to the research community incorporated a large Japanese sample of healthy people and patients with neuropsychiatric disorders. Establishing such a database will enable large-scale, neuroimaging-based investigations of the pathophysiology of neuropsychiatric disorders, either by using the large data pool alone or by combining it with investigators' own data sets.
Study
JGAS000043
Constructing database of magnetic resonance imaging of the brain and the ancillary clinical information for multisite collaboration studies: a dataset from normal individuals
Previously, no neuroimaging database available to the research community incorporated a large Japanese sample of healthy people and patients with neuropsychiatric disorders. Establishing such a database will enable large-scale, neuroimaging-based investigations of the pathophysiology of neuropsychiatric disorders, either by using the large data pool alone or by combining it with investigators' own data sets.
Study
JGAS000042
Constructing database of magnetic resonance imaging of the brain and the ancillary clinical information for multisite collaboration studies: a dataset from Bipolar disorder individuals
Previously, no neuroimaging database available to the research community incorporated a large Japanese sample of healthy people and patients with neuropsychiatric disorders. Establishing such a database will enable large-scale, neuroimaging-based investigations of the pathophysiology of neuropsychiatric disorders, either by using the large data pool alone or by combining it with investigators' own data sets.
Study
JGAS000045
Constructing database of magnetic resonance imaging of the brain and the ancillary clinical information for multisite collaboration studies: a dataset from Depression individuals
Previously, no neuroimaging database available to the research community incorporated a large Japanese sample of healthy people and patients with neuropsychiatric disorders. Establishing such a database will enable large-scale, neuroimaging-based investigations of the pathophysiology of neuropsychiatric disorders, either by using the large data pool alone or by combining it with investigators' own data sets.
Study
JGAS000044
Comprehensive single-cell genome analysis at nucleotide resolution using the PTA Analysis Toolbox
We developed a comprehensive bioinformatic workflow, called the PTA Analysis Toolbox (PTATO), to accurately detect single base substitutions, insertions-deletions (indels) and structural variants in PTA-based WGS data. PTATO includes a machine learning approach and filtering based on recurrency to distinguish PTA-artefacts from true mutations with high sensitivity (up to 90%), outperforming existing bioinformatic approaches. Our results show that PTATO enables studying somatic mutagenesis in the genomes of single cells with unprecedented sensitivity and accuracy.
Study
EGAS00001007288
Understanding Determinants of Racial Disparities in Lung Cancer Incidence
This is a nested case-control study designed to evaluate the association between common genetic variants and the risk of lung cancer. The study included 1258 cases and 1252 controls selected from the Southern Community Cohort Study (SCCS), a large, prospective observational cohort. Cases were identified from cancer registries across the catchment area of the SCCS. Cases and controls were frequency matched on age, sex, race, and enrollment site (community health center vs general population).
Study
phs003789
SCANDARE HNSCC 3' Tag RNA-seq
SCANDARE (NCT03017573) is a multicentric biobanking study enrolling adult patients with newly diagnosed head and neck squamous cell carcinoma (HNSCC), triple negative breast cancer (TNBC), ovarian cancer, and cervical cancer. Tumor tissue and blood samples are collected at several time points during the patient's journey: at diagnosis, after neoadjuvant chemotherapy where neoadjuvant treatment is given, at surgery, at recurrence, and at disease progression following treatment initiated at recurrence. Since its launch in 2017 at Institut Curie, SCANDARE has enabled the longitudinal collection and preservation of samples for in-depth analyses. Here, we describe molecular alterations in 264 FFPE samples at baseline from 90 HNSCC patients using 3'-tag RNA-seq technology, with FASTQ files deposited in the dataset.
Dataset
EGAD50000001655
RNA-seq of cells cultured in vitro
This dataset includes 90 samples profiled by high-throughput Illumina sequencing, in bam format, aligned to GRCh37. Normal human CD34+ cord blood (CB), bone marrow (BM), or post-natal thymus (PNT) cells were transduced with various combinations of T-ALL oncogenes and cultured in vitro.
Dataset
EGAD00001009750
Characterization of Arabian Peninsula whole exomes
We conducted whole exome sequencing (using the SureSelect Human All Exon V5 + UTRs target enrichment kit) of 90 individuals from the Arabian Peninsula (AP): 23 from Saudi Arabia, 24 from Yemen, 24 from Oman and 19 from the UAE.
Study
EGAS00001006487
Characterization of Arabian Peninsula whole exomes
We conducted whole exome sequencing (using the SureSelect Human All Exon V5 + UTRs target enrichment kit) of 90 individuals from the Arabian Peninsula (AP): 23 from Saudi Arabia, 24 from Yemen, 24 from Oman and 19 from the UAE.
Dataset
EGAD00001009162
Discovery of new fusion transcripts in a cohort of pediatric solid cancers at relapse
This study aims to dissect further the complexity of fusion landscape in recurrent pediatric cancers. A retrospective study on RNA-sequencing data of 48 pediatric patients at relapse with different malignancies was conducted in order to detect and validate several fusions and predict their probable oncogenic potential and targetability. We obtained a high validation rate of 90% of selected candidates which will serve both clinical and research needs to detect and prioritize experimental validation studies leading to the development of new therapeutic options.
Study
EGAS00001003236
Gut microbiome profiles according to sex, body mass index and dietary fiber intake
Findings from recent studies suggest that the community of microbes residing in the human body is important in disease etiology; however, it remains unclear whether personal factors modulate human microbial composition. Studies based on animal models indicate that differences in composition might be attributed to sex-mediated effects. We analyzed the relationship of sex, adiposity, and dietary fiber intake with gut microbial composition using fecal samples from human subjects. We explored the associations of these factors with metrics of community composition and specific taxon abundances. We found that men and women had significantly different microbial community composition and that women had reduced abundance of a major phylum. Adiposity was associated with gut microbiome composition in women but not in men. Fiber from fruits and vegetables and fiber from beans were each associated with increased abundance of specific bacterial taxa. These findings provide initial indications that sex, adiposity, and dietary fiber might play important roles in influencing the human gut microbiome. Better understanding of these factors may have significant implications for gastrointestinal health and disease prevention.
Study
phs000884
WGS dataset for gamma delta (γδ) T-ALL patients
Whole genome sequencing data for 61 samples with acute lymphoblastic leukemia expressing the gamma delta T cell receptor (γδ T-ALL) and 29 germline samples
Dataset
EGAD50000000028
Safety and Efficacy of Intravenous Norepinephrine for Orthostatic Hypotension
Patients with chronic autonomic failure (CAF) often have disabling orthostatic hypotension (OH). OH in CAF patients is often associated with supine hypertension (hypertension while lying down), which can be severe. Drugs to treat OH worsen supine hypertension. Therefore, the combination of OH with supine hypertension poses a difficult therapeutic challenge. This protocol is a first step toward development of a way to maintain blood pressure during standing without worsening hypertension while lying down. The research question is whether a drug (norepinephrine) given intravenously (IV) will prevent blood pressure from falling in patients with orthostatic hypotension. This is a placebo-controlled, blinded study of 15 patients with neurogenic orthostatic hypotension. The study consists of two experimental days per participant. On a day before the day of norepinephrine (NE) infusion, the patient undergoes head-up tilting while blood pressure is monitored. Tilt angles are increased until the patient has orthostatic symptoms, systolic pressure decreases to less than 90 mm Hg, or systolic pressure decreases by more than 80 mm Hg. On the day of NE infusion, patients receive NE and placebo, with the sequence of treatments randomized. If the patient has severe supine hypertension (more than 200 mm Hg systolic), then NE is infused beginning with the patient at whatever tilt angle is required for baseline pressure to be less than 200 mm Hg. NE is infused at doses titrated to keep directly recorded systolic blood pressure at or above the baseline value during exposure to higher tilt angles. When placebo is given, angles of tilt are increased until the patient has orthostatic symptoms, systolic pressure decreases to less than 90 mm Hg, or systolic pressure decreases by more than 80 mm Hg.
Study
phs001769
SNP genotyping data in genes related to trace element homeostasis
In this study we provide data regarding sample characteristics and genotypes in 150 human samples of European origin for SNPs located around genes related to metal homeostasis. SNPs were genotyped with the Illumina VeraCode technology (192-plex format). The genotyped SNPs were used to identify quantitative trait loci in the same samples. Data on the quantification of 18 micronutrients, the expression levels of 90 genes and the abundances of 40 proteins related to metal homeostasis in the same samples are directly available as supplementary material in the publication of this study.
Study
EGAS00001001292
Peruvian Genome Project - Whole Genome Sequencing
In this study, we sequence 150 genomes to high coverage from Native American and mestizo populations in Peru. The majority of our samples possess greater than 90% Native American ancestry, which makes this the most extensive Native American sequencing project to date.
Study
EGAS00001004995
IMCISION DNAseq
Standard-of-care (SOC) surgery for locoregionally advanced head and neck squamous cell carcinoma (HNSCC) is intensive and results in 30‒50% five-year overall survival. Anti-PD1 immune checkpoint blockade (ICB) durably improves survival rates in recurrent/metastatic HNSCC. We report on the non-randomized phase Ib/IIa IMCISION trial (NCT03003637) of 32 HNSCC patients treated with two doses of neoadjuvant nivolumab (NIVO MONO, n=6, phase Ib arm A) or two doses of nivolumab plus a single dose of ipilimumab (COMBO, n=26, 6 in phase Ib arm B and 20 in phase IIa) prior to surgery and, if indicated, adjuvant radiotherapy. Neoadjuvant ICB was feasible in all phase Ib patients, meeting the phase Ib primary feasibility endpoint. One phase IIa patient had progressive disease precluding surgery. Primary tumor pathological response (phase IIa primary endpoint), defined as the % change in viable tumor cells from baseline biopsy to on-treatment resection, was evaluable in 29 patients. We observed a major pathological response (MPR, 90‒100% response) in 8/23 (35%) evaluable COMBO and 1/6 (17%) NIVO MONO patients. None of the patients with MPR after NIVO MONO or COMBO developed a tumor relapse after 24.0 months median follow-up. FDG-PET-based total lesion glycolysis identified MPR patients prior to surgery. A baseline AID/APOBEC-associated mutational profile and an on-treatment decrease in hypoxia signature expression were observed in MPR patients. Our data indicate that neoadjuvant NIVO MONO and COMBO are safe in HNSCC, and that COMBO ICB in particular shows encouraging efficacy. IMCISION provides rationales for future HNSCC trials aiming to pre-operatively select patients with a likely MPR after neoadjuvant ICB for de-escalation of SOC.
Study
EGAS00001005466
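The major pathological response (MPR) rates quoted in the IMCISION description above can be reproduced with simple arithmetic. The sketch below is only an illustrative check: the helper name `percent` is hypothetical, and the pooled 9/29 figure is derived here from the per-arm counts rather than quoted from the source.

```python
def percent(numerator: int, denominator: int) -> int:
    """Return a proportion as a whole-number percentage."""
    return round(100 * numerator / denominator)

# Counts as stated in the description: MPR patients / evaluable patients per arm.
combo_mpr, combo_evaluable = 8, 23
nivo_mpr, nivo_evaluable = 1, 6

combo_rate = percent(combo_mpr, combo_evaluable)  # COMBO arm
nivo_rate = percent(nivo_mpr, nivo_evaluable)     # NIVO MONO arm

# Derived (not stated in the source): pooled rate over all evaluable patients.
total_evaluable = combo_evaluable + nivo_evaluable
pooled_rate = percent(combo_mpr + nivo_mpr, total_evaluable)

print(combo_rate, nivo_rate, total_evaluable, pooled_rate)
```

The per-arm denominators add up to the 29 evaluable patients reported in the text.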
NINDS-Genome-Wide Genotyping in Parkinson's Disease: First Stage Analysis and Public Release of Data
Epidemiological studies have estimated a cumulative prevalence of PD of greater than 1 per thousand. When prevalence is limited to senior populations, this proportion increases nearly 10-fold. The estimated genetic risk ratio for PD is approximately 1.7 (70% increased risk for PD if a sibling has PD) for all ages, and increases over 7-fold for those under age 66 years. The role for genes contributing to the risk of PD is therefore significant.
This study utilized the well characterized collection of North American Caucasians with Parkinson's disease, and neurologically normal controls from the sample population which are banked in the National Institute of Neurological Disorders and Stroke (NINDS Repository) collection for a first stage whole genome analysis. Genome-wide, single nucleotide polymorphism (SNP) genotyping of these publicly available samples was originally done in 267 Parkinson's disease patients and 270 controls, and this has been extended to include genome wide genotyping in 939 Parkinson's disease cases and 802 controls.
The NINDS repository was established in 10-2001 towards the goal of developing standardized, broadly useful diagnostic and other clinical data and a collection of DNA and cell line samples to further advances in gene discovery of neurological disorders. All samples, phenotypic, and genotypic data are available to the research community including to academics and industry scientists. In addition, well characterized neurologically normal control subjects are a part of the collection. This collection formed the basis of this first stage study by Fung et al., and the expanded study by Simon-Sanchez et al.
The genotyping data was generated and provided by the laboratory of Dr. Andrew Singleton NIA, and Dr. John Hardy NIA (NIH Intramural, funding from NIA and NINDS).
Study
phs000089
RISE-UP study: riboflavin supplementation in Crohn's disease
Taxonomic classification of the fecal microbiota of Crohn's disease patients showing the shift in the gut bacterial community after riboflavin supplementation.
Study
EGAS50000000982
High Resolution Maps of the HeLa 3D Genome Using Hi-C
We use in situ Hi-C to probe the three-dimensional architecture of genomes, constructing haploid and diploid maps of nine cell types (including HeLa). The densest, in human lymphoblastoid cells, contains 4.9 billion contacts, achieving 1-kilobase resolution. We find that genomes are partitioned into contact domains (median length, 185kb), which are associated with distinct patterns of histone marks and segregate into six subcompartments. We identify ~10,000 loops. These loops frequently link promoters and enhancers, correlate with gene activation, and show conservation across cell types and species. Loop anchors typically occur at domain boundaries and bind CTCF. CTCF sites at loop anchors occur predominantly (>90%) in a convergent orientation, with the asymmetric motifs 'facing' one another. The inactive X-chromosome splits into two massive domains and contains large loops anchored at CTCF-binding repeats. The data for the other (non-HeLa) cell types has been uploaded to GEO.
Study
phs001010
RNA-seq data for proximal and distal human LHBT UZH (CH)
This dataset contains bulk RNA-seq data from 15 patients who underwent rotator cuff repair surgery, yielding paired proximal and distal samples of the long head of the biceps tendon (30 samples total). The submission includes 90 files, consisting of R1 and R2 FASTQ files for each sample as well as processed .TXT files containing gene-level counts generated after genome alignment. Raw sequencing reads were generated on an Illumina platform, aligned to the human reference genome GRCh38.p13, and quantified at the gene level.
Dataset
EGAD50000002095
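The sample and file counts given for the LHBT RNA-seq dataset above are consistent with one R1 FASTQ, one R2 FASTQ and one counts file per sample. The following is a minimal sketch of that bookkeeping; the file-naming scheme is purely hypothetical and is not taken from the submission.

```python
patients = 15
regions = ("proximal", "distal")  # paired samples per patient
# Assumed: three files per sample (names are illustrative only).
suffixes = ("R1.fastq.gz", "R2.fastq.gz", "counts.txt")

# One sample per (patient, region) pair.
samples = [(p, region) for p in range(1, patients + 1) for region in regions]

# One file per (sample, suffix) combination.
files = [
    f"patient{p:02d}_{region}_{suffix}"
    for p, region in samples
    for suffix in suffixes
]

n_samples = len(samples)
n_files = len(files)
print(n_samples, n_files)
```

Under these assumptions the script reproduces the 30 samples and 90 files stated in the description.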
Prevention of Viral Hepatitis and HIV in Drug Users - A Hepatitis B Model for HIV and an HB Vaccine Model for HIV Vaccine Trials in Drug Users (DASH)
The goal of the proposed study is to use the Hepatitis B virus (HBV) vaccine as a model for a future Human Immunodeficiency Virus (HIV) vaccine trial, examining the efficacy of community-based outreach intervention as well as an accelerated vaccine schedule as a method for increasing acceptance/adherence with HBV vaccination protocols among not-in-treatment drug users. This study also examined the effect of HBV vaccination coupled with community-based outreach intervention on reducing the incidence of HIV, HBV and Hepatitis C Virus (HCV) infections and the frequency of needle use and sexual risk behaviors related to these viral transmissions. A secondary purpose is to assess the antibody response after HBV vaccination as a measurement of immunological response in drug users.
Study
phs002331
MARS-seq dataset of five obese human subjects and a lean human subject
Biopsies from visceral adipose tissue from the omental depot (OAT) were obtained from five obese individuals and one lean donor with participant informed consent obtained after the nature and possible consequences of the studies were explained under protocols approved by the Institutional Review Boards of the Perelman School of Medicine at the University of Pennsylvania, the Children’s Hospital of Philadelphia, or the Tel Aviv Sourasky Medical Center. The obese donors underwent bariatric surgery, the lean donor underwent cholecystectomy. OAT samples were placed in 1 mL of DMEM, and finely minced under sterile conditions before digestion in 50 mL of DMEM with 3 mg/1 mL collagenase IV (Gibco). Samples were incubated at 37°C in a rotating oven for 20-60 min. Adipocyte and stromal vascular fractions (SVF) were separated by centrifugation, and red blood cells (RBCs) were removed from the SVF by histopaque gradient (Sigma). Single-cell RNA-sequencing libraries were prepared using the MARS-seq pipeline, and sequenced on the MiSeq 500 or HiSeq 2500 Sequencing System (Illumina).
Dataset
EGAD00001005100
National Institute of General Medical Sciences (NIGMS) Human Genetic Cell Repository Human Variation Panels including 100 African-Americans (HD100AA), 100 Caucasians (HD100CAU), 100 Han People of Los Angeles (HD100CHI) and 100 Mexican American Community of Los Angeles (HD100MEX)
The Human Genetic Cell Repository is sponsored by the National Institute of General Medical Sciences (NIGMS) with the mission of supplying scientists with the materials for accelerating disease gene discovery and functional studies. The resources available include highly-characterized, contaminant-free cell cultures and high quality, well-characterized DNA samples derived from these cultures, both subjected to rigorous quality control. The Repository was established in 1972 at Coriell and contains more than 9,500 cell lines, primarily fibroblasts and transformed lymphoblasts. The Repository has a major emphasis on inherited diseases and chromosomally aberrant cell lines. In addition, it contains a large collection dedicated to understanding human variation that includes samples from populations around the world, the CEPH collection, the Polymorphism Discovery Resource, Human Variation and many apparently healthy controls. The Human Variation collection provides cell lines and DNA samples from a variety of populations. The panels of African-Americans (HD100AA) and Caucasians (HD100CAU) used for this study comprise samples present in the Repository that were originally collected over the years from apparently healthy people to be used as "controls", for example, unaffected family members of persons with identified mono-genetic diseases. The samples for the Han people of Los Angeles (HD100CHI) and the Mexican American Community of Los Angeles (HD100MEX), however, were collected relatively recently from volunteers, identified as members of these communities, specifically for use in these panels. The Coriell Genotyping and Microarray Center in conjunction with the NIGMS repository used the Affymetrix Genome-Wide Human SNP 6.0 platform to genotype 400 samples from the NIGMS human variation panels. The populations genotyped included Americans of African, Caucasian, Mexican, and Han Chinese ancestry.
The Affymetrix SNP 6.0 array detects approximately 940,000 SNPs and provides copy number information for more than 900,000 additional locations across the genome.
Study
phs000211
ALS Compute
Amyotrophic Lateral Sclerosis (ALS) is a fatal neurodegenerative disorder characterized by the progressive loss of brain and spinal cord motor neurons. Half of ALS patients display cognitive symptoms of frontotemporal dementia (FTD); reciprocally, about 40% of FTD patients show motor neuron deficits, and approximately 15% develop overt ALS. The clinical overlap between ALS and FTD means that the two conditions are thought to represent a disease spectrum (ALS/FTD). In recent years, the identification of several genetic causes of ALS/FTD has contributed significantly to our understanding of disease pathogenesis. Unfortunately, one-third of the underlying genetic causes of familial cases and ~90% of sporadic cases of ALS/FTD remain unexplained. As such, there is a dire need to identify additional genetic factors contributing to ALS/FTD. Such studies require huge cohorts of harmonized whole genome sequencing (WGS) data sets from cases and controls. Currently, there are several major ongoing sequencing efforts for ALS patients. Numerous centers lead to inefficiency, especially in terms of overall costs. At a minimum, this includes the cost of high-performance computing, the storage of large data files, and the duplication of effort between groups. This lack of data harmonization between groups precludes sharing of genetic information and weakens collaborative efforts. The cost and logistics are also a barrier to attracting talented investigators to the ALS/FTD field. To overcome this unmet need, we have founded the ALS Compute project. We are centralizing the storage of ALS/FTD WGS data from every significant sequencing effort in the United States and beyond within a single Cloud environment. This approach will facilitate data harmonization and improve accessibility to the data. 
To accomplish this, we have made the data and the computational infrastructure available via the Terra platform hosted by NHGRI's Genomic Data Science Analysis, Visualization, and Informatics Lab-Space (AnVIL). This will allow researchers worldwide to access this wealth of data, develop new theories of the disease, and yield breakthroughs in our understanding of ALS/FTD.
Study
phs003184
Whole-genome sequencing in 14 cases and whole-exome sequencing in 90 cases of Chinese ESCC
Esophageal cancer is one of the most aggressive cancers and the sixth leading cause of cancer death worldwide. Approximately 70% of global esophageal cancers occur in China, and over 90% of the histopathological forms of this disease are esophageal squamous cell carcinoma (ESCC). Currently, there are limited clinical approaches for early diagnosis and treatment of ESCC, resulting in a 10% 5-year survival rate for patients. Meanwhile, the full repertoire of genomic events leading to the pathogenesis of ESCC remains unclear. Here we show a comprehensive genomic analysis in 158 ESCC cases, as part of the International Cancer Genome Consortium (ICGC) Research Projects (http://icgc.org/icgc/cgp/72/371/1001734). We conducted whole-genome sequencing in 14 ESCC cases and whole-exome sequencing in 90 cases.
Study
EGAS00001001487
Determination of the molecular nature of the Vel blood group by exome sequencing
The genetic basis of the Vel group is yet to be determined. We have collected DNA samples from about 90 Vel negative blood donors by phenotyping 270,000 donors. Understanding the genetic basis of the Vel group could have clinical significance as transfusion of Vel positive blood to Vel negative recipients with anti-Vel can cause severe and acute haemolytic transfusion reactions.
Study
EGAS00001000069
Panel amplicon sequencing data of COVID-19 patients
Amplicon sequencing data for 90 patients hospitalized for COVID-19 and admitted to a general ward. Patients had a median age of 60.5 (52.0 – 69.3) years and were overweight (body mass index: 28.4 (24.4 – 32.6) kg/m2). 35.6% of the cohort were female.
The following genes were sequenced on a NovaSeq 6000 instrument with an enrichment-based library preparation (IDT-xGEN) with a median coverage of 2000x:
ABL1, ASXL1, ATRX, BCOR, BCORL1, BRAF, CALR, CBL, CBLB, CBLC, CDKN2A, CEBPA, CSF3R, CUX1, DNMT3A, ETV6, EZH2, FBXW7, FLT3, FLT3-ITD, GATA1, GATA2, GNAS, GNB1, HRAS, IDH1, IDH2, IKZF1, JAK2, JAK3, KDM6A, KIT, KMT2A, KRAS, MPL, MYD88, NOTCH1, NPM1, NRAS, PDGFRA, PHF6, PPM1D, PTEN, PTPN11, RAD21, RUNX1, SETBP1, SF3B1, SMC1A, SMC3, SRSF2, STAG2, TET2, TP53, U2AF1, WT1, ZRSR2
Dataset
EGAD00001009416
Exploring the driver events of eccrine poromas and porocarcinomas: A retrospective, cross-institutional study of 90 cases
Eccrine poroma (EP) and porocarcinoma (EPC) are rare benign and malignant adnexal neoplasms of the terminal sweat gland duct, respectively. Both can arise de novo; however, EPCs can also arise from a pre-existing EP. To date, genetic investigation of these tumors has involved studies with small sample sizes and/or limited analyses. To comprehensively compare the driver events and mutational landscape of these tumors, we performed a retrospective multi-institutional whole-exome sequencing and RNA sequencing study on the largest cohort of EPs and EPCs to date (n=90). We uncovered novel events and delineated different pathways of tumorigenesis underlying these tumors, with EPs driven largely by fusion genes, and EPCs driven largely by somatic mutations, with rare YAP1 and frequent PAK gene novel fusions.
Dataset
EGAD00001015378
Exploring the driver events of eccrine poromas and porocarcinomas: A retrospective, cross-institutional study of 90 cases
Eccrine poroma (EP) and porocarcinoma (EPC) are rare benign and malignant adnexal neoplasms of the terminal sweat gland duct, respectively. Both can arise de novo; however, EPCs can also arise from a pre-existing EP. To date, genetic investigation of these tumors has involved studies with small sample sizes and/or limited analyses. To comprehensively compare the driver events and mutational landscape of these tumors, we performed a retrospective multi-institutional whole-exome sequencing and RNA sequencing study on the largest cohort of EPs and EPCs to date (n=90). We uncovered novel events and delineated different pathways of tumorigenesis underlying these tumors, with EPs driven largely by fusion genes, and EPCs driven largely by somatic mutations, with rare YAP1 and frequent PAK gene novel fusions.
Dataset
EGAD00001015376
Exploring the driver events of eccrine poromas and porocarcinomas: A retrospective, cross-institutional study of 90 cases
Eccrine poroma (EP) and porocarcinoma (EPC) are rare benign and malignant adnexal neoplasms of the terminal sweat gland duct, respectively. Both can arise de novo; however, EPCs can also arise from a pre-existing EP. To date, genetic investigation of these tumors has involved studies with small sample sizes and/or limited analyses. To comprehensively compare the driver events and mutational landscape of these tumors, we performed a retrospective multi-institutional whole-exome sequencing and RNA sequencing study on the largest cohort of EPs and EPCs to date (n=90). We uncovered novel events and delineated different pathways of tumorigenesis underlying these tumors, with EPs driven largely by fusion genes, and EPCs driven largely by somatic mutations, with rare YAP1 and frequent PAK gene novel fusions.
Dataset
EGAD00001015377
Exploring the driver events of eccrine poromas and porocarcinomas: A retrospective, cross-institutional study of 90 cases
Eccrine poroma (EP) and porocarcinoma (EPC) are rare benign and malignant adnexal neoplasms of the terminal sweat gland duct, respectively. Both can arise de novo; however, EPCs can also arise from a pre-existing EP. To date, genetic investigation of these tumors has involved studies with small sample sizes and/or limited analyses. To comprehensively compare the driver events and mutational landscape of these tumors, we performed a retrospective multi-institutional whole-exome sequencing and RNA sequencing study on the largest cohort of EPs and EPCs to date (n=90). We uncovered novel events and delineated different pathways of tumorigenesis underlying these tumors, with EPs driven largely by fusion genes, and EPCs driven largely by somatic mutations, with rare YAP1 and frequent PAK gene novel fusions.
Dataset
EGAD00001015379
The Medical Genome Reference Bank: a whole genome data resource of 4,000 healthy elderly individuals.
Allele frequency data from human reference populations is of increasing value for filtering and assignment of pathogenicity to genetic variants. Aged and healthy populations are more likely to be selectively depleted of pathogenic alleles, and are therefore particularly suitable as reference populations for the major diseases of clinical and public health importance. However, reference studies of the healthy elderly have remained underrepresented in human genetics. We have developed the Medical Genome Reference Bank (MGRB), a large-scale comprehensive whole genome dataset of confirmed healthy elderly individuals, to provide a publicly accessible resource for health-related research and for clinical genetics. It also represents a useful resource for studying the genetics of healthy aging. The MGRB comprises 4,000 healthy, older individuals with no reported history of cancer, cardiovascular disease or dementia, recruited from two Australian community-based cohorts. DNA derived from blood samples will be subject to whole genome sequencing. The MGRB will measure genome-wide genetic variation in 4,000 individuals, mostly of European descent, aged 60 to 95 years (mean age 75 years). The MGRB has committed to a policy of data sharing, employing a hierarchical data management system to maintain participant privacy and confidentiality, whilst maximizing research and clinical usage of the database. The MGRB will represent a dataset of international significance, broadly accessible to the clinical and genetic research community.
Study
EGAS00001003511
Data access committee for study - The subclonal architecture of metastatic breast cancer: Results from a prospective community-based rapid autopsy program "CASCADE".
Dac
EGAC00001000574
The Oral Microbiome and Head and Neck Cancer
Oral microbiota may influence head and neck squamous cell carcinoma (HNSCC) development, potentially related to carcinogen metabolism. The human oral cavity hosts a diverse microbiota, including bacteria and fungi. We performed shotgun sequencing and ITS1 sequencing on 236 HNSCC case participants who developed HNSCC during a mean follow-up of 5.1 years and 458 matched controls who remained HNSCC-free. Oral samples were obtained from a prospective nested case-control study within three epidemiological cohorts: the American Cancer Society Cancer Prevention Study II Nutrition Cohort (ACS CPS-II), the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial (PLCO), and the Southern Community Cohort Study (SCCS). Control participants were selected using 2:1 frequency matching based on cohort, age, sex, race and ethnicity, and time since oral sample collection.
Study
phs004018
RNA sequencing data from visceral and abdominal subcutaneous adipose tissue from morbidly obese women with normal glucose tolerance or type 2 diabetes
The study group consisted of 17 obese women with normal glucose tolerance and 15 obese women with T2DM. Adipose tissue specimens were taken from the epigastric region of the abdominal wall (SAT) and from the major omentum (VAT). RNA was isolated and RNA sequencing was used to analyse the transcriptome. Dharuri H et al, Diabetologia. 2014;57(11):2384-92.
Study
EGAS00001001872
Insights into Adult Gut Microbiota Composition Using the Estonian Cohort
This study explores the gut microbiota of 1,826 adult individuals, predominantly from the Estonian population. The cohort includes 16S rRNA gene V4 region amplicon sequencing data along with metadata, such as age, sex, health status, anthropometric parameters, and stool consistency. The research aims to characterise the microbial community structure, identify distinct enterotypes, and examine their associations with host characteristics. The dataset provides a valuable reference for understanding gut microbiota composition and supports research in personalised nutrition and lifestyle-related health strategies.
Study
EGAS50000001611
EXCEED Study
EXCEED is a longitudinal population-based cohort which facilitates investigation of genetic, environmental and lifestyle-related determinants of a broad range of diseases and of multiple morbidity through data collected at baseline and via electronic healthcare record linkage. Recruitment has taken place in Leicester, Leicestershire and Rutland since 2013 and is ongoing, with approximately 10,000 participants aged 30-69 to date. The population of Leicester is diverse and additional recruitment from the local South Asian community is ongoing.
Study
EGAS00001003499
Saliva_Fulani_Database
In total, 90 DNA samples were genotyped on the Illumina Infinium H3Africa Consortium array (designed for 2,271,503 SNPs; using BeadChip type: H3Africa_2019_20037295_B1). Genotyping was performed at the SNP&SEQ Technology Platform (NGI/SciLifeLab Genomics, Sweden). The Saliva Fulani dataset includes individuals from the following populations: 30 individuals from Senegal_FulaniLinguere, 30 individuals from Mauritania_FulaniAssaba, and 30 individuals from Mali_FulaniInnerDelta.
Dataset
EGAD50000000653
bulk RNA-Seq samples of CRC patients
The dataset contains samples of 30 CRC patients (3 samples for each patient, tumor and 2 normal adjacent tissue sites, 90 samples in total).
The dataset consists of paired-end FASTQ files from bulk RNA-Seq.
Dataset
EGAD00001009635
Exome_sequencing_of_Congenital_Heart_Disease_families_Leuven
This project aims to study at least 90 exomes from families with congenital heart disease. The samples have been selected in Leuven in collaboration with Koen Devriendt. Ethical approval has been sought in Leuven, Belgium, and an HDMMC agreement for submitting these samples is in place at the WTSI. The phenotypes on which we will primarily focus our analysis are severe Left Ventricular Outflow Tract Obstructions (LVOTO) and Atrioventricular Septal Defect (AVSD). The indexed Agilent whole exome pulldown libraries will be sequenced on 75bp PE HiSeq (Illumina).
Study
EGAS00001000185
NHLBI TOPMed: Whole Genome Sequencing of Venous Thromboembolism (WGS of VTE)
This study consists of 338 VTE cases from an inception cohort of Olmsted County, MN residents (OC) with a first lifetime objectively-diagnosed idiopathic VTE during the 40-year study period, 1966-2005. All living study subjects were invited to provide a whole blood sample at the Mayo Clinical Research Unit for leukocyte genomic DNA and plasma collection. For living study subjects who did not provide a blood sample, we retrieved any leftover blood ("waste" blood) from samples collected as part of routine clinical diagnostic testing and used this to extract DNA after obtaining patient consent. For deceased cases, with IRB approval, we extracted DNA from any available stored tissue within the Mayo Tissue Archive. This "tissue" DNA has been successfully genotyped in prior studies. Three trained and experienced study nurse abstractors reviewed the complete medical records in the community of all potential cases. Note: WGS sample IDs for the previous GENEVA study cases (phs000289) are included in this dataset. The phenotypes for the GENEVA study are located under the above phs number.
Study
phs001402
Genomic_Advances_in_Sepsis__GAinS__genotyping
This cohort comprises a subset of patients enrolled in the Genomic Advances in Sepsis (GAinS) study, an established biobank of adult sepsis patients. Patients with sepsis due to community acquired pneumonia or faecal peritonitis were recruited from 35 hospitals across the UK from 2005-2018, with samples for functional genomics and detailed clinical information collected over the first five days of ICU admission. DNA was extracted from buffy coat or whole blood samples using the Qiagen DNA extraction protocol, the automated Maxwell Blood purification kit (Promega), or the QIAamp Blood Midi kit protocol (Qiagen). Genotyping data were generated using the Illumina HumanOmniExpress BeadChip (295 patients), the Infinium CoreExome BeadChip (655 patients), and the Infinium Global Screening Array BeadChip (307 patients). Genotyping QC and imputation into the Haplotype Reference Consortium were performed within each batch. The datasets were combined and, following post-imputation filtering, data were available on 1168 samples. This data is part of a pre-publication release. For information on the proper use of pre-publication data shared by the Wellcome Trust Sanger Institute (including details of any publication moratoria), please see http://www.sanger.ac.uk/datasharing/
Study
EGAS00001007786
The National Institute of Arthritis and Musculoskeletal and Skin Diseases (NIAMS) Consortium for the Longitudinal Evaluation of African-Americans with Early Rheumatoid Arthritis (CLEAR)
The Consortium for the Longitudinal Evaluation of African-Americans with Early Rheumatoid Arthritis (CLEAR) Registry and Repository, supported by the National Institute of Arthritis and Musculoskeletal and Skin Diseases (NIAMS), was established in 2000 to provide a resource for the scientific community to explore genetic and non-genetic factors affecting rheumatoid arthritis (RA) occurrence and outcomes in African Americans. The long term objective is a database and a repository of 1,100 RA and 550 matched healthy African-American subjects. This CLEAR Registry and Repository has two arms: a longitudinal arm for subjects with early RA (enrollment from 2000 to 2005) and a cross-sectional arm for subjects with any disease duration (enrollment from 2006 to 2012). CLEAR has two components: a database and a repository. The database contains extensive demographic, socioeconomic, clinical and radiographic (radiographs of hands and feet) information and bone mineral density data from DEXA scans. The repository contains genomic DNA, plasma and serum on most of the participants. Participants in CLEAR II had RNA isolated from peripheral blood cells.
Study
phs001360
A Genome-Wide Association Study on Cataract and HDL in the Personalized Medicine Research Project Cohort
The primary goals of this project are to develop and validate electronic phenotyping algorithms, to accurately identify cases and controls while maintaining a positive predictive value (PPV) of >95%, and to conduct a genome wide association study that advances understanding of two specific yet interrelated disease states, while simultaneously engaging the community in these research efforts. Lipid abnormalities and cataracts are both diseases of public health significance, they share common risk factors, and they are both complex diseases which likely have many genes contributing to disease development. Whole genome association studies with these two outcomes and environmental risk factors could yield novel data about the etiology of the two separate outcomes as well as their interaction. PhenX is a project designed to prioritize Phenotypes and eXposure measures for Genome-wide Association Studies (GWAS). The PhenX Toolkit is a valuable resource for researchers who are planning or expanding a study and would like to incorporate well established measures that have been recommended by experts in the field. We are currently interested in gene-environment interactions for "age related cataract".
Study
phs000170
HCC cfMeDIP-seq
This dataset comprises 406 raw fastq files derived from 203 plasma cfDNA samples from 90 patients diagnosed with hepatocellular carcinoma (HCC). The samples include 90 baseline HCC (b-HCC) samples, collected during liver transplant or resection surgeries, and 113 postoperative follow-up (f-HCC) samples. For each sample, 10ng of cfDNA was used for library preparation utilizing cfMeDIP-seq technology based on the 5mC antibody-immunoprecipitation strategy. Libraries were validated via Bioanalyzer trace analysis and sequenced on Illumina NovaSeq 6000 or HiSeq 2500 platforms in paired-end 150-bp (NovaSeq) or 125-bp (HiSeq) mode for ~100 million reads per sample.
Dataset
EGAD50000000651
APCDR Uganda GWAS: Genome-wide sequence variation and susceptibility loci for cardiometabolic traits in a sub-Saharan African population (UG2G component)
Genomic studies in African populations provide unique opportunities to understand disease aetiology, human genetic diversity and population history in a regional and a global context. To leverage the relative benefits of different strategies, we undertook a combined approach of genotyping and whole-genome sequencing (WGS) in a population-based study of 6,400 individuals from a geographically defined rural community in South-West Uganda. We present data from 4,778 individuals with genotypes for ~2.2 million SNPs from the Uganda GWAS resource (UGWAS), and sequence data on up to 1,978 individuals spanning 41.5M SNPs and 4.5M indels (UG2G); 343 individuals overlap between the two datasets.
We highlight the value of the largest sequence panel from Africa to date as a global resource for variant discovery, imputation and understanding the mutational spectrum and its clinical relevance in African populations. Alongside phenotype data, we provide a rich new genomic resource for researchers in Africa and globally.
Study
EGAS00001000545
NHLBI GO-ESP: Early-Onset Myocardial Infarction (Broad EOMI)
The NHLBI "Grand Opportunity" Exome Sequencing Project (GO-ESP), a signature project of the NHLBI Recovery Act investment, was designed to identify genetic variants in coding regions (exons) of the human genome (the "exome") that are associated with heart, lung and blood diseases. These and related diseases that are of high impact to public health and individuals from diverse racial and ethnic groups will be studied. These data may help researchers understand the causes of disease, contributing to better ways to prevent, diagnose, and treat diseases, as well as determine whether to tailor prevention and treatments to specific populations. This could lead to more effective treatments and reduce the likelihood of side effects. GO-ESP is comprised of five collaborative components: 3 cohort consortia - HeartGO, LungGO, and WHISP - and 2 sequencing centers - BroadGO and SeattleGO. In the Grand Opportunities Exome Sequencing Program Early MI Project (GO ESP - EOMI), we are sequencing cases with extremely early-onset MI drawn from 8 cohorts. These cohorts include five hospital or community-based studies that ascertained individuals based on MI status. These include PennCATH, Cleveland Clinic Genebank, Massachusetts General Hospital Premature Coronary Artery Disease Study (MGH-PCAD), Heart Attack Risk in Puget Sound (HARPS), and Translational Research Investigating Underlying Disparities in Myocardial Infarction Patients' Health Status (TRIUMPH). Cases were selected based on MI occurring in men aged ≤50 years and women aged ≤60 years. In addition, early-MI cases are being drawn from three population-cohort studies including the Framingham Heart Study, the Women's Health Initiative, and the Atherosclerosis Risk in Communities Study. MI-free controls are being drawn from five population-based cohort studies including the Framingham Heart Study, the Women's Health Initiative, Atherosclerosis Risk in Communities Study, Cardiovascular Health Study, and the Jackson Heart Study. 
Controls were selected based on two factors: (1) highest predicted risk for MI based on Framingham risk score; and (2) absence of prevalent or incident MI despite a high predicted risk.
Study
phs000279
Genome-Wide Pleiotropy Scan Across Multiple Cancers
A whole-exome sequencing (WES) study was conducted in 3,233 cases diagnosed with multiple primary cancers and 3,229 matched cancer-free controls (90% non-Hispanic white, 3% African-American, 3% East Asian, and 4% Latino) selected from individuals in the Kaiser Permanente Research Bank (KPRB) who were members of the Kaiser Permanente Northern California (KPNC) health plan. Cancer-free controls were matched to cases on age at specimen collection (within 2 years), sex, genotyping array (which matched on self-reported race/ethnicity), closest distance using the first two principal components for genetic ancestry, and reagent kit. Cases and controls were drawn from two prospective KPRB cohorts: the Research Program on Genes, Environment and Health (RPGEH) and the ProHealth study. Participants were sequenced by the Regeneron Genetics Center using the Illumina NovaSeq 6000 platform, and sample preparation and quality control were performed using a high-throughput, fully-automated system [PMID: 33087929]. Reads were aligned to the GRCh38 reference genome, and variants were called using WeCall [PMID: 33087929]. Participants with sex discordance, 20x coverage at less than 80% of targeted sites, and/or contamination greater than 5% were excluded. After quality control, we retained n = 6,247 (3,111 cases, 3,136 controls) individuals for downstream analyses. Among participants selected for this WES study, n = 5,432 (2,299 cases; 3,133 controls) consented to deposition of data to the National Institutes of Health (NIH). Further quality control was applied to filter low quality variants. Genotype calls with low depth of coverage (DP) were updated to missing (DP < 7 for SNPs and DP < 10 for indels), after which sites with low allele balance (AB) - variants without at least one sample having AB ≥ 15% for SNPs or AB ≥ 20% for indels - were removed. Lastly, variants with missingness > 10% and Hardy-Weinberg equilibrium p-value < 10^-15 were excluded.
Further description of quality control and downstream single-variant and gene-based analyses is available in Cavazos et al, 2022 [medRxiv].
Study
phs002809
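The variant-level quality control described for phs002809 can be illustrated with a minimal sketch. This is not the study's pipeline: the `Call` record, the function names, and the in-memory representation are all hypothetical, and the sketch assumes that a variant is dropped if it fails either the missingness or the Hardy-Weinberg criterion, and that the garbled threshold in the description is 10^-15. Only the numeric thresholds come from the description above.

```python
from dataclasses import dataclass
from typing import List, Optional

# Thresholds taken from the study description; everything else is illustrative.
MIN_DP = {"snp": 7, "indel": 10}        # calls below this depth become missing
MIN_AB = {"snp": 0.15, "indel": 0.20}   # at least one sample must reach this AB
MAX_MISSINGNESS = 0.10
MIN_HWE_P = 1e-15                       # assumed reading of the garbled "10-15"


@dataclass
class Call:
    """Hypothetical per-sample genotype call for one variant."""
    depth: int                          # DP
    allele_balance: Optional[float]     # AB; None when not defined (e.g. hom-ref)


def mask_low_depth(calls: List[Optional[Call]], variant_type: str) -> List[Optional[Call]]:
    """Set genotype calls with depth below the type-specific minimum to missing (None)."""
    min_dp = MIN_DP[variant_type]
    return [c if c is not None and c.depth >= min_dp else None for c in calls]


def passes_site_filters(calls: List[Optional[Call]], variant_type: str, hwe_p: float) -> bool:
    """Apply the depth mask, then the allele-balance, missingness and HWE site filters."""
    masked = mask_low_depth(calls, variant_type)
    if not masked:
        return False
    min_ab = MIN_AB[variant_type]
    # Keep the site only if at least one remaining call reaches the AB minimum.
    has_balanced_call = any(
        c is not None and c.allele_balance is not None and c.allele_balance >= min_ab
        for c in masked
    )
    missingness = sum(c is None for c in masked) / len(masked)
    # Assumption: failing either missingness or HWE excludes the variant.
    return has_balanced_call and missingness <= MAX_MISSINGNESS and hwe_p >= MIN_HWE_P
```

Applying the depth mask before computing missingness mirrors the order given in the description, where low-depth calls are set to missing first and site-level filters follow.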
VESPA: Vanderbilt Electronic Systems for Pharmacogenomic Assessment
The Vanderbilt Electronic Systems for Pharmacogenomic Assessment (VESPA) Project is a large electronic medical record (EMR)- and biobank-based initiative for translational pharmacogenomics discoveries. Key research resources utilized in this effort include the BioVU DNA databank and associated Synthetic Derivative database of clinical information, and software tools developed to identify drugs and clinical events using EHR-derived structured and unstructured ("free text") data. Cohorts include subjects with primarily drug-response phenotypes, and most cases and controls identified include three data types: ICD-9 codes, medication regimens, and medical test results. Advanced informatics methods, such as natural language processing, were also used for some phenotypes. Algorithms used event-sequence analyses to establish temporal relationships between drugs and phenotypes. Both case and control algorithms excluded records that contained specific comorbidities and were refined to achieve positive predictive value (PPV) > 90%. For automated algorithms failing to meet this threshold, manual review was coupled with algorithms to validate that the included cases were true positives. A total of 11,639 subjects met phenotyping criteria for at least one of the 28 phenotypes investigated. Across all phenotypes, 90% of cases and controls were reused as either a case or control for at least one other phenotype. Data are deposited for individual phenotypes in the VESPA substudies. Substudies: VESPA: Genome Wide Association Study of Serum Creatinine during Vancomycin Therapy and Vancomycin Pharmacokinetics - phs000894; VESPA: Genome Wide Study of Clinically Diagnosed Ace Inhibitor Cough - phs000992
Study
phs000991
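The PPV > 90% criterion mentioned in the VESPA description can be made concrete with a small sketch. It assumes that a phenotyping algorithm's output is checked by manual chart review, giving counts of confirmed (true positive) and unconfirmed (false positive) records; the function names and the review counts are hypothetical and are not taken from the VESPA project.

```python
def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """PPV = TP / (TP + FP), computed over the records flagged by the algorithm."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("PPV is undefined when no records were flagged")
    return true_positives / flagged


def meets_ppv_threshold(true_positives: int, false_positives: int,
                        threshold: float = 0.90) -> bool:
    """True when PPV strictly exceeds the threshold (the description states PPV > 90%)."""
    return positive_predictive_value(true_positives, false_positives) > threshold


# Hypothetical review of 100 flagged records: 93 confirmed, 7 not confirmed.
print(meets_ppv_threshold(93, 7))
```

Under this reading, an algorithm sitting exactly at 90% would not pass and would fall into the group that the description says was handled with additional manual review.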
The Federated EGA network
In the last 10 years, most human omics data has been generated in the context
of research consortia while recently there has been an emergence of large
cohorts of human data generated by healthcare initiatives. Many countries in
Europe now have nascent personalised medicine programmes. Thus, human genomics
is shifting from being predominantly research-driven to being funded through
healthcare systems. Genetic data generated in a healthcare context is subject
to more stringent information governance than research data, and
all data must adhere to national data protection laws. In this context, EGA
identified the need for the development of a federated network to enable
secure sharing of data whilst enabling genetic data to remain within the
jurisdiction in which it was generated. The Federated EGA is designed to
support national data management requirements for genomic and clinical data
collected from citizens as part of healthcare or biomedical research projects.
It includes a secure authorised access mechanism to support research use of
these data across Europe and worldwide.
We have engaged with over 14 ELIXIR countries to develop the federation model.
The EGA federated configuration is composed of Central EGA, Federated EGA
nodes and Community EGA nodes:
Central EGA offers international submissions and helpdesk support, and is
currently co-managed by EMBL-EBI and CRG;
Federated EGA nodes offer EGA services to researchers within their national
jurisdiction;
Community EGA nodes are individual institutions or initiatives with human
genetic data intended to be shared with the research community.
Key services available in the Federated EGA structure are:
Data submission
  Central EGA: Offers international submissions service
  Federated EGA node: Offers submission service in a particular jurisdiction
  Community EGA nodes: Does not offer an external submissions service
Helpdesk support
  Central EGA: Provides external international helpdesk
  Federated EGA node: Provides helpdesk support for submitters in its
  jurisdiction and for approved users of data managed at its facilities
  Community EGA nodes: Provides helpdesk support only for approved users of
  data managed at its facilities
Data distribution
  Central EGA: Manages worldwide distribution for data hosted at Central EGA
  Federated EGA node: Manages worldwide distribution for data hosted at
  Federated EGA node
  Community EGA nodes: Distribution for data hosted at Community EGA node
EMBL-EBI and CRG have prepared a series of documents describing the overall
governance and coordination framework. These describe
the Federated EGA structure, the rights and responsibilities of the different
parties and governing committees, and node operation guidelines. The
federation governance proposal is under review by the first countries invited
to join as Federated nodes, with further interested countries likely to join
in the coming years. The Central EGA team has been working closely with many
ELIXIR partners, including ELIXIR Finland, Luxembourg, Germany, Norway, Spain
and Sweden.
Blog
the-federated-ega-network
dbGaP Collection: Compilation of Individual-Level Genomic Data for General Research Use
This dbGaP Collection contains all authorized individual-level genomic datasets currently in dbGaP that are approved for General Research Use (GRU) and have no further limitations beyond those outlined in the model Data Use Certification Agreement. Access to this study will include any additional authorized individual-level GRU datasets that become available. Renewal of this study is required annually. Data Included in this Study: Individual-level genomic data designated for general research use with no further use limitations or restrictions, that is: data use does not require approval by an Institutional Review Board for secondary analyses; data have no publication embargo; and data have no other use limitations (e.g., requirement for collaboration or publication; restricted for use by academic or not-for-profit organizations; restricted to health, medical, and/or biomedical research). In response to requests from the scientific community, the NIH implemented a change in the procedures for accessing individual-level GRU genomic data. Under the modified procedures, interested investigators can request GRU individual-level genomic datasets with a single application. However, please note that these data have not been harmonized. Additionally, to help expedite the processing of requests for individual-level GRU genomic datasets, the requests will be reviewed by a single, central Data Access Committee. The process for requesting these datasets is identical to the request process for non-GRU individual-level, controlled-access data, such as the expectation that investigators will abide by the NIH Policy for Sharing of Data Obtained in NIH Supported or Conducted Genome-Wide Association Studies and Data Use Certification Agreement. Effective April 24, 2018, the Data Use Limitations (DUL) for the Genes and Blood Clotting Study (GABC) (phs000304) was updated to Health/Medical/Biomedical, Not-for-profit Use Only (HMB-NPU).
Additionally, the submitting institutions have requested the removal of the National Heart, Lung, and Blood Institute (NHLBI) "Grand Opportunity" Exome Sequencing Project (GO-ESP) Lung Cohorts Exome Sequencing Project: Genetic modifiers of Pseudomonas aeruginosa (Pa) lung infection acquisition in cystic fibrosis (phs000254) from the dbGaP Compilation of Individual-Level Genomic Data for General Research Use (dbGaP Compilation) (phs000688). As such, data from studies phs000304 and phs000254 are no longer available as part of the dbGaP Compilation (phs000688). Investigators must submit separate Data Access Requests (DARs) for the two studies and be approved by the NHLBI Data Access Committee.
Study
phs000688
Comprehensive characterization of cell-free tumor DNA in plasma and urine of patients with renal tumors
Cell-free tumour-derived DNA (ctDNA) allows non-invasive monitoring of various cancers but its utility in renal cell cancer (RCC) has not been well established. A combination of untargeted and targeted sequencing methods, applied to two independent cohorts of patients (n=90) with various renal tumour subtypes, was used to determine ctDNA content in plasma and urine. Our data revealed lower plasma ctDNA levels in RCC relative to other cancers, with untargeted detection of ~33% for both cohorts. A highly sensitive personalised approach, applied to plasma and urine from select patients (n=22), improved detection to ~50%, including in patients with early stage and even benign lesions. A machine-learning based model, applied to untargeted data, predicted this detection, potentially offering a means of triaging patient samples for personalised analysis. We observed that plasma and, for the first time, urine ctDNA may better represent tumour heterogeneity than a single tissue biopsy. Longitudinal sampling of >200 plasma samples revealed that ctDNA can track disease course. These data highlight low ctDNA levels in RCC but indicate potential clinical utility provided improvement in isolation and detection approaches.
Study
EGAS00001003530