RNASeq files for paper titled "Fusion oncoproteins and cooperating mutations define disease phenotypes in NUP98-rearranged leukemia" PMID: 39974131, PMCID: PMC11838931, DOI: 10.1101/2025.01.21.25320683
scRNASeq files for paper titled "Fusion oncoproteins and cooperating mutations define disease phenotypes in NUP98-rearranged leukemia" PMID: 39974131, PMCID: PMC11838931, DOI: 10.1101/2025.01.21.25320683
Shallow whole-genome sequencing for copy numbers in resectable gastric cancer treated with surgery alone
Targeted sequencing using SPET for Mesothelioma.
Whole exome sequencing data for 18 mucoepidermoid carcinoma samples. The samples were used for Illumina TruSeq library construction and captured using Agilent V4 exome panel. The PE fastq files are provided.
This data set contains the amplicon sequencing reads mapping to chrM for 217 Egyptian individuals. These reads were subsequently used for haplogroup assignment.
RNAseq fastq files for 254 samples from the neoALTTO study of lapatinib, trastuzumab, or their combination in HER2+ breast cancer patients. These are pre-treatment baseline samples.
The goal of this study was two-fold: to determine whether common purification techniques have any effect on downstream differential expression analysis, and to evaluate combinations of alignment and differential expression software for reliability. To this end, blood was collected from three individuals and pooled during and after extraction. After pooling and mixing were completed, samples were divided into three aliquots for testing. The first aliquot was a control and was not purified or concentrated. It was diluted to produce samples of varying concentrations for testing. The second aliquot was diluted to 20 ng/μL and used to test six different variations on the AMPure XP bead purification procedure. The last aliquot was also diluted to varying concentrations of 60, 30, and 9.6 ng/μL and purified using MinElute columns. Samples were submitted for total RNA-Seq library preparation and sequencing. Library preparation was performed using the TruSeq Stranded Total RNA with Ribo-Zero Globin kit (20020612, Illumina Inc.), and 2x150 bp PE sequencing was done on an Illumina NovaSeq 6000 with the S4 reagent kit. Comparisons were made between methods (AMPure vs. unpurified, AMPure vs. MinElute, MinElute vs. unpurified) to assess the effects of purification method on downstream differential expression. Comparisons were also made within methods using the varying concentrations tested for unpurified samples and for MinElute purified samples to assess the effects of concentration on differential expression. Variations of the AMPure procedure were also compared to assess the effectiveness of the variations tested in comparison to the unmodified procedure. A subset of samples was selected for use with alignment and differential expression package comparison.
Unpurified high concentration samples eluted in RNAse-free water were compared to unpurified high concentration samples eluted in BR5, a buffer from the PAXgene blood miRNA kit, with the expectation that there should be no or very few differentially expressed genes. Unpurified high concentration samples eluted in RNAse-free water were also compared to unpurified low concentration samples also eluted in RNAse-free water, with possibly a small number of differentially expressed genes anticipated. A third dataset of simulated RNA-seq data was created with a known rate of differential expression. These files were aligned using Bowtie2, HISAT2, kallisto, RSEM, Rsubread, Salmon, and STAR. Results were then analyzed for differential expression using ALDEx2, baySeq, DEGseq, DESeq2, edgeR, limma, NOISeq, PoissonSeq, and SAMseq. Differential expression results of all three comparisons were evaluated to determine which combinations provided the most reliable results for both real and simulated data.
The Genetics and Epidemiology of Colorectal Cancer Consortium (GECCO) is a collaborative effort comprising a coordinating center and scientific researchers from well-characterized cohort and case-control studies. This international consortium aims to accelerate the discovery of common and rare genetic risk variants for colorectal cancer by conducting large-scale meta-analyses of existing and newly generated genome-wide association study (GWAS) data, whole genome sequencing, replicating and fine-mapping of genetic discoveries, and investigating how genetic risk variants are modified by environmental risk factors. To expand these efforts, we assembled case-control sets or nested case-control sets from 6 different North American or European studies. Summary descriptions and study participant inclusion/exclusion criteria for each of these studies are detailed below. Cancer Prevention Study II (CPS II): The CPS II Nutrition cohort is a prospective study of cancer incidence and mortality in the United States, established in 1992 and described in detail elsewhere (Calle et al., 2002 PMID:12015775; Campbell et al., 2014 PMID:25472679). At enrollment, participants completed a mailed self-administered questionnaire including information on demographic, medical, diet, and lifestyle factors. Follow-up questionnaires to update exposure information and to ascertain newly diagnosed cancers were sent biennially starting in 1997. Reported cancers were verified through medical records, state cancer registry linkage, or death certificates. The Emory University Institutional Review Board approves all aspects of the CPS II Nutrition Cohort. We restricted to samples with blood as the DNA source. Controls were matched to cases in a case/control ratio of 2:1 on reference year and sex.
Darmkrebs: Chancen der Verhütung durch Screening (DACHS): This German study was initiated as a large population-based case-control study in 2003 in the Rhine-Neckar-Odenwald region (southwest region of Germany) to assess the potential of endoscopic screening for reduction of colorectal cancer risk and to investigate etiologic determinants of disease, particularly lifestyle/environmental factors and genetic factors. Cases with a first diagnosis of invasive colorectal cancer (International Classification of Diseases 10 codes C18-C20) who were at least 30 years of age (no upper age limit), German speaking, a resident in the study region, and mentally and physically able to participate in a one-hour interview, were recruited by their treating physicians either in the hospital a few days after surgery, or by mail after discharge from the hospital. Cases were confirmed based on histologic reports and hospital discharge letters following diagnosis of colorectal cancer. All hospitals treating colorectal cancer patients in the study region participated. Based on estimates from population-based cancer registries, more than 50% of all potentially eligible patients with incident colorectal cancer in the study region were included. Community-based controls were randomly selected from population registries, employing frequency matching with respect to age (5-year groups), sex, and county of residence. Controls with a history of colorectal cancer were excluded. Controls were contacted by mail and follow-up calls. The participation rate was 51%. During an in-person interview, data were collected on demographics, medical history, family history of CRC, and various life-style factors, as were blood and mouthwash samples. Routine formalin-fixed, paraffin-embedded (FFPE) tumor samples from the patients enrolled were requested from the pathology institutes and used for tumor tissue analyses. 
This analysis includes participants with blood source DNA that were recruited up to 2010 in this ongoing study. Controls were matched to cases on reference age and sex in a case/control ratio of 2:1. Health Professionals Follow-up Study (HPFS): A parallel prospective study to the NHS (Nurses' Health Study). The HPFS cohort comprised 51,529 men aged 40-75 who, in 1986, responded to a mailed questionnaire (Rimm et al., 1990 PMID:2090285). Participants provided information on health related exposures, including current and past smoking history, age, weight, height, diet, physical activity, aspirin use, and family history of colorectal cancer. Colorectal cancer and other outcomes were reported by participants or next-of-kin and were followed up through review of the medical and pathology record by physicians. Overall, more than 97% of self-reported colorectal cancers were confirmed by medical record review. Information was abstracted on histology and primary location. Incident cases were defined as those occurring after the subject provided the blood sample. Prevalent cases were defined as those occurring after enrollment in the study but before the subject provided the blood sample. Follow-up evaluation has been excellent, with 94% of the men responding to date. Colorectal cancer cases were ascertained through January 1, 2008. In 1993-1995, 18,825 men in the HPFS mailed blood samples by overnight courier, which were aliquoted into buffy coat and stored in liquid nitrogen. In 2001-2004, 13,956 men in the HPFS who had not provided a blood sample previously mailed in a swish-and-spit sample of buccal cells. Incident cases were defined as those occurring after the subject provided a blood or buccal sample. Prevalent cases were defined as those occurring after enrollment in the study in 1986, but before the subject provided either a blood or buccal sample. 
Participants with histories of cancer (except nonmelanoma skin cancer), ulcerative colitis, or familial polyposis were excluded, along with their case-control sets. Control participants were required to be free of invasive colorectal cancer and non-invasive (stage 0 in situ) colorectal cancer. For this study, only European ancestry participants with blood source DNA and incident colorectal cancer cases were eligible for selection. Since enrollment year and sex matched exactly, controls were randomly selected in a case/control ratio of 2:1. Nurses' Health Study (NHS): The NHS cohort began in 1976 when 121,700 married female registered nurses aged 30-55 years returned the initial questionnaire that ascertained a variety of important health-related exposures (Belanger et al., 1978 PMID:248266). Since 1976, follow-up questionnaires have been mailed every 2 years. Colorectal cancer and other outcomes were reported by participants or next-of-kin and followed up through review of the medical and pathology record by physicians. Overall, more than 97% of self-reported colorectal cancers were confirmed by medical-record review. Information was abstracted on histology and primary location. The rate of follow-up evaluation has been high: as a proportion of the total possible follow-up time, follow-up evaluation has been more than 92%. Colorectal cancer cases were ascertained through June 1, 2008. In 1989-1990, 32,826 women in NHS I mailed blood samples by overnight courier, which were aliquoted into buffy coat and stored in liquid nitrogen. In 2001-2004, 29,684 women in NHS I who did not previously provide a blood sample mailed a swish-and-spit sample of buccal cells. Incident cases were defined as those occurring after the subject provided a blood or buccal sample. Prevalent cases were defined as those occurring after enrollment in the study in 1976 but before the subject provided either a blood or buccal sample.
Participants with histories of cancer (except nonmelanoma skin cancer), ulcerative colitis, or familial polyposis were excluded, along with their case-control sets. For this study, only European ancestry participants with blood source DNA and incident colorectal cancer cases were eligible for selection. Since enrollment year and sex matched exactly, controls were randomly selected in a case/control ratio of 2:1. Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial (PLCO): PLCO enrolled 154,934 participants (men and women, aged between 55 and 74 years) at ten centers into a large, randomized, two-arm trial to determine the effectiveness of screening to reduce cancer mortality. Sequential blood samples were collected from participants assigned to the screening arm. Participation was 93% at the baseline blood draw. White colorectal cancer cases with a family history of colorectal cancer (no history of ulcerative colitis, Crohn's Disease, diverticulitis, Gardner's syndrome, Familial Polyposis) and successful genotyping from previous Peters GWAS were selected for this project. Controls were matched to cases on reference age and sex in a case/control ratio of 2:1. Women's Health Initiative (WHI): WHI is a long-term national health study that has focused on strategies for preventing heart disease, breast and colorectal cancer, and osteoporotic fractures in postmenopausal women. The original WHI study included 161,808 postmenopausal women enrolled between 1993 and 1998. The Fred Hutchinson Cancer Research Center in Seattle, WA serves as the WHI Clinical Coordinating Center for data collection, management, and analysis of the WHI. The WHI has two major parts: a partial factorial randomized Clinical Trial (CT) and an Observational Study (OS); both were conducted at 40 Clinical Centers nationwide. The CT enrolled 68,132 postmenopausal women between the ages of 50-79 into trials testing three prevention strategies.
If eligible, women could choose to enroll in one, two, or all three of the trial components. The components are: Hormone Therapy Trials (HT): This double-blind component examined the effects of combined hormones or estrogen alone on the prevention of coronary heart disease and osteoporotic fractures, and associated risk for breast cancer. Women participating in this component with an intact uterus were randomized to estrogen plus progestin (conjugated equine estrogens [CEE], 0.625 mg/d plus medroxyprogesterone acetate [MPA] 2.5 mg/d) or a matching placebo. Women with prior hysterectomy were randomized to CEE or placebo. Both trials were stopped early, in July 2002 and March 2004, respectively, based on adverse effects. All HT participants continued to be followed without intervention until close-out. Dietary Modification Trial (DM): The Dietary Modification component evaluated the effect of a low-fat and high fruit, vegetable and grain diet on the prevention of breast and colorectal cancers and coronary heart disease. Study participants were randomized to either their usual eating pattern or a low-fat dietary pattern. Calcium/Vitamin D Trial (CaD): This double-blind component began 1 to 2 years after a woman joined one or both of the other clinical trial components. It evaluated the effect of calcium and vitamin D supplementation on the prevention of osteoporotic fractures and colorectal cancer. Women in this component were randomized to calcium (1000 mg/d) and vitamin D (400 IU/d) supplements or a matching placebo. The Observational Study (OS) examines the relationship between lifestyle, environmental, medical and molecular risk factors and specific measures of health or disease outcomes. This component involves tracking the medical history and health habits of 93,676 women not participating in the CT. Recruitment for the observational study was completed in 1998 and participants were followed annually for 8 to 12 years. 
All centrally confirmed White cases of invasive colorectal cancer or death from colorectal cancer were selected as potential cases from the March 2011 database. Cases were prioritized as follows: 1) cases with a positive family history of colorectal cancer; 2) randomly selected cases until a total of n=800 cases was reached. Control participants were required to be White and free of invasive colorectal cancer and non-invasive (stage 0 in situ) colorectal cancer. Centrally denied cases of colorectal cancer were not allowed into the control pool. Case and control participants were subject to the following exclusion criteria: (1) had prior history of colorectal cancer at baseline; (2) had no available DNA (DNA search as of Nov 15, 2012); (3) cannot be deposited to dbGaP; (4) lost to follow-up after enrollment; (5) selected for WHI study M26 Phase II. Controls were matched to cases in a case/control ratio of 2:1. In order to get 2 cases with 1 control, cases were grouped by enrollment year (a total of 5 groups). For each year group, around 50% of cases were selected to match controls. In total, 401 cases were selected to match controls. Matching was done on enrollment year, which was matched exactly. For additional information, see dbGaP: phs000200 and ClinicalTrials: NCT00000611.
Two family studies of families multiplex for alcohol dependence (AD) were initiated based on the gender of a pair of probands. Ascertainment through a pair of AD probands was used to increase the density of alcohol dependence within the family. Male (Cognitive and Personality Factors in Relatives of Alcoholics Family Study) and female (Biological Risk Factors in Female Alcoholics Family Study) alcohol dependent probands were identified while in treatment at a substance abuse treatment facility in Pittsburgh and surrounding communities; each proband provided contact information for a sibling whom he/she thought might be interested in participating. The studies had identical ascertainment requirements. DNA was banked for all willing participants.