Raw FASTQ files from targeted sequencing of 112 genes in 1,298 endometrial glands and matched blood samples. The paired-end sequencing data sets (R1 and R2) are deposited. The targeted genes are: ABCC1, ACRC, ANK3, ARHGAP35, ARID1A, ARID5B, ATCAY, ATM, ATR, BARD1, BCOR, BRCA1, BRCA2, BRD4, BRIP1, CAMTA1, CDC23, CDYL, CFAP54, CHD4, CHEK1, CHEK2, CTCF, CTNNB1, CUX1, DGKA, DISP2, DYNC2H1, EMSY, FAAP24, FAM135B, FAM175A, FAM65C, FANCA, FANCB, FANCC, FANCD2, FANCE, FANCF, FANCG, FANCI, FANCL, FANCM, FAT1, FAT3, FBN2, FBXW7, FGFR2, FRG1, GPR50, HEATR1, HIST1H4B, HNRNPCL1, HOOK3, KIAA1109, KIF26A, KMT2B, KMT2C, KRAS, LAMA2, LRP1B, MLH1, MON2, MRE11A, MSH2, MSH6, MTOR, NBN, PALB2, PHEX, PIK3CA, PIK3R1, PLXNB2, PLXND1, PMS2, POLE, POLR3B, PPP2R1A, PTEN, PTPN13, RAD50, RAD51, RAD51B, RAD51C, RAD51D, RAD52, RAD54B, RAD54L, RICTOR, SACS, SIGLEC9, SLC19A1, SLX4, SPEG, STT3A, TAF1, TAF2, TAS2R31, TFAP2C, TNC, TONSL, TP53, TTC6, UBA7, VNN1, WT1, XIRP2, ZBED6, ZC3H13, ZFHX3, ZFHX4, ZMYM4.
The gut microbiota composition is unique to every individual but is shaped by common factors including diet, lifestyle, medication use, early-life determinants, living environment and genetics. Most of these factors may be influenced by ethnicity. This study explored variations in fecal microbiota composition in 6048 individuals with different ethnic backgrounds living in the same geographical area (Amsterdam, the Netherlands). The HELIUS data are owned by the Amsterdam University Medical Centers, location AMC, in Amsterdam, the Netherlands. To allow sharing of microbiome data collected in HELIUS with (inter)national researchers, the 16S rRNA sequence data have been stored at the European Genome-phenome Archive (EGA; accession code EGAD00001004106). Access to these data must be granted on request, in part because the HELIUS data are stored together with relevant phenotypic variables. Access is granted to all researchers affiliated with an internationally recognized research institution who request to use the HELIUS data within the EGA context, after they have signed the data transfer agreement. Any researcher can request the data by submitting a proposal to the HELIUS Executive Board as outlined at http://www.heliusstudy.nl/en/researchers/collaboration, or by email: heliuscoordinator at amsterdamumc dot nl. The HELIUS Executive Board will check that proposals do not conflict with the ethical approvals and informed consent forms of the HELIUS study.
In the context of research, this dataset contains 423 IRD samples; 411 of them were analyzed with Clinical Exome Sequencing solutions and 12 with Whole Exome Sequencing.
Raw Illumina sequencing data and CellRanger BAM output files. For further information regarding this dataset, please contact Stephen Sansom at contact@combat.ox.ac.uk.
Targeted resequencing of samples was done with the TruSeq Custom Amplicon Low Input kit (TSCA-LI, Illumina). The oligo capture probes were designed to include a prefix of 8 random nucleotides at the 5' end of each probe. The assay is designed such that each targeted locus is annealed with two probes, resulting in amplicons tagged with unique molecular identifiers (UMI) (22) of 16 bases. Raw FASTQ sequencing files were processed as follows: (a) The first 8 bases were trimmed from each read and recorded with the corresponding base quality scores (BQ) in the attribute field. (b) Reads were aligned with BWA. (c) A first round of PCR duplicate cleaning was performed with Picard MarkDuplicates using the parameters BARCODE_TAG=BC TAGGING_POLICY=All REMOVE_DUPLICATES=true. (d) Since the previous step removed only duplicate reads with identical UMIs, a second pass of filtering was done: reads with identical mapping were considered unique only if their corresponding UMIs differed in at least 3 positions (i.e., UMI edit distance > 2). (e) Overlapping portions of paired-end read pairs were clipped with bamUtil clipOverlap to avoid overestimating the sequencing coverage.
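The second-pass filter in step (d) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Read` record, the function names, and the greedy "keep the first read seen" policy are assumptions made for illustration, and the edit distance is taken to be the Hamming distance because all UMIs have the same length (16 bases).

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple


class Read(NamedTuple):
    """Hypothetical minimal read record; a real pipeline would read these from a BAM file."""
    name: str
    chrom: str
    pos: int
    strand: str
    umi: str


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length UMIs differ."""
    if len(a) != len(b):
        raise ValueError("UMIs must have equal length")
    return sum(x != y for x, y in zip(a, b))


def filter_near_duplicate_umis(reads: Iterable[Read], min_diff: int = 3) -> List[Read]:
    """Keep a read only if its UMI differs in at least `min_diff` positions
    from every read already kept at the same mapping position.

    Assumption: reads are processed in input order and the first read seen
    at a locus is the one retained (a greedy policy chosen for this sketch).
    """
    kept_by_locus = defaultdict(list)
    kept = []
    for read in reads:
        locus = (read.chrom, read.pos, read.strand)
        if all(hamming(read.umi, other.umi) >= min_diff
               for other in kept_by_locus[locus]):
            kept_by_locus[locus].append(read)
            kept.append(read)
    return kept
```

With the default `min_diff=3`, two reads at the same position whose UMIs differ in only one or two bases are treated as duplicates, which matches the "edit distance > 2" criterion stated above.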
RNA-seq libraries were prepared using the KAPA Stranded RNA-Seq Kit with RiboErase (Kapa Biosystems, Wilmington, MA) and sequenced to a target depth of 200 million reads on the Illumina HiSeq platform (Illumina, San Diego, CA).
Total RNA-seq data from primary neuroblastoma tumors. This is an update of the "Berlin Neuroblastoma Dataset" (EGAS00001004022). These data were used for the analysis of circular RNA expression and regulation in neuroblastoma.
Whole genome sequencing of sick children in neonatal and paediatric intensive care units, aligned to reference assembly GRCh37.
The dataset comprises 5' single-cell RNA sequencing with TCR enrichment, generated with 10x Genomics' Chromium technology, of multiregional biopsies of human renal cell carcinomas. Biopsies from different tumour regions, the tumour-normal interface, normal kidney, normal adrenal, metastatic regions, peri-nephric fat, and peripheral blood were sequenced from 12 patients with kidney tumours.
This study was designed to assess the phenotypic effects of rare variants. Rare variants are difficult to study in a high-throughput manner because most cohorts are underpowered to detect associations. To gain power to test rare variants, we use Phenotype Risk Scores (PheRS) based on features of Mendelian diseases. PheRS is calculated using claims data from an EHR that are mapped to clinical features from OMIM's clinical descriptions of Mendelian disease. We calculated PheRS for 1,204 Mendelian diseases in a cohort of 21,701 individuals genotyped on the HumanExome BeadChip. We then tested for association between these diseases and rare variants in causal genes. A phenotype risk score is calculated as the weighted sum of features for a given disease. The features are defined as a set of phecodes (consolidated ICD codes) associated with a particular Mendelian disease as described in OMIM. The weight for each phecode is calculated as the log inverse prevalence of the phecode in our cohort. An individual's score equals the sum of the weights of the phecodes that are present in their medical record. Our study was based on a cohort of individuals with genotype data linked to de-identified electronic health records (EHR) from Vanderbilt's BioVU resource. We hope that this method will help generate phenotype-genotype correlation data on rare variants, which in turn may inform the rare variant interpretations used for diagnostics.
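The score described above can be sketched in a few lines of Python. This is a minimal illustration rather than the study's code: the function names, the dictionary-based cohort representation, and the example phecode labels are hypothetical, and the natural logarithm is an assumption because the description does not state the base.

```python
import math
from typing import Dict, Iterable, Set


def phecode_weights(records: Dict[str, Set[str]]) -> Dict[str, float]:
    """Weight of each phecode = log(1 / prevalence) = log(N / count),
    where N is the cohort size and count is the number of individuals
    carrying the phecode. Natural log is an assumption of this sketch."""
    n = len(records)
    counts: Dict[str, int] = {}
    for codes in records.values():
        for code in codes:
            counts[code] = counts.get(code, 0) + 1
    return {code: math.log(n / c) for code, c in counts.items()}


def phers(individual_codes: Set[str],
          disease_phecodes: Iterable[str],
          weights: Dict[str, float]) -> float:
    """Sum of the weights of the disease's phecodes present in the individual's record.
    Phecodes never observed in the cohort have no weight and contribute nothing."""
    return sum(weights[code] for code in set(disease_phecodes)
               if code in individual_codes and code in weights)


# Hypothetical toy cohort: individual ID -> set of phecodes in their record.
cohort = {
    "p1": {"PHE_A", "PHE_B"},
    "p2": {"PHE_A"},
    "p3": set(),
    "p4": {"PHE_A"},
}
weights = phecode_weights(cohort)
# Hypothetical disease defined by three phecodes, one of which is absent from the cohort.
disease = ["PHE_A", "PHE_B", "PHE_C"]
scores = {pid: phers(codes, disease, weights) for pid, codes in cohort.items()}
```

Because the weight is the log of the inverse prevalence, a rare phecode (here `PHE_B`, seen in one of four individuals) contributes more to the score than a common one (`PHE_A`, seen in three of four).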