GeneChip HTA 2.0 data of primary renal cell carcinoma (RCC) related to Reustle et al., Genome Med 12:32 (2020). The microarray data were preprocessed using Robust Multi-array Average (RMA).
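RMA combines three steps: background correction, quantile normalization across arrays, and median-polish summarization of log2 probe intensities. The sketch below covers only the quantile-normalization step and uses plain Python. It is an illustration, not the implementation used to preprocess this dataset. The toy matrix is invented, background correction and summarization are omitted, and tied values are broken by probe order instead of being averaged.

```python
from typing import List


def quantile_normalize(matrix: List[List[float]]) -> List[List[float]]:
    """Quantile-normalize a probes x arrays matrix (rows = probes, columns = arrays).

    After normalization every array (column) has the same empirical distribution:
    the row-wise mean of the column-sorted values. Ties are broken by probe order,
    a simplification compared with averaging tied ranks.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]

    # Reference distribution: mean of the k-th smallest value across all arrays.
    sorted_cols = [sorted(col) for col in columns]
    reference = [sum(sc[i] for sc in sorted_cols) / n_cols for i in range(n_rows)]

    normalized = [[0.0] * n_cols for _ in range(n_rows)]
    for j, col in enumerate(columns):
        # Probe indices of this array, ordered from lowest to highest intensity.
        order = sorted(range(n_rows), key=lambda i: col[i])
        for rank, i in enumerate(order):
            normalized[i][j] = reference[rank]
    return normalized


if __name__ == "__main__":
    # Toy log2 intensities for 4 probes on 3 arrays (made-up values).
    data = [[5, 4, 3], [2, 1, 4], [3, 4, 6], [4, 2, 8]]
    for row in quantile_normalize(data):
        print([round(v, 3) for v in row])
```

After normalization, every column holds the same set of values, namely the reference distribution. Only the probes' ranks within each array are retained from the raw data.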
Six control exomes from the TB cohort, sequenced on Illumina instruments, from the paper "Germline Elongator mutations in Sonic Hedgehog medulloblastoma" (Waszak et al., Nature 2020).
McGill EMC Release 4 for assay "ATAC-seq": sequencing of transposase-accessible chromatin as described by Buenrostro et al. (Nature Methods 10, 1213-1218 (2013), doi:10.1038/nmeth.2688).
ChIP-seq data (H3K4me3 and H3K27ac histone modifications) of the Hodgkin lymphoma cell line L-428. Samples were processed as previously described (Sud et al., 2018). The files are in BAM format, aligned to build 37 of the human genome.
54 WGS Ewing's sarcoma samples sequenced at The Hospital for Sick Children, Toronto (Adam Shlien's lab), and published in Science (2018). Reference: Anderson et al., "Rearrangement bursts generate canonical gene fusions in bone and soft tissue tumors".
RNA-seq FASTQ files for the 16 samples used in Michealraj et al. (Cell, 2020). The samples include posterior fossa group A (PFA) and supratentorial (ST) ependymoma tissues, normal pediatric brain as controls, and PFA ependymoma cell lines.
Neuroblastoma, a clinically heterogeneous pediatric cancer, is characterized by distinct genomic profiles but few recurrent mutations. Because neuroblastoma is expected to have a high degree of genetic heterogeneity, studying its clonal evolution with deep-coverage whole-genome sequencing (WGS) of diagnosis and relapse samples should lead to a better understanding of the molecular events associated with relapse. Samples were included in this study if sufficient constitutional, diagnosis and relapse DNA was available for WGS. Whole-genome sequencing was performed on trios (constitutional, diagnosis and relapse DNA) from eight patients using an Illumina HiSeq 2500. This produced paired-end (PE) 90x90 bp reads for six patients and 100x100 bp reads for two. Expected coverage for sample NB0175 (PE 100x100 bp) was 30X for both tumor and constitutional samples. For the seven other patients, expected coverage was 80X for tumor samples sequenced with PE 100x100, 100X for the other tumor samples and 50X for all constitutional samples (see table 1). Reads were aligned with BWA (Li et al., Oxford J, 2009 Jul), allowing up to 4% mismatches. BAM files were then cleaned up according to the Genome Analysis Toolkit (GATK) recommendations (Van der Auwera et al., Current Protocols in Bioinformatics, 2013; picard-1.45, GenomeAnalysisTK-2.2-16). Variant calling was performed in parallel using three variant callers: GenomeAnalysisTK-2.2-16, Samtools-0.1.18 and MuTect-1.1.4 (McKenna et al., Genome Res, 2010; Li et al., Oxford J, 2009 Aug; Cibulskis et al., Nat Biotechnol, 2013). Annovar-v2012-10-23 with cosmic-v64 and dbsnp-v137 was used for functional annotation, and RefSeq for structural annotation. For GATK and Samtools, single nucleotide variants (SNVs) were filtered out if they had a quality under 30, a depth of coverage under 6, or fewer than 2 reads supporting the variant. MuTect was run with parameters matching the GATK and Samtools thresholds to filter out irrelevant variants.
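The GATK and Samtools SNV hard filters above reduce to three thresholds. The following is a minimal Python sketch. It assumes each call has already been parsed into a record holding quality, depth and the number of variant-supporting reads. The `Snv` record and its field names are hypothetical and do not represent the pipeline's actual data structures. The text states that SNVs "under 30", "under 6" or with "fewer than 2" supporting reads are removed, so a value exactly at a threshold is kept.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Thresholds stated in the text for GATK and Samtools calls.
MIN_QUAL = 30.0      # SNVs with quality under 30 are removed
MIN_DEPTH = 6        # SNVs with depth of coverage under 6 are removed
MIN_ALT_READS = 2    # SNVs with fewer than 2 variant-supporting reads are removed


@dataclass
class Snv:
    """Hypothetical minimal SNV record; a real pipeline would parse these from VCF fields."""
    chrom: str
    pos: int
    qual: float
    depth: int
    alt_reads: int


def passes_hard_filters(snv: Snv) -> bool:
    """Keep an SNV only if it meets all three thresholds."""
    return (snv.qual >= MIN_QUAL
            and snv.depth >= MIN_DEPTH
            and snv.alt_reads >= MIN_ALT_READS)


def hard_filter(snvs: Iterable[Snv]) -> List[Snv]:
    """Return the calls that survive the hard filters, preserving order."""
    return [s for s in snvs if passes_hard_filters(s)]


if __name__ == "__main__":
    # Made-up calls illustrating each failure mode.
    calls = [
        Snv("chr2", 15_940_000, 52.0, 40, 12),  # passes
        Snv("chr2", 15_940_100, 28.0, 40, 12),  # quality too low
        Snv("chr2", 15_940_200, 60.0, 5, 3),    # depth too low
        Snv("chr2", 15_940_300, 60.0, 30, 1),   # only one supporting read
    ]
    print([s.pos for s in hard_filter(calls)])
```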
We then retained SNVs within and around exons of coding genes, including those overlapping splice sites. Next, variants reported in more than 1% of the population in the 1000 Genomes Project (1000gAprl_2012) or the Exome Sequencing Project (ESP6500) were discarded to filter out polymorphisms. Synonymous variants were also filtered out. MuTect focuses on somatic variants by filtering against the constitutional sample. For GATK and Samtools, an mpileup comparison between constitutional and tumor DNA likewise allowed us to focus on tumor-specific SNVs. In addition, every SNV called by our pipeline that was also supported in any constitutional sample was filtered out, to guard against possible coverage deficiencies in the constitutional DNA. We then analyzed copy number variants (CNVs) with HMMcopy-v0.1.1 (Ha et al., Genome Res, 2012) and Control-FREEC-v6.7 (Boeva et al., Bioinformatics, 2011). These tools used window sizes of 2,000 bp and 1,000 bp, respectively, and Control-FREEC was run with automatic correction for normal-cell contamination of tumor samples. Finally, we explored structural variants (SVs), including deletions, inversions, tandem duplications and translocations, using DELLY-v0.5.5 with standard parameters (Rausch et al., Oxford J, 2012). In tumors, at least 10 supporting reads were required to make a call. The exception was sample NB0175, for which 5 supporting reads were required because its coverage was only 40X (see table 2). To predict SVs in constitutional samples for subsequent somatic filtering, only 2 supporting reads were required, so as not to miss any. To identify somatic events, all SVs in each normal sample were first extended by 500 bp in both directions. Any SV called in a tumor sample that fell within the combined flanked regions of the matched normal sample was then removed (see graph 1). Deletions affecting more than 5 genes or larger than 1 Mb were removed, as were inversions or tandem duplications covering more than 4 genes. We focused on exonic and splicing events for deletions, inversions and tandem duplications.
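The somatic SV filter and the size filter above can be sketched in Python as follows. This is one reading of the text, not the authors' code, and it rests on two assumptions. First, "combined flanked regions" is taken to mean the union of the 500 bp-flanked normal-sample SVs, merged per chromosome. Second, a tumor SV counts as "within" that union only if it is fully contained in one merged interval. The `SV` record and the "DEL"/"INV"/"DUP" labels are hypothetical. Translocations, which span two chromosomes, are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

FLANK_BP = 500  # each normal-sample SV is extended by 500 bp on both sides


@dataclass
class SV:
    """Hypothetical intrachromosomal SV record (translocations are not modeled)."""
    chrom: str
    start: int
    end: int
    svtype: str       # "DEL", "INV" or "DUP" (labels assumed here)
    n_genes: int = 0  # genes overlapped, taken from annotation


def flanked_union(normal_svs: List[SV], flank: int = FLANK_BP) -> Dict[str, List[Tuple[int, int]]]:
    """Flank every normal SV, then merge overlapping windows per chromosome."""
    windows: Dict[str, List[Tuple[int, int]]] = {}
    for sv in normal_svs:
        windows.setdefault(sv.chrom, []).append((max(0, sv.start - flank), sv.end + flank))
    union: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, ivs in windows.items():
        ivs.sort()
        merged = [list(ivs[0])]
        for s, e in ivs[1:]:
            if s <= merged[-1][1]:  # overlaps or touches the previous window
                merged[-1][1] = max(merged[-1][1], e)
            else:
                merged.append([s, e])
        union[chrom] = [(s, e) for s, e in merged]
    return union


def is_somatic(sv: SV, union: Dict[str, List[Tuple[int, int]]]) -> bool:
    """A tumor SV lying entirely inside the normal sample's flanked union is treated as germline."""
    return not any(s <= sv.start and sv.end <= e for s, e in union.get(sv.chrom, []))


def passes_size_filter(sv: SV) -> bool:
    """Drop deletions hitting >5 genes or >1 Mb, and inversions/duplications hitting >4 genes."""
    if sv.svtype == "DEL":
        return sv.n_genes <= 5 and sv.end - sv.start <= 1_000_000
    if sv.svtype in ("INV", "DUP"):
        return sv.n_genes <= 4
    return True


def somatic_svs(tumor: List[SV], normal: List[SV]) -> List[SV]:
    """Tumor SVs that are absent from the matched normal and pass the size filter."""
    union = flanked_union(normal)
    return [sv for sv in tumor if is_somatic(sv, union) and passes_size_filter(sv)]
```

Merging the windows first matters in the following case. Two normal deletions at 10,000 to 12,000 and 12,800 to 14,000 produce a single flanked window from 9,500 to 14,500. A tumor SV at 11,000 to 13,500 extends beyond either flanked window on its own, so it is removed only when the union is used.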
For translocations, we kept all SVs that occurred in intronic, exonic, 5'UTR, upstream or splicing regions.
Bioinformatics detection of variations with a deep sequencing approach
Paired-end reads were first merged and adapter-trimmed with SeqPrep (default parameters). The merged reads were then aligned with BWA (Li H. and Durbin R. 2009 PMID 19451168), allowing up to 1 difference in the 22-base seed and reporting only unique alignments. Only reads with a mapping quality of 20 or more were analysed further. Variant calling software was not used, since we aimed to detect low-frequency variations observed in less than 1% of reads, and such variants require a custom approach. Using the DepthOfCoverage function of the Genome Analysis Toolkit (GATK) v2.13.2 (McKenna A, et al., 2010 Genome Research PMID: 20644199), we focused on high-quality coverage of bases A, C, G and T at the targeted variant position. Only bases with a mapping quality higher than 20 and a base quality higher than 10 were counted, so that only high-quality data were considered. To determine the background level of variability in the studied regions, 10 control samples were included in the analysis. The same approach and filtering criteria described above were applied to them over the entire amplicons. To highlight variants, the frequency of each base at each amplicon position in each sample was then compared with the frequency observed in the set of controls. Statistical analyses were performed with the R statistical software (http://www.R-project.org). Two-sided Fisher's exact tests with a Bonferroni correction were used to compare base percentages between data sets, i.e., for a given base, between a case and the controls.
Finally, a variation was retained as significant only when both of the following were observed at p < 0.05: (i) a significant increase in the percentage of the variant base and (ii) a significant decrease in the percentage of its reference base.
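The two-condition rule above requires a significant rise of the variant base together with a significant fall of the reference base. It can be sketched in Python as shown below. The original analysis was done in R, so this version is only an illustration, and it rests on several assumptions that the text does not state:

- Per-base counts have already been restricted to the quality thresholds and are pooled across the 10 controls.
- Each base is tested as a 2x2 table of that base versus all other bases, in the case versus the controls.
- The Bonferroni factor `n_tests` is supplied by the caller, for example the number of positions times the number of bases tested.

The two-sided p-value follows the usual hypergeometric definition: it is the sum of the probabilities of all tables no more likely than the observed one.

```python
from math import comb
from typing import Dict, Tuple


def fisher_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    total = comb(r1 + r2, c1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell with fixed margins.
        return comb(r1, x) * comb(r2, c1 - x) / total

    p_obs = prob(a)
    support = range(max(0, c1 - r2), min(r1, c1) + 1)
    # Sum the probabilities of all tables no more likely than the observed one
    # (with a small relative tolerance, as R's fisher.test does).
    return min(1.0, sum(p for p in map(prob, support) if p <= p_obs * (1 + 1e-7)))


def base_test(case: Dict[str, int], ctrl: Dict[str, int], base: str,
              n_tests: int) -> Tuple[float, float]:
    """Bonferroni-adjusted p-value and change in fraction of `base` (case minus controls)."""
    n_case, n_ctrl = sum(case.values()), sum(ctrl.values())
    a, c = case.get(base, 0), ctrl.get(base, 0)
    p = fisher_two_sided(a, n_case - a, c, n_ctrl - c)
    return min(1.0, p * n_tests), a / n_case - c / n_ctrl


def is_significant_variant(case: Dict[str, int], ctrl: Dict[str, int], ref: str, alt: str,
                           n_tests: int, alpha: float = 0.05) -> bool:
    """Keep a position only if the variant base rises AND the reference base falls significantly."""
    p_alt, d_alt = base_test(case, ctrl, alt, n_tests)
    p_ref, d_ref = base_test(case, ctrl, ref, n_tests)
    return p_alt < alpha and d_alt > 0 and p_ref < alpha and d_ref < 0
```

In R, the corresponding steps would be `fisher.test(matrix(c(a, c, b, d), nrow = 2))$p.value` for each table, followed by `p.adjust(p, method = "bonferroni")` on the vector of p-values.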
Study 1: 2R01-NS050375 (PI: DOBYNS, William B.)
The genetic basis of mid-hindbrain malformations
Our general goal for this project is to advance our understanding of human developmental disorders involving the brainstem and cerebellum, which are brain structures derived from the embryonic midbrain and hindbrain. Based on data from the CDC, these disorders affect at least 2.4 per 1000 resident births. Importantly, this large class of disorders co-occurs with more common developmental disorders such as autism, mental retardation and some forms of infantile epilepsy, and shares some of the same causes. With this renewal, we propose to expand the scope of our work beyond single phenotypes and genes. We will focus on delineating the critical phenotype spectra to which the most common mid-hindbrain malformations (MHM) belong, and on defining the underlying biological networks that are disrupted. To pursue these goals, we will use our large and growing cohort of human subjects to map additional MHM loci using SNP microarrays. These arrays provide high-resolution autozygosity and linkage data in informative families and also detect critical copy number variants in sporadic subjects. The causative genes will be identified using traditional Sanger or new high-throughput sequencing methods, as appropriate based on the size of the critical region. We will use these and other known MHM causative genes to construct and revise model biological networks of genes and proteins. We will then test these genes and networks in additional patients as a candidate gene or, more accurately, a candidate network approach. These approaches need to be supported by ongoing active subject recruitment, as studies of comparable disorders such as mental retardation and autism have benefited from even larger numbers of subjects than we have so far collected. We need to use new high-throughput sequencing methods to test larger critical regions more efficiently, and to test entire gene networks rather than individual genes in matched cohorts of subjects.
At every step (phenotype analysis, CNV analysis, model network construction and high-throughput sequencing), we will need expanded bioinformatics capabilities. Finally, we need to test the biological function of new genes and networks to support our gene identification studies. We expect that these studies will contribute immediately to more accurate diagnosis and counseling, and over time will lead to the development of specific treatments for a subset of these disorders. We further expect that studies of mid-hindbrain development will have broad significance for human developmental disorders generally. In particular, they should provide compelling evidence for a connection between cerebellar development and other classes of developmental disorders such as autism, mental retardation and epilepsy.
Study 2: R01-NS058721 (PI: DOBYNS, William B.)
De novo copy number variation and gene discovery in human brain malformations
Project Summary/Abstract
The number of recognized brain malformations and syndromes has grown rapidly during the past several decades, yet relatively few causative genes have been identified. This is especially true for three common malformations that often occur together and have been associated with numerous cytogenetically visible chromosome deletions and duplications: agenesis of the corpus callosum (ACC), cerebellar vermis hypoplasia (CVH) including Dandy-Walker malformation (DWM), and polymicrogyria (PMG). We propose to perform high-resolution array comparative genomic hybridization (aCGH), an emerging technology able to detect small copy number variants (CNV), in 700 probands with one or more of these three malformations. Our central hypothesis is that more than 10% of patients with ACC, CVH or PMG will have de novo CNV below the resolution of routine cytogenetic analysis but detectable by current array platforms. We therefore expect to identify 70-100 patients with small CNV.
We will distinguish CNV found in normal individuals from potentially disease-associated changes, and will confirm CNV using fluorescence in situ hybridization (FISH) and microsatellite (STRP) analysis. We will give highest priority to CNV that are de novo and involve 2 or more BACs, and secondary priority to familial and smaller CNV, excluding known polymorphisms. We will then evaluate and rank candidate genes in the critical regions using information from public databases and our own expression studies. The best candidate genes from well-defined critical regions will undergo mutation analysis by sequencing in a large panel of subjects whose phenotypes match those of the patients whose CNV define the critical regions. At this stage, we will supplement our clinical classification with more refined criteria, such as developmental level and the presence of epilepsy or other birth defects. Any abnormalities found will be analyzed using existing data on polymorphisms (i.e., dbSNP), cross-species comparisons, and functional assays appropriate for the specific sequence change.
Study 2A
In 1995, we described a novel multiple congenital anomaly syndrome associated with facial dysmorphism (congenital ptosis, high arched eyebrows, shallow orbits, trigonocephaly), colobomas of the eyes, neuronal migration malformation (frontal-predominant lissencephaly) and variable hearing loss. We hypothesized that this syndrome results from de novo mutations, and used trio-based exome sequencing to identify de novo mutations in the ACTB and ACTG1 genes.
Study 2B
In 1997 and 2004, we and others defined two novel developmental syndromes associated with markedly enlarged brain size, or megalencephaly, and other highly recognizable features.
The megalencephaly-capillary malformation syndrome (MCAP) consists of megalencephaly and associated growth dysregulation with variable asymmetry, developmental vascular anomalies, distal limb malformations, variable cortical malformation, and a mild connective tissue dysplasia. The megalencephaly-polymicrogyria-polydactyly-hydrocephalus syndrome (MPPH) resembles MCAP but lacks vascular malformations and syndactyly. We hypothesized that MCAP and MPPH result from mutations, including postzygotic events, in the same pathway, and studied them together. We studied 50 families with MCAP or MPPH using a combination of exome sequencing, Sanger sequencing, restriction-enzyme assays, and targeted ultra-deep sequencing. We identified de novo germline or postzygotic mutations in three core components of the phosphatidylinositol-3-kinase/AKT pathway. These include two mutations in AKT3, a recurrent mutation in PIK3R2, and multiple, mostly postzygotic, mutations in PIK3CA (Rivière JB, Mirzaa GM, O'Roak BJ, Beddaoui M, Alcantara D, Conway RL, St-Onge J, Schwartzentruber JA, Gripp KW, Nikkel SM, Worthylake T, Sullivan CT, Ward TR, Butler HE, Kramer NA, Albrecht B, Armour CM, Armstrong L, Caluseriu O, Cytrynbaum C, Drolet BA, Innes AM, Lauzon JL, Lin AE, Mancini GMS, Meschino WS, Reggin JD, Saggar AK, Lerman-Sagie T, Uyanik G, Weksberg R, Zirn B, Beaulieu CL, FORGE Canada Consortium, Majewski J, Bulman DE, O'Driscoll M, Shendure J, Graham Jr. JM, Boycott KM, Dobyns WB. De novo germline and postzygotic mutations in AKT3, PIK3R2 and PIK3CA cause a spectrum of related megalencephaly syndromes. Nat. Genet. In press).
Study 3: 2R01-NS046616 (PI: GOLDEN, Jeffrey A)
The role of ARX in normal and abnormal brain development
This subcontract from the Children's Hospital of Philadelphia to the University of Chicago (UC) is intended to support research studies of ARX and functionally related genes in human subjects with any one of several specific developmental disorders.
The co-investigator at UC (W.B. Dobyns) will identify a series of patients with mental retardation and severe infantile epilepsy, some of whom will have specific brain malformations and others normal brain structure on imaging studies. The co-investigator will also collect research samples from these subjects with informed consent. The studies to be performed will include mutation analysis of ARX, mutation analysis of specific downstream target genes, and X-inactivation studies in humans and in mutant mice. The results will be analyzed to determine the significance of any changes found in the gene.
Atrial fibrillation (also called AFib or AF) is a quivering or irregular heartbeat (arrhythmia) that can lead to blood clots, stroke, heart failure and other heart-related complications. At least 2.7 million Americans are living with AFib. This study includes individuals with early-onset AF drawn from the BioVU sample repository. BioVU is Vanderbilt's biobank of DNA extracted from leftover and otherwise discarded clinical blood specimens. BioVU operates as a consented biorepository: all individuals must sign the BioVU consent form in order to donate future specimens. BioVU subjects are de-identified and linked to the Synthetic Derivative. This enables researchers to access genetic data and DNA material as well as dense, longitudinal electronic medical record (EMR) information.
The Team
Mallory Freeberg, Roderic Guigó, Arcadi Navarro, Helen Parkinson, Jordi Rambla, Ana T. Alonso, Silvia Bahena, Àlex Bedmar, Kenneth Buckley, Aldar Cabrelles, Ángel Carreño, Marcos Casado, Giulia Cellerino, Amy Curwin, Teresa D'Altri, Abeer Fadda, Teresa Garcia, Sara Gregorio, Max Fischer, Bela Juhasz, Oriol Lopez-Doriga, Mireia Marin, Óscar Martínez, Andrea Mero, Akiris Moctezuma, Aurora Moreno, Liina Nagirnaja, Francesc de Puig, Santiago Rensonnet, Gabriele Rinck, Aravind Sankar, Andres Silva, Coline Thomas, Sabela de la Torre, Gemma Vicente
The EGA Team at the CRG co-manages the European Genome-phenome Archive together with the EGA Team at the European Bioinformatics Institute. In addition to maintaining and distributing data, we enrich the contents of the EGA by contributing our knowledge of genomics and of the relationship between genomes and phenomes.
Previous Team Members
Alexander Vikhorev, Jeff Almeida-King, Mario Alberich, Sergi Aguilo, Pablo Arce, Minjie Ding, Alfred Gil, Leslie Glass, Jag Kandasamy, Vasudev Kumanduri, Ilkka Lappalainen, Audald Lloret i Villas, Sira Martinez, Anand Mohan, Dietmar Orth, Justin Paschall, Saif Ur Rehman, Gary Saunders, Thomas Smith, Ashutosh Shimpi, Marc Sitges, Dhvani Solanki, Giselle Kerry, Nino Spataro, Dylan Spalding, Matthieu Vizuete-Forster, Cristina Yenyxe Gonzalez Garcia, Paul Flicek, Anna Foix, Emilio Garcia Rios, Jorge Izquierdo, Roberto Ariosa, Marta Ferri Peradalta, Daniel Barrowdale, Babita Singh, Umuthan Uyan, Aleix Canalda, Dona Shaju, Mauricio Moldes, Carles Garcia, Frédéric Haziza, Alegria Aclan, Lauren Fromont, Alvis Brazma, Marta Huertas, Arnau Soler, Gemma Milla, Claudia Vasallo, Aina Jené, Csaba Halmagyi, Raül Garcia, Thomas Keane, Mei Gascón