Background: Splicing of genomic exons into mRNAs is a critical prerequisite for the accurate synthesis of human proteins. Genetic variants impacting splicing underlie a substantial proportion of genetic disease, but are challenging to identify beyond those occurring at donor and acceptor dinucleotides. To address this, various methods aim to predict variant effects on splicing. Recently, deep neural networks (DNNs) have been shown to achieve better results in predicting splice variants than other strategies.
Methods: It has been unclear how best to integrate such process-specific scores into genome-wide variant effect predictors. Here, we use a recently published experimental data set to compare several machine learning methods that score variant effects on splicing. We integrate the best of those approaches into general variant effect prediction models and observe the effect on classification of known pathogenic variants.
Results: We integrate two specialized splicing scores into CADD (Combined Annotation Dependent Depletion; cadd.gs.washington.edu), a widely used tool for genome-wide variant effect prediction that we previously developed to weight and integrate diverse collections of genomic annotations. With this new model, CADD-Splice, we show that inclusion of splicing DNN effect scores substantially improves predictions across multiple variant categories, without compromising overall performance.
Conclusions: While splice effect scores show superior performance on splice variants, specialized predictors cannot compete with other variant scores in general variant interpretation, as the latter account for nonsense and missense effects that do not alter splicing. Although only shown here for splice scores, we believe that the applied approach will generalize to other specific molecular processes, providing a path for the further improvement of genome-wide variant effect prediction.
Exomiser is an application that prioritizes genes and variants in next-generation sequencing (NGS) projects for novel disease-gene discovery or differential diagnostics of Mendelian disease. Exomiser comprises a suite of algorithms for prioritizing exome sequences using random-walk analysis of protein interaction networks, clinical relevance and cross-species phenotype comparisons, as well as a wide range of other computational filters for variant frequency, predicted pathogenicity and pedigree analysis. In this protocol, we provide a detailed explanation of how to install Exomiser and use it to prioritize exome sequences in a number of scenarios. Exomiser requires ~3 GB of RAM and roughly 15–90 s of computing time on a standard desktop computer to analyze a variant call format (VCF) file. Exomiser is freely available for academic use from http://www.sanger.ac.uk/science/tools/exomiser.
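The filter-then-rank idea described above (frequency and pathogenicity filters followed by prioritization) can be illustrated with a minimal sketch. This is not Exomiser's code or API: the `Variant` fields, the thresholds, and the simple averaging in `combined_score` are all hypothetical choices made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    allele_freq: float      # population allele frequency, 0..1
    pathogenicity: float    # predicted pathogenicity, 0..1 (hypothetical scale)
    phenotype_score: float  # gene-level phenotype relevance, 0..1 (hypothetical)


def combined_score(v: Variant) -> float:
    # Assumption: a plain average of the two scores; the real tool may
    # combine them differently.
    return (v.pathogenicity + v.phenotype_score) / 2


def prioritize(variants, max_freq=0.01, min_pathogenicity=0.5):
    """Drop common or likely-benign variants, then rank the remainder."""
    kept = [
        v for v in variants
        if v.allele_freq <= max_freq and v.pathogenicity >= min_pathogenicity
    ]
    return sorted(kept, key=combined_score, reverse=True)


variants = [
    Variant("A", 0.001, 0.9, 0.8),
    Variant("B", 0.2, 0.99, 0.9),    # too common, removed by the frequency filter
    Variant("C", 0.0005, 0.6, 0.2),
    Variant("D", 0.0001, 0.3, 0.9),  # removed by the pathogenicity filter
]
ranked = prioritize(variants)
```

In this toy input, only the rare, predicted-damaging variants survive the filters, and they are returned in descending order of the combined score.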
Summary Purpose: Epilepsies have a highly heterogeneous background with a strong genetic contribution. The variety of unspecific and overlapping syndromic and nonsyndromic phenotypes often hampers a clear clinical diagnosis and prevents straightforward genetic testing. Knowing the genetic basis of a patient’s epilepsy can be valuable not only for diagnosis but also for guiding treatment and estimating recurrence risks. Methods: To overcome these diagnostic restrictions, we composed a gene panel for next-generation sequencing containing the most relevant epilepsy genes and covering the most relevant epilepsy phenotypes known so far. With this method, 265 genes were analyzed per patient in a single step. We evaluated this panel on a pilot cohort of 33 index patients with concise epilepsy phenotypes or with a severe but unspecific seizure disorder, covering both sporadic and familial cases. Key Findings: We identified presumed disease‐causing mutations in 16 of 33 patients, comprising sequence alterations in frequently as well as in less commonly affected genes. The detected aberrations encompassed known and unknown point mutations (SCN1A p.R222X, p.E289V, p.379R, p.R393H; SCN2A p.V208E; STXBP1 p.R122X; KCNJ10 p.L68P, p.I129V; KCTD7 p.L108M; KCNQ3 p.P574S; ARHGEF9 p.R290H; SMS p.F58L; TPP1 p.Q278R, p.Q422H; MFSD8 p.T294K), a putative splice site mutation (SCN1A c.693A> p.T/P231P) and small deletions (SCN1A p.F1330Lfs3X [1 bp]; MFSD8 p.A138Dfs10X [7 bp]). All mutations were confirmed by conventional Sanger sequencing and, where possible, validated by parental testing and segregation analysis. In three patients with either Dravet syndrome or myoclonic epilepsy, we detected SCN1A mutations (p.R222X, p.P231P, p.R393H), even though other laboratories had previously excluded aberrations of this gene by Sanger sequencing or high‐resolution melting analysis.
Significance: We have developed a fast and cost‐efficient diagnostic screening method to analyze the genetic basis of epilepsies. We were able to detect mutations in patients with clear and with unspecific epilepsy phenotypes and to uncover the genetic basis of many previously unresolved epilepsy cases, including cases in which conventional methods had yielded false‐negative results. Our approach thus proved to be a powerful diagnostic tool that may contribute to collecting information on both common and unknown epileptic disorders and to delineating the phenotypes associated with less frequently mutated genes.
The interpretation of non-coding variants still constitutes a major challenge in the application of whole-genome sequencing in Mendelian disease, especially for single-nucleotide and other small non-coding variants. Here we present Genomiser, an analysis framework that is able not only to score the relevance of variation in the non-coding genome, but also to associate regulatory variants with specific Mendelian diseases. Genomiser scores variants through either existing methods such as CADD or a bespoke machine learning method and combines these with allele frequency, regulatory sequences, chromosomal topological domains, and phenotypic relevance to discover variants associated with specific Mendelian disorders. Overall, Genomiser is able to identify causal regulatory variants as the top candidate in 77% of simulated whole genomes, allowing effective detection and discovery of regulatory variants in Mendelian disease.
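A "top candidate" rate such as the one reported above is typically the fraction of cases in which the known causal variant is ranked first. The sketch below shows that calculation under stated assumptions: the function name, the input layout, and the variant identifiers are hypothetical, and the toy data are not taken from the study.

```python
def top_rank_fraction(cases):
    """Fraction of cases whose causal variant is ranked first.

    `cases` is a list of (ranked_variant_ids, causal_variant_id) pairs,
    where ranked_variant_ids is ordered from best to worst candidate.
    """
    if not cases:
        return 0.0
    hits = sum(
        1 for ranked, causal in cases
        if ranked and ranked[0] == causal
    )
    return hits / len(cases)


# Toy input (made up for illustration): four simulated genomes.
simulated = [
    (["v1", "v7", "v3"], "v1"),
    (["v4", "v2"], "v2"),       # causal variant ranked second: a miss
    (["v9"], "v9"),
    (["v5", "v6", "v8"], "v5"),
]
rate = top_rank_fraction(simulated)
```

Reporting the top-ranked fraction is a strict criterion; a looser variant of the same function could count hits within the top N candidates instead.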