2021
DOI: 10.1101/2021.01.28.428499
Preprint

Interpretable prioritization of splice variants in diagnostic next-generation sequencing

Abstract: A critical challenge in genetic diagnostics is the computational assessment of candidate splice variants, specifically the interpretation of nucleotide changes located outside of the highly conserved dinucleotide sequences at the 5′ and 3′ ends of introns. To address this gap, we developed the Super Quick Information-content Random-forest Learning of Splice variants (SQUIRLS) algorithm. SQUIRLS generates a small set of interpretable features for machine learning by calculating the information-content (IC) of wi…
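
As a rough illustration of what an information-content feature can look like, the sketch below scores a short sequence against a position frequency matrix using the standard individual-information formula (a sum over positions of 2 + log2 f(base, position), in bits, with a uniform background and no small-sample correction). It is not the SQUIRLS implementation: the frequency table `TOY_FREQS`, the function names, and the four-base window are hypothetical and chosen only to show how a reference and an alternate allele could be compared.

```python
import math

# Hypothetical toy frequencies for a 4-position motif.
# These numbers are illustrative only and are NOT real splice-site data.
TOY_FREQS = [
    {"A": 0.10, "C": 0.10, "G": 0.70, "T": 0.10},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.60, "C": 0.10, "G": 0.20, "T": 0.10},
]


def individual_information(seq, freqs=TOY_FREQS):
    """Return the information content of seq in bits.

    Each position contributes 2 + log2(f), where f is the frequency of the
    observed base at that position. With a uniform background of 0.25 this
    equals log2(f / 0.25).
    """
    if len(seq) != len(freqs):
        raise ValueError("sequence length must match the number of matrix columns")
    return sum(2.0 + math.log2(col[base]) for base, col in zip(seq.upper(), freqs))


def delta_information(ref, alt, freqs=TOY_FREQS):
    """Return the change in information content caused by a variant (alt minus ref)."""
    return individual_information(alt, freqs) - individual_information(ref, freqs)


if __name__ == "__main__":
    # A variant that replaces the strongly preferred G at position 2 lowers the score.
    print(delta_information("GGTA", "GATA"))
```

A negative difference means the alternate allele matches the motif less well than the reference; a tool could feed such differences to a classifier as features, although how SQUIRLS defines its windows and features is not recoverable from the truncated abstract.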

Cited by 9 publications (10 citation statements)
References 58 publications
“…Our findings demonstrate the opportunity to expand bioinformatics analysis to the pre-mRNA regions of known disease genes and provide immediate increases to diagnostic yield. Further, a wide variety of bioinformatics prediction tools continue to be developed, as seen with the recent release of CADD-Splice [34], and SQUIRLS [35]. As such tools continue to become available, careful analysis of their utility using a framework as described here will allow integration with maximum effect.…”
Section: Discussion (citation type: mentioning)
confidence: 99%
“…ConSpliceML uses a Random Forest (RF) classifier to combine regional splicing constraint measures with per-nucleotide alternative splicing predictions from SpliceAI and SQUIRLS (Methods). We chose to use both SpliceAI and SQUIRLS since they each perform well at predicting alternative splicing, [77] yet also capture variants missed by the other.…”
Section: Results (citation type: mentioning)
confidence: 99%
“…Similarly, SQUIRLS uses a Random Forest approach to predict whether a variant causes alternative splicing with similar performance as SpliceAI.[77] SQUIRLS differs in that it uses various features such as conservation scores from phyloP [78], sequence context parameters such as local distance from an exon and canonical splice sites, models of changes in spliceosome free energy binding, and other prediction tools such as ESRSeq.[79] However, these and other existing approaches provide little to no guidance as to whether or not the variant is deleterious.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
“…Although details vary from tool to tool, in general, variant pathogenicity is assessed on the basis of variant allele population frequencies, evolutionary conservation, and functional impact prediction for missense, splice, and regulatory variants. [29][30][31] Disease genes can be prioritized based on functional and genomic data, [32] or on the basis of phenotypic similarity of patient phenotype definitions with computational disease models of the HPO project.[16,33] There is a need to extend these algorithms for LRS.…”
Section: Discussion (citation type: mentioning)
confidence: 99%