Computational prediction of human deep intronic variation

Barbosa, Pedro; Savisaar, Rosina; Carmo‐Fonseca, Maria; Fonseca, Alcides

doi:10.1101/2023.02.17.528928

Cited by 1 publication

(1 citation statement)

References 105 publications


“…The datasets, supplementary material, and steps to reproduce all the results of this article are available in GitHub [113]. Supporting data, including variant sets, figures, and tables, are also available via the GigaScience repository, GigaDB [114].…”

Section: Data Availability (mentioning)

confidence: 99%

Computational prediction of human deep intronic variation

Barbosa, Savisaar, Carmo-Fonseca et al. 2022, GigaScience


Background: The adoption of whole-genome sequencing in genetic screens has facilitated the detection of genetic variation in the intronic regions of genes, far from annotated splice sites. However, selecting an appropriate computational tool to discriminate functionally relevant genetic variants from those with no effect is challenging, particularly for deep intronic regions where independent benchmarks are scarce.

Results: In this study, we provide an overview of the available computational methods and the extent to which they can be used to analyze deep intronic variation. We leveraged diverse datasets to extensively evaluate tool performance across different intronic regions, distinguishing between variants that are expected to disrupt splicing through different molecular mechanisms. Notably, we compared the performance of SpliceAI, a widely used sequence-based deep learning model, with that of more recent methods that extend its original implementation. We observed considerable differences in tool performance depending on the region considered, with variants generating cryptic splice sites being better predicted than those that potentially affect splicing regulatory elements. Finally, we devised a novel quantitative assessment of tool interpretability and found that tools providing mechanistic explanations of their predictions are often correct with respect to the ground truth information, but that using these tools results in decreased predictive power compared with black box methods.

Conclusions: Our findings translate into practical recommendations for tool usage and provide a reference framework for applying prediction tools in deep intronic regions, enabling more informed decision-making by practitioners.

