MMSplice: modular modeling improves the predictions of genetic variant effects on splicing

Cheng, Jianghua; Nguyen, Thi Yen Duong; Cygan, Kamil J.; Çelik, Muhammed Hasan; Fairbrother, William G.; Avsec, Žiga

doi:10.1186/s13059-019-1653-z

Cited by 186 publications

(226 citation statements)

References 57 publications

Supporting

Mentioning

216

Contrasting


“…The group 3 (ranked 5th) did not provide implementation details. On the other side, the group 5 made the best predictions by using their developed MMSplice method (Cheng et al, ). In their method, six deep neural networks have been trained to extract features of splice donor, splice acceptor, 5′ exon, 3′ exon, 5′ intron, and 3′ intron, which were later combined by a simple linear regression to predict ΔΨ.…”

Section: Discussion

mentioning

confidence: 99%

Predicting the change of exon splicing caused by genetic variant using support vector regression

Chen, Zhao, et al. 2019

Human Mutation


Alternative splicing can be disrupted by genetic variants that are related to diseases like cancers. Discovering the influence of genetic variations on the alternative splicing will improve the understanding of the pathogenesis of variants. Here, we developed a new approach, PredPSI‐SVR to predict the impact of variants on exon skipping events by using the support vector regression. From the sequence of a particular exon and its flanking regions, 42 comprehensive features related to splicing events were extracted. By using a greedy feature selection algorithm, we found eight features contributing most to the prediction. The trained model achieved a Pearson correlation coefficient (PCC) of 0.570 in the 10‐fold cross‐validation based on the training data set provided by the “vex‐seq” challenge of the 5th Critical Assessment of Genome Interpretation. In the blind test also held by the challenge, our prediction ranked the 2nd with a PCC of 0.566 that demonstrates the robustness of our method. A further test indicated that the PredPSI‐SVR is helpful in prioritizing deleterious synonymous mutations. The method is available on https://github.com/chenkenbio/PredPSI-SVR.
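The abstract above mentions a greedy feature selection step scored by the Pearson correlation coefficient. The following is a minimal sketch of generic greedy forward selection, not the PredPSI-SVR implementation: the toy data, the `toy_score` function, and all names are assumptions for illustration, and the score here is a plain correlation of summed columns rather than a cross-validated support vector regression.

```python
from math import sqrt
from typing import Callable, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant sequence
    return cov / sqrt(var_x * var_y)


def greedy_forward_selection(
    n_features: int,
    score: Callable[[List[int]], float],
    max_features: int,
) -> Tuple[List[int], float]:
    """Add one feature at a time, keeping the one that improves `score` most.

    Stops when no remaining feature improves the score or when
    `max_features` have been selected.
    """
    selected: List[int] = []
    best = float("-inf")
    while len(selected) < max_features:
        candidates = [f for f in range(n_features) if f not in selected]
        if not candidates:
            break
        top_score, top_feature = max(
            (score(selected + [f]), f) for f in candidates
        )
        if top_score <= best:
            break  # no candidate improves on the current subset
        selected.append(top_feature)
        best = top_score
    return selected, best


# Hypothetical toy data: three feature columns; the target depends on
# columns 0 and 2 only, so column 1 acts as a distractor.
COLUMNS = [
    [1, 2, 3, 4, 5, 6],
    [3, 1, 4, 1, 5, 9],
    [2, -1, 0, 3, -2, 1],
]
TARGET = [a + b for a, b in zip(COLUMNS[0], COLUMNS[2])]


def toy_score(subset: List[int]) -> float:
    """Stand-in for a model's cross-validated PCC: correlate summed columns."""
    combined = [sum(COLUMNS[f][i] for f in subset) for i in range(len(TARGET))]
    return pearson(combined, TARGET)


selected, best = greedy_forward_selection(len(COLUMNS), toy_score, max_features=3)
print(sorted(selected), round(best, 3))
```

In a realistic setting, `score` would train a regressor on the candidate subset and return the correlation between held-out predictions and measurements; the selection loop itself would stay the same.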


“…It is probably difficult to train a model capturing much of the splicing regulatory elements directly from these data. Therefore, we used complementary data from different sources that are richer (Cheng et al, ). We used the GENCODE 24 annotation to train a module to score donor sites and similarly a module to score acceptor sites.…”

Section: Methods

mentioning

confidence: 99%

“…Although the two challenges have different measured quantities, we assumed that variant disrupting splicing could affect both Ψ and splicing efficiency. Therefore, we applied a modular modeling approach, MMSplice (Cheng et al, ), where the modules score different gene regions and are shared across challenges. The predictors proposed for each challenge differed only in how they combine the scores of the individual modules.…”

Section: Introduction

mentioning

confidence: 99%


CAGI 5 splicing challenge: Improved exon skipping and intron retention predictions with MMSplice

et al. 2019

Self Cite


Pathogenic genetic variants often primarily affect splicing. However, it remains difficult to quantitatively predict whether and how genetic variants affect splicing. In 2018, the fifth edition of the Critical Assessment of Genome Interpretation proposed two splicing prediction challenges based on experimental perturbation assays: Vex‐seq, assessing exon skipping, and MaPSy, assessing splicing efficiency. We developed a modular modeling framework, MMSplice, the performance of which was among the best on both challenges. Here we provide insights into the modeling assumptions of MMSplice and its individual modules. We furthermore illustrate how MMSplice can be applied in practice for individual genome interpretation, using the MMSplice VEP plugin and the Kipoi variant interpretation plugin, which are directly applicable to VCF files.
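The statements above describe modules that score different gene regions, with the final predictor differing only in how the module scores are combined. A minimal sketch of that idea follows, under loud assumptions: the region names, the `gc_fraction` placeholder scorer (standing in for a trained neural network module), and the weights are all hypothetical and are not taken from MMSplice.

```python
from typing import Callable, Dict

# Hypothetical region names; the real modules and their boundaries may differ.
REGIONS = ("intron_5p", "acceptor", "exon", "donor", "intron_3p")


def gc_fraction(seq: str) -> float:
    """Placeholder scorer: fraction of G/C bases (NOT a trained module)."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq.upper()) / len(seq)


def module_deltas(
    ref: Dict[str, str],
    alt: Dict[str, str],
    scorers: Dict[str, Callable[[str], float]],
) -> Dict[str, float]:
    """Score each region for reference and alternative, return alt - ref."""
    return {r: scorers[r](alt[r]) - scorers[r](ref[r]) for r in REGIONS}


def combine_linear(
    deltas: Dict[str, float],
    weights: Dict[str, float],
    intercept: float = 0.0,
) -> float:
    """Linear combination of per-region score differences."""
    return intercept + sum(weights[r] * deltas[r] for r in REGIONS)


# Toy example: a single substitution inside the exon region.
ref_regions = {"intron_5p": "TTTT", "acceptor": "CAGG", "exon": "ATAT",
               "donor": "GGTA", "intron_3p": "AAAA"}
alt_regions = dict(ref_regions, exon="ATGT")

scorers = {r: gc_fraction for r in REGIONS}  # same placeholder for every region
weights = {r: 1.0 for r in REGIONS}
weights["exon"] = 2.0  # made-up weight, for illustration only

deltas = module_deltas(ref_regions, alt_regions, scorers)
effect = combine_linear(deltas, weights)
print(deltas["exon"], effect)
```

Keeping the per-region scorers separate from the combination step is what would let the same modules be reused across tasks, with only `combine_linear` (or its weights) refit for each measured quantity.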


“…While other aspects of their model could account for this superior performance, one unique feature of their model (MMsplice; Cheng et al, 2019) is the decomposition of sequence surrounding alternatively spliced exons into five distinct regions (upstream intron, acceptor site, exon, donor site, and downstream intron), each of which was evaluated by a distinct neural network.…”

mentioning

confidence: 99%

Assessing predictions of the impact of variants on splicing in CAGI5

et al. 2019

Self Cite


Precision medicine and sequence‐based clinical diagnostics seek to predict disease risk or to identify causative variants from sequencing data. The Critical Assessment of Genome Interpretation (CAGI) is a community experiment consisting of genotype‐phenotype prediction challenges; participants build models, undergo assessment, and share key findings. In the past, few CAGI challenges have addressed the impact of sequence variants on splicing. In CAGI5, two challenges (Vex‐seq and MaPSY) involved prediction of the effect of variants, primarily single‐nucleotide changes, on splicing. Although there are significant differences between these two challenges, both involved prediction of results from high‐throughput exon inclusion assays. Here, we discuss the methods used to predict the impact of these variants on splicing, their performance, strengths, and weaknesses, and prospects for predicting the impact of sequence variation on splicing and disease phenotypes.

