2021
DOI: 10.1101/2021.04.29.441979
Preprint

Interpreting Neural Networks for Biological Sequences by Learning Stochastic Masks

Abstract: Sequence-based neural networks can learn to make accurate predictions from large biological datasets, but model interpretation remains challenging. Many existing feature attribution methods are optimized for continuous rather than discrete input patterns and assess individual feature importance in isolation, making them ill-suited for interpreting non-linear interactions in molecular sequences. Building on work in computer vision and natural language processing, we developed an approach based on deep generativ…
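The abstract contrasts per-feature attribution with masking approaches that test whether a *set* of sequence positions is sufficient for a prediction. The sketch below illustrates only that generic idea. It is not the preprint's method, which the truncated abstract describes as based on deep generative modeling. Every name here is hypothetical: `toy_model` is a stand-in motif scorer, not a trained network. The sketch also assumes independent per-position Bernoulli masks and a uniform nucleotide background for masked positions, and it estimates the expected prediction under the stochastic mask by Monte Carlo sampling.

```python
import numpy as np

ALPHABET = "ACGT"

def one_hot(seq: str) -> np.ndarray:
    """Encode a DNA string as an (L, 4) one-hot matrix."""
    idx = np.array([ALPHABET.index(c) for c in seq])
    x = np.zeros((len(seq), len(ALPHABET)))
    x[np.arange(len(seq)), idx] = 1.0
    return x

def toy_model(x: np.ndarray) -> float:
    """Hypothetical predictor: best match to a 'TATA' motif over all windows, scaled to [0, 1]."""
    motif = one_hot("TATA")
    k = motif.shape[0]
    scores = [np.sum(x[i:i + k] * motif) for i in range(x.shape[0] - k + 1)]
    return float(max(scores)) / k

def masked_expectation(x, keep_prob, model, n_samples=1000, seed=0):
    """Monte Carlo estimate of E[model(x_masked)], where position i is kept with
    probability keep_prob[i] and otherwise replaced by a uniform background (an assumption)."""
    rng = np.random.default_rng(seed)
    background = np.full_like(x, 1.0 / x.shape[1])
    total = 0.0
    for _ in range(n_samples):
        mask = rng.random(x.shape[0]) < keep_prob          # (L,) boolean keep mask
        x_masked = np.where(mask[:, None], x, background)  # masked rows -> background
        total += model(x_masked)
    return total / n_samples

if __name__ == "__main__":
    x = one_hot("GGTATAGG")
    keep_motif_only = np.array([0, 0, 1, 1, 1, 1, 0, 0], dtype=float)
    print(masked_expectation(x, keep_motif_only, toy_model))
```

In this toy setting, keeping only the four motif positions reproduces the unmasked score (1.0), so those positions are "sufficient" for the prediction. Masking everything drops the score to the background level (0.25). A learned-mask method would instead optimize `keep_prob` (for example, trading off prediction fidelity against mask size) rather than fixing it by hand as done here.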

Cited by 2 publications (2 citation statements)
References 52 publications (72 reference statements)

“…Applying a more biologically meaningful data augmentation strategy may add more diversity into the training set. Conservation information is one of the most powerful features for predicting protein stability and functional effects [38]. In the study of 3Cnet, the artificial pathogenic-like variants were generated simply by considering the amino acid frequency and the number of gaps.…”
Section: Discussion (mentioning; confidence: 99%)

“… Linder et al (2020) improved this technique to use it for various problems such as controlling the level of gene transcription, RNA splicing, or RNA 3’ cleavage. More recently, Linder et al (2021) used masks on the sequence to both determine whether each part of the input sequence was sufficient to explain the network predictions and use this information to generate new sequences with similar properties. Other applications include Cuperus et al (2017) who used their trained CNN to predict the translation level of mRNAs from their 5’ untranslated sequence.…”
Section: Survey Methodology (mentioning; confidence: 99%)