2019
DOI: 10.1101/674119

Critiquing Protein Family Classification Models Using Sufficient Input Subsets

Abstract: In many application domains, neural networks are highly accurate and have been deployed at large scale. However, users often do not have good tools for understanding how these models arrive at their predictions. This has hindered adoption in fields such as the life and medical sciences, where researchers require that models base their decisions on underlying biological phenomena rather than peculiarities of the dataset. In response, we propose a set of methods for critiquing deep learning models and…


Cited by 8 publications (8 citation statements) · References 44 publications
“…This differs substantially from approaches such as BLASTp, phmmer and HMMER that perform annotation using explicit alignment. We note that simpler models provide useful attribution of model decision making, and we anticipate that similar insights will emerge from work that improves the interpretation and understanding of deep learning models [41][42][43].…”
Section: Discussion
confidence: 88%
“…Neural models are fast to evaluate with a single forward pass. However, they can exhibit pathological behavior when used as optimization objectives, giving high scores to unrealistic sequences [49, 50] or giving outsize influence to irrelevant parts of the sequence [51]. While trained neural models can exhibit high levels of ruggedness [52], it is not straightforward to tune the optimization difficulty of a neural landscape.…”
Section: Background and Related Work
confidence: 99%
“…We applied Sufficient Input Subset (SIS) analysis (Carter et al., 2018) to interpret the sequence features the Embedding-Only model has learned to identify MHC ligands. On 10 000 random samples of all MHC ligands of 9 amino acids in our dataset, we performed SIS to locate the minimal subset of residues for the Embedding-Only model to predict a peptide as MHC ligand with a probability >95% (Section 2).…”
Section: Results
confidence: 99%
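For readers unfamiliar with the SIS procedure cited above, the core idea is a backward-selection pass that masks input positions while the prediction stays high, followed by rebuilding the smallest subset of positions that alone keeps the prediction above the decision threshold. The sketch below is a minimal, hypothetical illustration, not the authors' code: the `predict` function, `mask_value`, and all names are assumptions, and it recovers a single subset rather than the full collection of disjoint subsets described by Carter et al. (2018).

```python
import numpy as np

def find_sis(predict, x, mask_value, threshold):
    """Backward-selection sketch of a Sufficient Input Subset (Carter et al., 2018).

    predict    -- hypothetical function mapping a 1-D feature array to a probability
    x          -- the input to explain (e.g. an encoded 9-mer peptide)
    mask_value -- value used to "erase" a position (an assumption of this sketch)
    threshold  -- decision threshold the subset must still reach (e.g. 0.95)
    """
    x = np.asarray(x, dtype=float)
    remaining = list(range(len(x)))
    masked = x.copy()
    order = []  # positions in the order they were masked out

    # BackSelect: repeatedly mask the position whose removal hurts the score least
    while remaining:
        best_i, best_score = None, -np.inf
        for i in remaining:
            trial = masked.copy()
            trial[i] = mask_value
            score = predict(trial)
            if score > best_score:
                best_i, best_score = i, score
        masked[best_i] = mask_value
        order.append(best_i)
        remaining.remove(best_i)

    # FindSIS: re-add positions in reverse order (most important first)
    # until the prediction clears the threshold again
    subset, x_sis = [], np.full(len(x), mask_value, dtype=float)
    for i in reversed(order):
        subset.append(i)
        x_sis[i] = x[i]
        if predict(x_sis) >= threshold:
            return sorted(subset)  # a sufficient input subset
    return None  # no subset reaches the threshold
```

In this sketch the returned positions are those that, with every other position masked, are already enough for the model's probability to exceed the chosen threshold, which mirrors the usage described in the quoted Results passage.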