2021
DOI: 10.1007/978-3-030-69541-5_29
Play Fair: Frame Attributions in Video Models

Cited by 6 publications (4 citation statements)
References 37 publications
“…Huang et al. [23] study the effect of motion in action recognition by measuring the accuracy drop on frames when motion is not used. Price et al. [37] explore frame-level contributions to the model output with the Element Shapley Value. It should be noted that most prior works try to find salient spatio-temporal regions for action recognition, whereas our work quantifies how a model learns temporal relevance and investigates inter-frame relationships based on its architecture, as opposed to why a model makes a specific prediction.…”
Section: Related Work
confidence: 99%
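The Element Shapley Value mentioned above attributes a video model's output to individual frames. As a rough illustrative sketch (not the authors' implementation), frame contributions can be estimated with a standard Monte Carlo approximation of the Shapley value; here `score` is a hypothetical stand-in for the classifier's output on a subset of frames:

```python
import random

def shapley_frame_attributions(frames, score, n_samples=200, seed=0):
    """Monte Carlo estimate of each frame's Shapley value.

    `frames` is a list of frame indices; `score(subset)` returns the
    model's class score for that subset of frames (hypothetical stand-in
    for a real video classifier evaluated on a frame subset).
    """
    rng = random.Random(seed)
    phi = {f: 0.0 for f in frames}
    for _ in range(n_samples):
        perm = frames[:]
        rng.shuffle(perm)          # random ordering of frames
        prev = score(())           # baseline: score of the empty set
        chosen = []
        for f in perm:
            chosen.append(f)
            cur = score(tuple(sorted(chosen)))
            phi[f] += cur - prev   # marginal contribution of frame f
            prev = cur
    return {f: v / n_samples for f, v in phi.items()}

# Toy "model": score is the fraction of informative frames present.
informative = {1, 3}
def toy_score(subset):
    return sum(1 for f in subset if f in informative) / len(informative)

attr = shapley_frame_attributions([0, 1, 2, 3], toy_score)
```

Because the toy score is additive, the estimate is exact here: frames 1 and 3 each receive 0.5 and the others 0, with the attributions summing to the full-clip score.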
“…Qualitative differences are found between what the two model types tend to use as evidence for classification decisions. Another work on explainability for video models is by Price et al. [38], but it studies only one type of model, TRN [54], and its decisions. Our work is also connected to that of Sevilla-Lara et al. [40], who discuss the risk that models with strong image-modeling abilities may prioritize those cues over temporal-modeling cues.…”
Section: Related Work
confidence: 99%
“…Bargal et al. [2] propose an explanation technique for recurrent neural networks (RNNs) with convolutional layers, utilizing excitation backpropagation [27]. Perturbation-based black-box approaches have also been investigated to explain a video classifier by presenting salient frames [18] or a 3D generalization of a saliency map [14]. As with the explanation of image classifiers, our technique allows combining the techniques above with Bayesian optimization to balance various trade-offs.…”
Section: Introduction
confidence: 99%
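As a hedged sketch of the perturbation idea described above (not code from any of the cited papers), one can rank frames by the drop in a classifier's score when each frame is occluded in turn; `model_score` and `baseline_frame` are hypothetical stand-ins:

```python
def frame_saliency(frames, model_score, baseline_frame):
    """Rank frames by how much occluding each one lowers the score.

    `model_score(frames)` returns the classifier's score for a clip
    (hypothetical stand-in); `baseline_frame` replaces the occluded frame.
    """
    full = model_score(frames)
    drops = []
    for i in range(len(frames)):
        perturbed = frames[:i] + [baseline_frame] + frames[i + 1:]
        drops.append(full - model_score(perturbed))  # larger drop = more salient
    return drops

# Toy model: score is the fraction of frames showing the action.
clip = ["action", "still", "action", "still"]
sal = frame_saliency(clip, lambda fs: fs.count("action") / 4, "blank")
```

On this toy clip the two "action" frames each cause a 0.25 score drop when occluded, while the "still" frames cause none, so they are ranked as the salient frames.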