2019 IEEE International Conference on Image Processing (ICIP) 2019
DOI: 10.1109/icip.2019.8803153

Saliency Tubes: Visual Explanations for Spatio-Temporal Convolutions

Abstract: Deep learning approaches have been established as the main methodology for video classification and recognition. Recently, 3-dimensional convolutions have been used to achieve state-of-the-art performance on many challenging video datasets. Because of the high complexity of these methods, in which the convolution operations are extended to an additional dimension in order to extract features from it as well, providing a visualization of the signals that the network interprets as informative is a chall…

Cited by 39 publications (23 citation statements)
References 28 publications (36 reference statements)
“…At the same time egocentric videos usually offer a clear view of the camera Figure 1. Visualizing the class activation maps [45] for an instance of class 'open' from EPIC-Kitchens [7]. Left: Multi-Fiber Network (MFNet) [6] trained end-to-end for the single task of classifying short clips into actions.…”
Section: Introduction (mentioning)
confidence: 99%
“…In this paper, in order to enable a hierarchical representation of kernel layer activations in 3D-CNNs, we propose Class Feature Pyramids for discovering kernel correspondence on the basis of how informative kernel activations are considered for classes. The visualization of kernel activations are based on the Saliency Tubes approach [43] which uses the spine interpolate of spatio-temporal kernel activations in order to create a representation conjoined with the used clip.…”
Section: Related Work (mentioning)
confidence: 99%
“…A number of previous works [7,43] have focused on creating representations of the regions in space and time that 3D-CNNs focus on when considering a particular class instance. These regions are class-specific and correspond to the activation maps produced in the last convolutional operation of the network.…”
Section: Discovery Of Prominent Spatio-temporal Feature Combinations (mentioning)
confidence: 99%
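
The citation statements above describe the visualization only in outline: class-specific activation maps from the last convolutional operation are interpolated up to the size of the input clip. A minimal sketch of that idea follows. It is not the authors' implementation, and several parts are assumptions: the CAM-style weighting of channels by the final layer's class weights, the function names `saliency_tube` and `resize_axis_linear`, and the use of separable linear interpolation in place of the "spine interpolate" (presumably spline interpolation) that one excerpt mentions.

```python
import numpy as np


def resize_axis_linear(x, new_len, axis):
    """Linearly resample x to new_len samples along one axis (hypothetical helper)."""
    old_len = x.shape[axis]
    if old_len == 1:
        # A single sample cannot be interpolated, so it is repeated instead.
        return np.repeat(x, new_len, axis=axis)
    old_pos = np.linspace(0.0, 1.0, old_len)
    new_pos = np.linspace(0.0, 1.0, new_len)
    return np.apply_along_axis(
        lambda v: np.interp(new_pos, old_pos, v), axis, x
    )


def saliency_tube(activations, class_weights, clip_shape):
    """Build a clip-sized heat volume from last-layer activations.

    activations:   array of shape (C, t, h, w), last conv layer output
    class_weights: array of shape (C,), weights of one class (assumed CAM-style)
    clip_shape:    (T, H, W), frames, height and width of the input clip
    """
    # Weighted sum over channels gives one (t, h, w) volume for the class.
    heat = np.tensordot(class_weights, activations, axes=([0], [0]))
    # Upsample each axis in turn; linear here, swapped in for spline.
    for axis, new_len in enumerate(clip_shape):
        heat = resize_axis_linear(heat, new_len, axis)
    # Min-max normalise to [0, 1] so the volume can be overlaid on the frames.
    heat = heat - heat.min()
    peak = heat.max()
    return heat / peak if peak > 0 else heat


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    acts = rng.random((16, 2, 7, 7))   # toy activations, not real network output
    weights = rng.random(16)           # toy class weights
    tube = saliency_tube(acts, weights, (16, 224, 224))
    print(tube.shape)
```

The resulting volume has one heat map per frame, which is what allows it to be shown together with the clip it explains.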
“…To the best of our knowledge, this is the first method that enables such a direct feature correspondence. We extend the proposed regularisation method to enable visualisation with the inclusion of Saliency Tubes [241] at each block that it is applied to. Based on this, we create representations of features with the highest activations with respect to the selected class.…”
Section: Class Regularisation For Visual Explanations (mentioning)
confidence: 99%