2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2019.01019

LSTA: Long Short-Term Attention for Egocentric Action Recognition

Abstract: Egocentric activity recognition is one of the most challenging tasks in video analysis. It requires fine-grained discrimination of small objects and their manipulation. While some methods rely on strong supervision and attention mechanisms, they are either annotation-consuming or do not take spatio-temporal patterns into account. In this paper we propose LSTA, a mechanism that focuses on features from relevant spatial parts while attention is tracked smoothly across the video sequence. We demonstrate the…
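As a rough illustration of the idea the abstract describes (not the authors' published implementation), the following PyTorch sketch carries a spatial attention map forward recurrently from frame to frame, so the attended region evolves smoothly across a clip. All names here (RecurrentSpatialAttention, attn_conv) are hypothetical.

# Minimal sketch, assuming per-frame CNN feature maps are already computed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RecurrentSpatialAttention(nn.Module):
    """Tracks a spatial attention map across video frames (illustrative only)."""
    def __init__(self, channels: int):
        super().__init__()
        # Fuse the current frame's features with the previous attention map
        # (one extra input channel) to produce the next attention logits.
        self.attn_conv = nn.Conv2d(channels + 1, 1, kernel_size=3, padding=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (T, C, H, W) -- feature maps for one clip of T frames.
        T, C, H, W = feats.shape
        attn = feats.new_full((1, 1, H, W), 1.0 / (H * W))  # uniform init
        pooled = []
        for t in range(T):
            x = feats[t:t + 1]                               # (1, C, H, W)
            logits = self.attn_conv(torch.cat([x, attn], dim=1))
            # Normalize over spatial locations so weights sum to 1.
            attn = F.softmax(logits.flatten(2), dim=-1).view(1, 1, H, W)
            pooled.append((x * attn).sum(dim=(2, 3)))        # (1, C)
        return torch.cat(pooled, dim=0)                      # (T, C)

# Example: 8 frames of 64-channel 7x7 feature maps.
module = RecurrentSpatialAttention(channels=64)
clip_feats = torch.randn(8, 64, 7, 7)
print(module(clip_feats).shape)  # torch.Size([8, 64])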

Cited by 150 publications (132 citation statements)
References 40 publications
“…These results indicate that directly training over the action performs better than the combination of verb+noun. Our multimodal models obtained better scores than the challenge baseline and have similar results to previous works [50]. Additionally, the results obtained on the unseen participants (S2) test split are in the top-ten ranking of the first challenge.…”
Section: Results (supporting, confidence: 81%)
“…In [57], the relationship between tasks is modelled in a latent space to transfer knowledge between them and reduce the number of required training samples. MTL in egocentric vision appears in [1,28,25,18,29,47].…”
Section: Multitask Learning (mentioning, confidence: 99%)
“…Recurrent with attention: The temporal aspect of videos is further studied with recurrent attention mechanisms [3,39,48,47,25,11,56] that act to find the most informative parts in images (spatial attention) or the most informative frames throughout videos (temporal attention). An encoder-decoder scheme is described in [3] for textual description of videos.…”
Section: Advances in First-Person Activity Recognition (mentioning, confidence: 99%)
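To make the distinction drawn in the statement above concrete, here is a minimal, hypothetical PyTorch sketch (not taken from any cited work) contrasting the two styles: spatial attention weights locations within a single frame, while temporal attention weights whole frames within a clip.

# Illustrative sketch; function names and score inputs are assumptions.
import torch
import torch.nn.functional as F

def spatial_attention_pool(frame: torch.Tensor, scores: torch.Tensor) -> torch.Tensor:
    # frame: (C, H, W); scores: (H, W) unnormalized per-location scores.
    w = F.softmax(scores.flatten(), dim=0).view_as(scores)  # over locations
    return (frame * w).sum(dim=(1, 2))                      # (C,)

def temporal_attention_pool(clip: torch.Tensor, scores: torch.Tensor) -> torch.Tensor:
    # clip: (T, C) per-frame features; scores: (T,) per-frame scores.
    w = F.softmax(scores, dim=0)                            # over frames
    return (clip * w.unsqueeze(1)).sum(dim=0)               # (C,)

frame, clip = torch.randn(64, 7, 7), torch.randn(8, 64)
print(spatial_attention_pool(frame, torch.randn(7, 7)).shape)  # torch.Size([64])
print(temporal_attention_pool(clip, torch.randn(8)).shape)     # torch.Size([64])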
“…Action Recognition: Our work is related to previous research on action recognition from third-person vision [7,13,14,30,34,35,49,57,59,60,61,67] and first-person vision [11,12,36,37,38,44,48,52,54,55,56]. Specifically, we build on previous ideas investigated in the context of action recognition, such as the use of multiple modalities for video analysis [49], the use of Temporal Segment Networks [61] as a principled way to train CNNs for action recognition, as well as the explicit encoding of object-based features [11,38,44,51,56] to analyze egocentric video.…”
Section: Related Work (mentioning, confidence: 99%)