2020
DOI: 10.48550/arxiv.2001.11122

Joint Visual-Temporal Embedding for Unsupervised Learning of Actions in Untrimmed Sequences

Abstract: Understanding the structure of complex activities in videos is one of the many challenges faced by action recognition methods. To overcome this challenge, not only do methods need a solid knowledge of the visual structure of underlying features but also a good interpretation of how they could change over time. Consequently, action segmentation tasks must take into account not only the visual cues from individual frames, but their characteristics as a temporal sequence of features. This work presents our finding…

Cited by 2 publications (5 citation statements)
References 21 publications
“…To address this problem, some previous works included frame-wise feature vectors into a temporal embedding. For instance, VidalMata et al [40] trained a temporal embedding model as a Multilayer Perceptron with the learning goal of predicting the relative timestamp 𝑡 of a given frame.…”
Section: Positional Encoding (mentioning; confidence: 99%)
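The temporal-embedding idea quoted above can be sketched roughly as follows. This is a minimal PyTorch sketch under assumed, illustrative choices (feature and embedding dimensions, a sigmoid output, plain MSE regression on the relative timestamp); it is not the exact architecture or training setup of VidalMata et al. [40].

```python
# Sketch: an MLP maps a frame-level feature vector to an embedding and is
# trained to predict the relative timestamp t in [0, 1] of that frame.
# Dimensions and layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class TemporalEmbeddingMLP(nn.Module):
    def __init__(self, feat_dim=64, emb_dim=32):
        super().__init__()
        self.embed = nn.Sequential(
            nn.Linear(feat_dim, emb_dim),
            nn.ReLU(),
        )
        self.time_head = nn.Linear(emb_dim, 1)  # regresses the relative timestamp

    def forward(self, x):
        z = self.embed(x)                                   # temporal embedding
        t_hat = torch.sigmoid(self.time_head(z)).squeeze(-1)  # predicted t in [0, 1]
        return z, t_hat

# Training-loop sketch: frames of one video with relative timestamps idx / (T - 1).
model = TemporalEmbeddingMLP()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
features = torch.randn(100, 64)                  # placeholder frame features
t_true = torch.linspace(0, 1, features.size(0))  # relative timestamps
for _ in range(10):
    _, t_hat = model(features)
    loss = nn.functional.mse_loss(t_hat, t_true)
    opt.zero_grad()
    loss.backward()
    opt.step()
```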
“…Nevertheless, these solutions require frame-level or scene-level annotations that are incredibly laborious. For this reason, researchers started focusing on methods with less supervision, such as weakly-supervised [4,22,29,37] and unsupervised methods [23,33,40].…”
Section: Introduction (mentioning; confidence: 99%)
“…For this reason, researchers started focusing on methods with less supervision, such as weakly-supervised (16,17,18,19) and unsupervised methods (20,21,22). Most of these methods (17,18,19,23,24) are based on the idea of generating pseudo-labels that are used to train supervised models.…”
Section: List Of Tables (mentioning; confidence: 99%)
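As a rough illustration of the pseudo-labelling idea mentioned in the statement above, a minimal sketch could look like the following; the use of k-means, logistic regression, the placeholder features, and the number of clusters are all assumptions for illustration, not details from the cited works.

```python
# Sketch: cluster unlabelled frame features to obtain pseudo-labels, then
# train a supervised classifier on those pseudo-labels.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

features = np.random.randn(500, 64)  # placeholder frame features
pseudo_labels = KMeans(n_clusters=5, n_init=10).fit_predict(features)

clf = LogisticRegression(max_iter=1000)
clf.fit(features, pseudo_labels)     # supervised model trained on pseudo-labels
predicted = clf.predict(features)
```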
“…Finally, by applying frame-wise decoding with the Viterbi algorithm, they temporally segment each video by maximizing the probability of a sequence of frames that follows the cluster ordering, which helps ensure consistency among the per-frame labels. VidalMata et al. (21) presented an unsupervised approach that segments actions based on visual-temporal embeddings. To achieve this, they use a two-stage training method.…”
Section: Temporal Action Segmentation (mentioning; confidence: 99%)
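The Viterbi-style frame-wise decoding described in this statement can be illustrated with a generic dynamic-programming sketch. The function name, the monotone "stay or advance" transition model, and the toy inputs below are assumptions for illustration, not the authors' exact decoder.

```python
# Sketch: given per-frame log-probabilities over K clusters and a fixed cluster
# ordering, find the monotone segmentation (each frame either stays in the
# current cluster of the ordering or advances to the next one) that maximizes
# the total log-probability.
import numpy as np

def viterbi_ordered_decode(log_probs, order):
    """log_probs: (T, K) per-frame log-probabilities; order: list of cluster ids."""
    T, K = log_probs.shape[0], len(order)
    dp = np.full((T, K), -np.inf)        # best score ending at (frame t, segment k)
    back = np.zeros((T, K), dtype=int)   # predecessor segment index
    dp[0, 0] = log_probs[0, order[0]]
    for t in range(1, T):
        for k in range(K):
            stay = dp[t - 1, k]
            advance = dp[t - 1, k - 1] if k > 0 else -np.inf
            if stay >= advance:
                dp[t, k], back[t, k] = stay, k
            else:
                dp[t, k], back[t, k] = advance, k - 1
            dp[t, k] += log_probs[t, order[k]]
    # Backtrack from the last frame in the last ordered segment.
    labels = np.empty(T, dtype=int)
    k = K - 1
    for t in range(T - 1, -1, -1):
        labels[t] = order[k]
        k = back[t, k]
    return labels

# Toy example: 6 frames, 3 clusters visited in the order [2, 0, 1].
rng = np.random.default_rng(0)
lp = np.log(rng.dirichlet(np.ones(3), size=6))
print(viterbi_ordered_decode(lp, order=[2, 0, 1]))
```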