2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021
DOI: 10.1109/cvpr46437.2021.01092

Representation Learning via Global Temporal Alignment and Cycle-Consistency

Cited by 47 publications (29 citation statements)
References 34 publications
“…• Our framework outperforms the state-of-the-art methods by a large margin on multiple tasks across different datasets. For example, under the linear evaluation protocol on FineGym [37] dataset, our framework achieves 41.75% accuracy, which is +13.94% higher than the existing best method GTA [21]. On Penn-Action [50] dataset, our method achieves 91.67% for fine-grained classification, 99.1% for Kendall's Tau, and 90.58% top-5 accuracy for fine-grained frame retrieval, which all surpass the existing best methods.…”
Section: Liquid Starts Exiting (mentioning)
confidence: 82%
“…representations to predict the action category. In contrast, many practical applications, e.g., sign language translation [4,5,13], robotic imitation learning [29,36], action alignment [6,21,23] and phase classification [16,27,37,50] require algorithms having ability to model long videos with hundreds of frames and extract frame-wise representations rather than the global features (Fig. 1).…”
Section: Liquid Starts Exiting (mentioning)
confidence: 99%
“…[7] uses global sequence alignment as a proxy task by relying on the DTW. [12,20] extended the DTW for end-to-end learning with differentiable approximations of the discrete operations (e.g., the 'min' operator) in the DTW. Chang et al [6] proposed the frame-wise alignment loss using the DTW in weakly supervised action alignment in videos.…”
Section: Related Work (mentioning)
confidence: 99%
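
The statement above refers to replacing the discrete 'min' operator in DTW with a differentiable approximation. As a rough illustration only, the sketch below computes a DTW cost with either the hard min or a log-sum-exp smoothed min. The function names `soft_min` and `dtw_cost`, the squared Euclidean frame distance, and the smoothing parameter `gamma` are assumptions made for this example; they are not taken from the cited works, which may use a different relaxation.

```python
import numpy as np


def soft_min(values, gamma):
    """Smoothed minimum (hypothetical helper).

    Uses -gamma * log(sum(exp(-v / gamma))), which approaches the hard
    minimum as gamma -> 0. gamma == 0 falls back to the hard minimum.
    """
    values = np.asarray(values, dtype=float)
    if gamma == 0.0:
        return float(values.min())
    z = -values / gamma
    z_max = z.max()  # subtract the max for numerical stability
    return float(-gamma * (z_max + np.log(np.exp(z - z_max).sum())))


def dtw_cost(x, y, gamma=0.0):
    """DTW alignment cost between two sequences (illustrative sketch).

    x, y: arrays of shape (n,) or (n, d). Frame distance is assumed to be
    squared Euclidean. gamma > 0 swaps the hard min for soft_min.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    if y.ndim == 1:
        y = y[:, None]
    n, m = len(x), len(y)

    # Pairwise squared Euclidean distances between frames, shape (n, m).
    dist = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)

    # Accumulated cost table with an extra border row and column.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            prev = [acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]]
            acc[i, j] = dist[i - 1, j - 1] + soft_min(prev, gamma)
    return float(acc[n, m])


if __name__ == "__main__":
    a = [0.0, 1.0, 2.0]
    b = [0.0, 1.0, 1.0, 2.0]
    print("hard DTW cost:", dtw_cost(a, b))
    print("soft DTW cost:", dtw_cost(a, b, gamma=0.1))
```

With `gamma=0.0` this is the classic dynamic program; with `gamma > 0` every step is smooth in the inputs, which is what makes gradient-based end-to-end training possible in principle.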