2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019
DOI: 10.1109/iccv.2019.00835

Conservative Wasserstein Training for Pose Estimation

Abstract: This paper targets tasks with discrete and periodic class labels (e.g., pose/orientation estimation) in the context of deep learning. The commonly used cross-entropy and regression losses are not well matched to this problem, as they ignore the periodic nature of the labels and the class similarity, or assume the labels are continuous values. We propose to incorporate inter-class correlations in a Wasserstein training framework by pre-defining (i.e., using arc length of a circle) or adaptively learning the ground met…
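The idea in the abstract of a pre-defined, arc-length-based cost between periodic classes can be illustrated with a small sketch. This is not the paper's implementation. It assumes that the truncated "ground met…" refers to a matrix of pairwise distances between classes, that the classes are evenly spaced orientation bins on a circle, and that the target is a one-hot label, in which case the Wasserstein distance reduces to the expected distance from the predicted bins to the true bin. The function names `circular_ground_distance` and `wasserstein_to_one_hot` are hypothetical.

```python
import numpy as np


def circular_ground_distance(num_classes):
    """Pairwise arc length (in bins) between evenly spaced classes on a circle."""
    idx = np.arange(num_classes)
    diff = np.abs(idx[:, None] - idx[None, :])
    # Going around the circle the other way may be shorter.
    return np.minimum(diff, num_classes - diff).astype(float)


def wasserstein_to_one_hot(probs, target, ground):
    """Wasserstein distance between a distribution and a one-hot target.

    With a one-hot target all mass must move to the target bin, so the
    transport cost is the probability-weighted distance to that bin.
    """
    return float(np.dot(probs, ground[:, target]))


# Assumed toy setup: 8 orientation bins, true class is bin 0.
D = circular_ground_distance(8)

near = np.zeros(8)
near[7] = 1.0  # all mass on the bin adjacent to the target (wrapping around)

far = np.zeros(8)
far[4] = 1.0  # all mass on the opposite side of the circle

loss_near = wasserstein_to_one_hot(near, 0, D)
loss_far = wasserstein_to_one_hot(far, 0, D)
```

Under these assumptions, a prediction one bin away from the target is penalized less than a prediction on the opposite side of the circle, whereas cross-entropy would treat both wrong predictions identically.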

citations
Cited by 35 publications
(19 citation statements)
references
References 56 publications
“…In this paper, we resort to the optimal transport distance as an alternative for empirical risk minimization [9,10,11,12]. With the low-cost modification of the loss function perspective, our solution can be added on any up-to-date general deep networks in a plug-and-play fashion.…”
Section: … … (mentioning)
confidence: 99%
“…However, the previous works only consider the image domain, and the spatiotemporal FER does not significantly outperforms aggregation methods [2]. To the best of our knowledge, this is the first effort to investigate the compressed video FER, which is orthogonal to these advantages and can be easily added to each other [37], [38], [39], [40], [41]. Video compression Usually, the video codecs separates a video into several Group Of Pictures (GOP).…”
Section: Related Work (mentioning)
confidence: 99%
“…With the fast development of deep learning for recognition [11], [12], [13], [14], [15], [16], [17], [18], a hierarchical approach is developed for automatically interpreting depression based on the SDS assessment, its associated FE, and action video recording, among other things. To be more specific, we effectively extract the temporal information from each question-wise video by adjusting the 3D convolutional neural networks to the particular question (3D-CNN) [19].…”
Section: Introduction (mentioning)
confidence: 99%