2021 IEEE/CVF International Conference on Computer Vision (ICCV)
DOI: 10.1109/iccv48922.2021.01336
Learning Cross-Modal Contrastive Features for Video Domain Adaptation

Cited by 59 publications (55 citation statements)
References 38 publications
“…Here, we focus on video domain adaptation for activity recognition. State-of-the-art visual-only solutions learn to reduce the shift in activity appearance by adversarial training [5,6,8,9,20,27,29] and self-supervised learning techniques [9,22,27,34]. While Jamal et al. [20] and Munro and Damen [27] directly penalize domain-specific features with an adversarial loss at every timestep, Chen et al. [5], Choi et al. [9] and Pan et al. [29] attend to temporal segments that contain important cues.…”
Section: Related Work
confidence: 99%
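The per-timestep adversarial penalty described in this statement can be illustrated with a short sketch. This is a minimal, assumed implementation, not the cited authors' code: a gradient reversal layer feeds each timestep's feature into a small domain discriminator, so the backbone is penalized for retaining domain-specific information at every timestep. The names GradReverse and PerTimestepDomainDiscriminator are hypothetical.

```python
# Minimal sketch (assumed, not the cited authors' code) of an adversarial
# domain loss applied at every timestep via gradient reversal.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; negates gradients on the backward pass."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class PerTimestepDomainDiscriminator(nn.Module):
    """Classifies source vs. target domain from each timestep's feature."""

    def __init__(self, feat_dim: int):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(), nn.Linear(256, 2)
        )

    def forward(self, feats, domain_labels, lambd=1.0):
        # feats: (batch, time, feat_dim); domain_labels: (batch,) in {0, 1}
        b, t, d = feats.shape
        reversed_feats = GradReverse.apply(feats, lambd)
        logits = self.classifier(reversed_feats.reshape(b * t, d))
        # Penalize domain-specific information at every timestep by
        # repeating each video's domain label for all of its timesteps.
        labels = domain_labels.repeat_interleave(t)
        return F.cross_entropy(logits, labels)
```

Minimizing this loss with respect to the backbone (through the reversed gradients) pushes its per-timestep features toward domain invariance, which is the effect the quoted statement attributes to [20] and [27].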
“…Self-supervised learning objectives are also incorporated in [27] and [9] to better align the features across domains by utilizing the correspondences between RGB and optical flow or the temporal order of video clips. Song et al. [34] and Kim et al. [22] obtain remarkable performance by using contrastive learning as a self-supervised objective to align the feature distributions between video domains. Instead of relying on the vision modality only, which may present large activity appearance variance, we consider the domain-invariant information within sound to help the model adapt to the visual distribution shift.…”
Section: Related Work
confidence: 99%
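The cross-modal contrastive alignment mentioned in this statement, between RGB and optical flow or between vision and sound, can be sketched as a symmetric InfoNCE loss. This is a generic, assumed formulation rather than the exact loss of [22], [34], or the paper itself; the function name and temperature value are illustrative.

```python
# Minimal InfoNCE-style sketch (assumed, not the paper's exact loss) of
# cross-modal contrastive learning: embeddings of the two modalities from
# the same clip are pulled together, while other clips in the batch act
# as negatives.
import torch
import torch.nn.functional as F


def cross_modal_contrastive_loss(rgb_feats, other_feats, temperature=0.07):
    """rgb_feats, other_feats: (batch, dim) embeddings of paired clips."""
    rgb = F.normalize(rgb_feats, dim=1)
    other = F.normalize(other_feats, dim=1)
    # Cosine similarity of every RGB feature with every other-modality feature.
    logits = rgb @ other.t() / temperature          # (batch, batch)
    # The matching pair for clip i sits on the diagonal.
    targets = torch.arange(rgb.size(0), device=rgb.device)
    # Symmetric loss: match RGB -> other modality and other modality -> RGB.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```

Under this kind of objective, paired features from the two modalities (or, in the domain-adaptation setting, from the two domains) are drawn into a shared embedding space, which is the alignment effect the quoted statement describes.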