2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr46437.2021.00279
Interventional Video Grounding with Dual Contrastive Learning

Cited by 128 publications (50 citation statements)
References 63 publications
“…In particular, SeqPAN [110] introduces the concepts of named entity recognition [122]-[124] from NLP by splitting the snippet sequence into begin, inside, and end regions of the target moment, plus a background region. IVG-DCL [112] introduces a dual contrastive learning mechanism to enhance multimodal interaction and leverages a structured causal model [125] to address the selection bias of TSGV. CI-MHA [114] proposes to remedy the start/end prediction noise caused by annotator disagreement via an auxiliary moment segmentation task.…”
Section: Span-based Methods
Confidence: 99%
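The dual contrastive learning mentioned above pulls matched video and query representations together while pushing apart mismatched ones. The cited paper's exact formulation is not reproduced here; the following is a minimal sketch of a standard symmetric InfoNCE loss over a batch of paired video/query embeddings, where row i of each matrix is assumed to be a positive pair and all other rows act as negatives:

```python
import numpy as np

def info_nce(video_emb, query_emb, temperature=0.1):
    # L2-normalise so that dot products become cosine similarities
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    logits = (v @ q.T) / temperature          # (batch, batch) similarity matrix
    idx = np.arange(len(v))                   # positives lie on the diagonal

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_prob[idx, idx].mean()     # NLL of the diagonal positives

    # symmetric loss: video-to-query and query-to-video directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

When the two modalities are perfectly aligned (each video embedding equals its query embedding), the loss approaches zero; misaligned batches yield a larger value.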
“…To remove the harmful confounding effects, we develop a deconfounding method, DCM, that first disentangles the moment representation to learn the core features of the visual content and then intervenes on the multimodal input via backdoor adjustment. This is the first causality-based work to address the temporal location biases of VMR, which is significantly different from a recent VMR work [21].…”
Section: Related Work
Confidence: 96%
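The backdoor adjustment referenced above replaces the observational conditional P(Y|X) with the interventional distribution P(Y|do(X)) by marginalising over the confounder's prior rather than its X-dependent posterior. As a rough illustration (not the cited method's implementation), for a hypothetical discrete confounder Z the adjustment is a single weighted sum:

```python
import numpy as np

def backdoor_adjust(p_y_given_xz, p_z):
    # P(Y | do(X=x)) = sum_z P(Y | X=x, Z=z) * P(Z=z)
    # p_y_given_xz: shape (num_x, num_z, num_y) conditional table
    # p_z:          shape (num_z,) prior over the confounder
    return np.einsum('xzy,z->xy', p_y_given_xz, p_z)
```

Note the contrast with the observational quantity, which would instead weight by P(Z=z | X=x) and so inherit the selection bias the intervention is meant to remove.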
“…These works utilize data augmentation strategies to construct semantically related example pairs from the original data (e.g., random cropping and rotation of images [3]). In addition, contrastive learning has been adopted to fuse multi-view information, such as image and text [19,29,33], or text and graph [9]. It drives the representations of related information of different types to be similar, so that each type of representation can be enhanced by the others.…”
Section: Contrastive Learning
Confidence: 99%
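The augmentation strategy described in this statement can be sketched in a few lines. This is a generic illustration rather than any cited paper's pipeline: two independent random crops of the same image form a positive pair, while crops of other images in the batch serve as negatives for a contrastive loss.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_crop(img, size):
    # pick a random top-left corner and cut out a size x size window
    h, w = img.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return img[top:top + size, left:left + size]

def make_positive_pair(img, size=16):
    # two independent augmentations of the same image form a positive pair;
    # crops taken from *other* images in the batch act as negatives
    return random_crop(img, size), random_crop(img, size)
```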