Proceedings of the 28th ACM International Conference on Multimedia 2020
DOI: 10.1145/3394171.3413967

Regularized Two-Branch Proposal Networks for Weakly-Supervised Moment Retrieval in Videos

Abstract: Video moment retrieval aims to localize the target moment in a video according to the given sentence. The weakly-supervised setting only provides the video-level sentence annotations during training. Most existing weakly-supervised methods apply a MIL-based framework to develop inter-sample confrontment, but ignore the intra-sample confrontment between moments with semantically similar contents. Thus, these methods fail to distinguish the target moment from plausible negative moments. In this paper, we propose a …

Citations: cited by 61 publications (27 citation statements)
References: 42 publications (96 reference statements)
“…This is formally named as weakly supervised TSGV. The typical methods include WSDEC [14], TGA [43], WSLLN [17], SCN [34], Chen et al [12], VLANet [40], MARN [54], BAR [64], RTBPN [85], CCL [86], EC-SL [11], LoGAN [55] and CRM [26]. In general, weakly supervised methods for TSGV can be grouped into two categories (i.e., MIL-based and reconstruction-based).…”
Section: Weakly Supervised Methods (mentioning)
confidence: 99%
“…[86] design a counterfactual contrastive learning paradigm to improve the visual-and-language grounding tasks. A regularized two-branch proposal network (RTBPN) [85] is also presented to explore sufficient intra-sample confrontment with sharable two-branch proposal module for distinguishing the target moment from plausible negative moments.…”
Section: Weakly Supervised Methods (mentioning)
confidence: 99%
“…Weakly-supervised temporal video grounding. To ease the human labelling efforts, several works (Bojanowski et al 2015; Mithun, Paul, and Roy-Chowdhury 2019; Lin et al 2020; Song et al 2020; Zhang et al 2020b; Ma et al 2020; Tan et al 2021) consider a weakly-supervised setting which only access the information of matched video-query pairs without accurate segment boundaries. (Mithun, Paul, and Roy-Chowdhury 2019) utilize the dependency between video and sentence as the supervision while abandon the temporal ordered information.…”
Section: Language-based Semantic Mining (mentioning)
confidence: 99%
“…the video-sentence pairs without the temporal labels (i.e., start and end time). Zhang et al [33] developed a shareable two-branch framework that simultaneously took the inter- and intra-sample confrontation into account.…”
Section: A Temporal Sentence Grounding (mentioning)
confidence: 99%
“…• RTBPN [33]: The RTBPN method devises a shareable two-branch proposal framework to consider both the inter- and intra-sample confrontation.…”
Section: Performance Comparisons (mentioning)
confidence: 99%