2020
DOI: 10.1007/978-3-030-58604-1_10

VLANet: Video-Language Alignment Network for Weakly-Supervised Video Moment Retrieval

Cited by 61 publications (53 citation statements)
References 21 publications

“…And comparing to VLANet, FSAN performs overall better, except for R@5. This may be due to the surrogate proposal selection module introduced in VLANet (Ma et al, 2020), which in fact performs a two-stage candidate selection and gets rid of temporally overlapped candidates.…”
Section: Comparisons With State-of-the-art Methods (mentioning, confidence: 99%)
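The statement above attributes VLANet's R@5 result to a two-stage candidate selection that discards temporally overlapped candidates. The sketch below is a generic illustration under stated assumptions, not VLANet's actual surrogate proposal selection module: it assumes candidates are (start, end) pairs with precomputed scores, shortlists the top `top_n` by score, then applies greedy temporal non-maximum suppression. It also computes Recall@k under the common "R@k at a tIoU threshold" definition, which is assumed here. The names `temporal_iou`, `two_stage_select`, `recall_at_k` and all thresholds are hypothetical.

```python
from typing import List, Tuple

# A candidate moment as (start, end), e.g. in seconds (assumed representation).
Segment = Tuple[float, float]


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection-over-union of two 1-D time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def two_stage_select(candidates: List[Segment], scores: List[float],
                     top_n: int = 10, k: int = 5,
                     overlap_thresh: float = 0.5) -> List[Segment]:
    """Hypothetical two-stage selection.

    Stage 1 keeps the top_n highest-scoring candidates.
    Stage 2 walks that shortlist in score order and drops any candidate whose
    tIoU with an already kept one exceeds overlap_thresh (greedy temporal NMS),
    stopping once k candidates are kept.
    """
    ranked = sorted(zip(scores, candidates), key=lambda p: p[0], reverse=True)
    shortlist = [seg for _, seg in ranked[:top_n]]
    kept: List[Segment] = []
    for seg in shortlist:
        if all(temporal_iou(seg, other) <= overlap_thresh for other in kept):
            kept.append(seg)
        if len(kept) == k:
            break
    return kept


def recall_at_k(predictions: List[List[Segment]], ground_truths: List[Segment],
                k: int, iou_thresh: float) -> float:
    """R@k at a tIoU threshold: fraction of queries whose top-k predictions
    contain at least one segment with tIoU >= iou_thresh to the ground truth."""
    if not ground_truths:
        return 0.0
    hits = sum(
        any(temporal_iou(p, gt) >= iou_thresh for p in preds[:k])
        for preds, gt in zip(predictions, ground_truths)
    )
    return hits / len(ground_truths)


# Usage: two pairs of heavily overlapping candidates collapse to three moments.
kept = two_stage_select([(0, 10), (1, 11), (20, 30), (22, 32), (40, 50)],
                        [0.9, 0.85, 0.8, 0.7, 0.6], top_n=5, k=5)
print(kept)                                                  # [(0, 10), (20, 30), (40, 50)]
print(recall_at_k([kept], [(21, 31)], k=1, iou_thresh=0.5))  # 0.0
print(recall_at_k([kept], [(21, 31)], k=5, iou_thresh=0.5))  # 1.0
```

Because greedy suppression always keeps the highest-scoring candidate, this kind of filtering leaves R@1 unchanged while freeing the remaining top-5 slots for distinct moments; under these assumptions, that is one plausible way such a module could favour R@5 specifically.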
“…In the image domain, some works have dealt with image caption [43], image grounding [21], and text-to-image synthesis [6]. In the video domain, some works focus on temporal localization using natural language [32,35,51], where the temporal boundary needs to be localized with a given natural language description. The task of object tracking with natural language specification is similar, and the difference is to estimate the location of the interesting object marked with a bounding box.…”
Section: Textual and Visual Understanding (mentioning, confidence: 99%)
“…Mithun et al [2] first proposed weakly-supervised framework which performs SVMR without boundary annotations. The methods in [9,10] proposed to learn video-language alignment by embedding semantics into joint space. Although revolutionary ways of using weak supervision have been proposed, they do not fully explore weakly-supervised manner in a video corpus level.…”
Section: Video Moment Retrieval (mentioning, confidence: 99%)
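The statement above mentions learning video-language alignment by embedding semantics into a joint space. A minimal pure-Python sketch of that general idea follows, assuming linear projections, cosine similarity and a max-margin ranking loss; these are common choices and are not confirmed as the methods of [9,10]. Under weak supervision the positive would typically be the video paired with the query and the negatives other videos, since no moment boundaries are available. `project`, `cosine` and `ranking_loss` are hypothetical names.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def project(W: Matrix, x: Sequence[float]) -> Vector:
    """Linear map W @ x into the shared embedding space."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale v to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors in the joint space."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def ranking_loss(query: Vector, positive: Vector, negatives: List[Vector],
                 margin: float = 0.2) -> float:
    """Hinge loss that is zero only when the positive beats every negative
    by at least `margin` in cosine similarity to the query."""
    s_pos = cosine(query, positive)
    return sum(max(0.0, margin - s_pos + cosine(query, neg)) for neg in negatives)


# Usage: hypothetical 3-d text features and 2-d video features mapped into 2-d.
W_text = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
W_video = [[1.0, 0.0], [0.0, 1.0]]
q = project(W_text, [1.0, 0.0, 5.0])
pos = project(W_video, [3.0, 0.0])
neg = project(W_video, [0.0, 4.0])
print(ranking_loss(q, pos, [neg]))  # 0.0: positive already beats the negative by more than the margin
print(ranking_loss(q, neg, [pos]))  # about 1.2: roles swapped, so the hinge is active
```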
“…As, in weakly-supervised manner, learning from boundary information is not available, generating multi-scale proposals and finding the most pertinent one are important. Ma et al [10] contributes to selecting surrogate proposals in early stage. Although previous works [9,11] proposed efficient proposal selection, reducing redundant proposals and generating ones are still challenging.…”
Section: Video Proposal Generation (mentioning, confidence: 99%)
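The statement above stresses generating multi-scale proposals before choosing the most pertinent one. One common way to do this, assumed here rather than taken from the cited works, is to slide windows of several lengths over a clip-level representation of the video. `multiscale_proposals` and its `stride_ratio` parameter are hypothetical.

```python
from typing import List, Sequence, Tuple


def multiscale_proposals(num_clips: int, scales: Sequence[int],
                         stride_ratio: float = 0.5) -> List[Tuple[int, int]]:
    """Enumerate [start, end) windows over a video split into num_clips units.

    Each scale is a window length in clips; windows of that length slide with
    stride max(1, int(length * stride_ratio)). Scales longer than the video
    are skipped.
    """
    proposals: List[Tuple[int, int]] = []
    for length in scales:
        if length > num_clips:
            continue
        stride = max(1, int(length * stride_ratio))
        for start in range(0, num_clips - length + 1, stride):
            proposals.append((start, start + length))
    return proposals


# Usage: an 8-clip video with window lengths 2, 4 and 8 (16 is too long and skipped).
props = multiscale_proposals(8, [2, 4, 8, 16])
print(len(props))  # 11
```

Here the scales contribute 7 + 3 + 1 windows, and the count grows quickly as finer scales or smaller strides are added; that growth is the redundancy the statement describes as still challenging to reduce.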