Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/d18-1168

Localizing Moments in Video with Temporal Language

Abstract: Localizing moments in a longer video via natural language queries is a new, challenging task at the intersection of language and video understanding. Though moment localization with natural language is similar to other language and vision tasks like natural language object retrieval in images, moment localization offers an interesting opportunity to model temporal dependencies and reasoning in text. We propose a new model that explicitly reasons about different temporal segments in a video, and shows that tempo…

Cited by 120 publications (111 citation statements); references 33 publications.

“…Early works study this task in constrained settings, including the fixed spatial prepositions [21,38], instruction videos [1,31,35] and ordering constraint [4,37]. Recently, unconstrained query-based moment retrieval has attracted a lot of attention [6,10,13,14,22,23,42]. These methods are mainly based on a sliding window framework, which first samples candidate moments and then ranks these moments.…”
Section: Query-based Moment Retrieval
confidence: 99%
“…These methods are mainly based on a sliding window framework, which first samples candidate moments and then ranks these moments. Hendricks et al [13] propose a moment context network to integrate global and local video features for natural language retrieval, and the subsequent work [14] considers the temporal language by explicitly modeling the context structure of videos. Gao et al [10] develop a cross-modal temporal regression localizer to estimate the alignment scores of candidate moments and textual query, and then adjust the boundaries of high-score moments.…”
Section: Query-based Moment Retrieval
confidence: 99%
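
The sliding-window framework summarized in this citation statement is straightforward to sketch: candidate moments are enumerated as fixed-length windows over the video and then ranked by a query-conditioned scoring function. Below is a minimal, illustrative Python sketch; it is not taken from any of the cited papers, and the function names, window sizes, and placeholder scorer are assumptions for illustration only.

```python
# Illustrative sketch of a sliding-window moment retrieval baseline.
# The scoring function stands in for a learned video-language model;
# nothing here reproduces the cited papers' actual code or settings.
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # (start_frame, end_frame)

def enumerate_candidates(num_frames: int,
                         window_sizes: Tuple[int, ...],
                         stride: int) -> List[Span]:
    """Sample candidate moments as fixed-length windows over the video."""
    candidates = []
    for w in window_sizes:
        for start in range(0, max(num_frames - w + 1, 1), stride):
            candidates.append((start, min(start + w, num_frames)))
    return candidates

def retrieve_moment(num_frames: int,
                    query: str,
                    score_fn: Callable[[Span, str], float],
                    window_sizes: Tuple[int, ...] = (32, 64, 128),
                    stride: int = 16) -> Span:
    """Rank every candidate moment against the query and keep the best."""
    candidates = enumerate_candidates(num_frames, window_sizes, stride)
    return max(candidates, key=lambda span: score_fn(span, query))

# Example with a dummy scorer that simply prefers longer windows
# (a real system would score cross-modal similarity instead).
best = retrieve_moment(300, "the person opens the door",
                       score_fn=lambda span, q: span[1] - span[0])
```
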
“…The goal is to retrieve a temporal segment from an untrimmed video based on an arbitrary text query. Recent work focuses on learning the mapping from visual segments to the input text (Hendricks et al., 2017; Gao et al., 2017a; Liu et al., 2018; Hendricks et al., 2018; …”
Section: Introduction
confidence: 99%