2019
DOI: 10.48550/arxiv.1907.12763
Preprint

Finding Moments in Video Collections Using Natural Language

Abstract: In this paper, we introduce the task of retrieving relevant video moments from a large corpus of untrimmed, unsegmented videos given a natural language query. Our task poses unique challenges as a system must efficiently identify both the relevant videos and localize the relevant moments in the videos. This task is in contrast to prior work that localizes relevant moments in a single video or searches a large collection of already-segmented videos. For our task, we introduce Clip Alignment with Language (CAL),…

Cited by 22 publications (63 citation statements)
References 37 publications
“…DiDeMo [4], ActivityNet Captions [32], and Charades-STA [17] are among the benchmarks for this task. Finally, VCMR came even more recently from [15] and [37], which extend the searching scope of SVMR to the entire video corpus.…”
Section: Appendix 6.1 Related Work
confidence: 99%
“…4.2.1 Large-scale video corpus moment retrieval. Large-scale video corpus moment retrieval (VCMR) is a research direction extended from TSGV that has been explored over the past few years [15,32,77,79]. It has more application value since it can retrieve the target segment semantically corresponding to a given text query from a large-scale video corpus (i.e., a collection of untrimmed and unsegmented videos) rather than from a single video.…”
Section: Promising Research Directions
confidence: 99%
“…Escorcia et al [15] first extend TSGV to VCMR, introducing a model named Clip Alignment with Language (CAL) to align the query feature with a sequence of uniformly partitioned clips for moment composing. Lei et al [32] introduce a new dataset for VCMR called TVR, which is comprised of videos and their associated subtitle texts.…”
Section: Promising Research Directions
confidence: 99%
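The clip-level alignment that CAL is described as performing above can be illustrated with a small sketch: a video is split into uniform clips, each clip is embedded, and a candidate moment is scored against the query. The embedding dimensions, the toy vectors, and the average-similarity scoring rule below are all illustrative assumptions, not the paper's actual model:

```python
import numpy as np

def cosine(a, b):
    # cosine similarity between two vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def moment_score(query_emb, clip_embs, start, end):
    """Score the candidate moment [start, end) as the average
    query-clip cosine similarity over its clips. This averaging
    rule is a simplification assumed for illustration."""
    sims = [cosine(query_emb, clip_embs[i]) for i in range(start, end)]
    return sum(sims) / len(sims)

# Toy example: a video split into 4 uniform clips with 3-d embeddings.
clips = np.array([[1.0, 0.0, 0.0],
                  [0.9, 0.1, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
query = np.array([1.0, 0.0, 0.0])

# Enumerate all contiguous clip spans and keep the best-aligned moment.
best = max(((s, e, moment_score(query, clips, s, e))
            for s in range(len(clips))
            for e in range(s + 1, len(clips) + 1)),
           key=lambda t: t[2])
```

Here the first clip aligns perfectly with the query, so the span (0, 1) wins; composing moments from clip-level scores is the core idea the citation statement attributes to CAL.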
“…Video Corpus Moment Retrieval (VCMR) [1] is a new video-text retrieval task which aims to retrieve the most relevant moments from a large video corpus instead of from a single video. The text-based VCMR can be decomposed into two sub-tasks: video retrieval (VR) and single video moment retrieval (SVMR).…”
Section: Introduction
confidence: 99%
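The VR-then-SVMR decomposition described in the statement above can be sketched as a two-stage pipeline: first rank videos in the corpus, then localize a moment within the top-ranked video. Everything here (mean-pooled video embeddings, cosine ranking, exhaustive span search) is an assumed toy formulation, not the method of any cited paper:

```python
import numpy as np

def cos_sim(mat, vec):
    # row-wise cosine similarity between a matrix of embeddings and a vector
    return (mat @ vec) / (np.linalg.norm(mat, axis=1) * np.linalg.norm(vec))

def video_retrieval(query, video_clip_embs, k=2):
    """Stage 1 (VR): represent each video by the mean of its clip
    embeddings and return the indices of the top-k videos."""
    video_embs = np.stack([c.mean(axis=0) for c in video_clip_embs])
    return np.argsort(-cos_sim(video_embs, query))[:k]

def moment_localization(query, clip_embs):
    """Stage 2 (SVMR): exhaustively score contiguous clip spans by
    average query-clip similarity and return the best (start, end)."""
    sims = cos_sim(clip_embs, query)
    n = len(sims)
    spans = ((s, e, sims[s:e].mean())
             for s in range(n) for e in range(s + 1, n + 1))
    s, e, _ = max(spans, key=lambda t: t[2])
    return s, e

# Toy corpus: two videos, each a stack of 3-d clip embeddings.
corpus = [
    np.array([[0.0, 1.0, 0.0], [0.0, 0.9, 0.1]]),  # video 0: off-topic
    np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]),  # video 1: clip 0 matches
]
query = np.array([1.0, 0.0, 0.0])

top = video_retrieval(query, corpus, k=1)
start, end = moment_localization(query, corpus[top[0]])
```

Running both stages selects video 1 and the span (0, 1) within it, mirroring the VR/SVMR split the citation statement describes.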