2023
DOI: 10.48550/arxiv.2303.08345
|View full text |Cite
Preprint
|
Sign up to set email alerts
|

Scanning Only Once: An End-to-end Framework for Fast Temporal Grounding in Long Videos

Abstract: Video temporal grounding aims to pinpoint a video segment that matches the query description. Despite the recent advance in short-form videos (e.g., in minutes), temporal grounding in long videos (e.g., in hours) is still at its early stage. To address this challenge, a common practice is to employ a sliding window, yet can be inefficient and inflexible due to the limited number of frames within the window. In this work, we propose an end-to-end framework for fast temporal grounding, which is able to model an … Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...

Citation Types

0
0
0

Publication Types

Select...

Relationship

0
0

Authors

Journals

citations
Cited by 0 publications
references
References 33 publications
0
0
0
Order By: Relevance

No citations

Set email alert for when this publication receives citations?