2019
DOI: 10.48550/arxiv.1910.02993
Preprint

Rekall: Specifying Video Events using Compositions of Spatiotemporal Labels

Daniel Y. Fu,
Will Crichton,
James Hong
et al.

Abstract: Many real-world video analysis applications require the ability to identify domain-specific events in video, such as interviews and commercials in TV news broadcasts, or action sequences in film. Unfortunately, pre-trained models to detect all the events of interest in video may not exist, and training new models from scratch can be costly and labor-intensive. In this paper, we explore the utility of specifying new events in video in a more traditional manner: by writing queries that compose outputs of existin…
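The abstract describes defining new video events by composing the outputs of existing models rather than training new ones. A minimal sketch of that idea in plain Python is below; the interval type, the `join_overlapping` composition, and the "face track overlapping interview speech" example are illustrative assumptions, not the Rekall API itself.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    """A time-bounded label produced by an existing model (seconds)."""
    start: float
    end: float
    payload: str

def overlaps(a: Interval, b: Interval) -> bool:
    """True when two intervals share any span of time."""
    return a.start < b.end and b.start < a.end

def join_overlapping(xs, ys):
    """Compose two label streams: emit the intersection of every
    overlapping pair, analogous to a temporal join."""
    out = []
    for a in xs:
        for b in ys:
            if overlaps(a, b):
                out.append(Interval(max(a.start, b.start),
                                    min(a.end, b.end),
                                    f"{a.payload}+{b.payload}"))
    return out

# Hypothetical model outputs: face tracks and caption segments.
faces = [Interval(0, 30, "face"), Interval(50, 90, "face")]
captions = [Interval(20, 60, "interview-speech")]

# Candidate "interview" events: spans where a face is on screen
# while interview-style speech appears in the captions.
candidates = join_overlapping(faces, captions)
```

The point of the composition style is that each input stream can come from an off-the-shelf detector, so no new event-specific model has to be trained.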

Cited by 6 publications (6 citation statements)
References 36 publications (87 reference statements)
“…Video Analysis Tasks. We use video analysis as another driving task: video data is large and expensive to label, and modeling temporal dependencies is important for quality but introduces significant slowdowns in label model parameter recovery (Sala et al, 2019) a corpus of TV news, respectively (Fu et al, 2019;Int, 2018).…”
Section: Discussion
confidence: 99%
“…This is challenging in our setting for two reasons. First, generating LFs over unstructured data modalities (such as image and video) is an active area of research [27,54,65,67,76]. In contrast, defining predicates is straightforward over structured feature spaces such as text (e.g., string or pattern matchers), quantitative data (e.g., thresholds), or categorical data (e.g., checking presence of a topic) variables.…”
Section: Curating Structured Data and a Dev Set
confidence: 99%
“…Interview Detection (Interview): We use the dataset from [14] and use the dev/test splits from that work. We additionally use an additional 57 hours of unlabelled data as the train split.…”
Section: Extended Experimental Details
confidence: 99%
“…Consider training a deep learning model to detect interviews in TV news videos [14]. As shown in Figure 1, supervision sources to generate training labels can draw on indirect signals from closed caption transcripts (per-scene), bounding box movement between frames (per-window), and pixels in the background of each frame (per-frame).…”
Section: Introduction
confidence: 99%
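The citation statement above describes drawing training labels from several weak supervision sources (per-scene transcripts, per-window box motion, per-frame pixels). A minimal sketch of combining such signals is an unweighted majority vote over per-frame votes; the voting scheme and vote values here are illustrative assumptions, not the method of the cited work, which learns to weight sources.

```python
def majority_vote(votes):
    """Combine weak labels (+1 interview, -1 not, 0 abstain) by
    unweighted majority; ties and all-abstain cases yield 0."""
    s = sum(votes)
    return 1 if s > 0 else (-1 if s < 0 else 0)

# Hypothetical per-frame votes from three sources:
# transcript cue, bounding-box motion, background pixels.
frame_votes = [
    (+1, +1, -1),  # two sources say "interview"
    (-1, 0, -1),   # two say "not interview", one abstains
    (+1, -1, 0),   # conflict -> abstain
]
labels = [majority_vote(v) for v in frame_votes]
```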