2004 IEEE International Conference on Acoustics, Speech, and Signal Processing
DOI: 10.1109/icassp.2004.1326724
Exploiting multiple modalities for interactive video retrieval

Abstract: Aural and visual cues can be automatically extracted from video and used to index its contents. This paper explores the relative merits of the cues extracted from the different modalities for locating relevant shots in video, specifically reporting on the indexing and interface strategies used to retrieve information from the Video TREC 2002 and 2003 data sets, and the evaluation of the interactive search runs. For the documentary and news material in these sets, automated speech recognition produces rich text…

Cited by 22 publications (25 citation statements)
References 3 publications
“…The output is a set of semantic fragments and metadata describing the fragments and the relationships between these fragments. In [1], for example, automatic semantic segmentation is used to divide a large media database into fragments that can be used in computer-aided search. Content analysis uses a variety of information sources such as chromaticity, lighting, contrast, human motion, and camera motion, but also background music, background noise, and automatic speech recognition (ASR) to extract meaningful semantic information from the available content.…”
Section: Video Summarization, Personalization and Interactivity (mentioning)
confidence: 99%
“…As visible in Table 1, some target clips showed quickly changing actions (e.g. tasks 1, 3, 6, 7, 8); only a few tasks (in particular 2 and 10) showed scenes of longer duration, which are more distinct but proved hard to find.…”
Section: Expert Run (mentioning)
confidence: 99%
“…Instead of pursuing rather small improvements in the field of content-based indexing and retrieval, video search tools should aim at better integration of the human into the search process, focusing on interactive video retrieval [8,9,18,19] rather than automatic querying.…”
Section: Introduction (mentioning)
confidence: 99%
“…1) The Informedia interface [31], [70]: This interface supports filtering based on visual semantic concepts. The visual concept filters are applied after a keyword-based search is carried out.…”
Section: Query by Abstract (mentioning)
confidence: 99%