2009 IEEE International Conference on Acoustics, Speech and Signal Processing
DOI: 10.1109/icassp.2009.4960393

Video event detection and summarization using audio, visual and text saliency

Abstract: Detection of perceptually important video events is formulated here on the basis of saliency models for the audio, visual and textual information conveyed in a video stream. Audio saliency is assessed by cues that quantify multifrequency waveform modulations, extracted through nonlinear operators and energy tracking. Visual saliency is measured through a spatiotemporal attention model driven by intensity, color and motion. Text saliency is extracted from part-of-speech tagging on the subtitles information avai…
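The abstract's "nonlinear operators and energy tracking" for audio saliency is commonly realized with the Teager-Kaiser energy operator (TKEO), which tracks the instantaneous energy of amplitude- and frequency-modulated signals. A minimal NumPy sketch of this idea, with illustrative frame parameters (the function names and framing scheme are assumptions, not the paper's exact pipeline):

```python
import numpy as np

def teager_kaiser_energy(x):
    """Discrete Teager-Kaiser energy operator: psi[x](n) = x(n)^2 - x(n-1)*x(n+1)."""
    x = np.asarray(x, dtype=float)
    psi = np.empty_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    psi[0], psi[-1] = psi[1], psi[-2]  # replicate edge values
    return psi

def frame_energy(x, frame_len=400, hop=160):
    """Mean absolute Teager energy per short frame: a simple audio-saliency cue."""
    psi = np.abs(teager_kaiser_energy(x))
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.array([psi[i * hop : i * hop + frame_len].mean() for i in range(n)])
```

For a pure cosine A·cos(ωn), the TKEO output is the constant A²·sin²(ω), so it jointly reflects amplitude and frequency; in a multiband setting this operator is applied per filterbank channel before the cues are combined.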

Cited by 63 publications (42 citation statements) · References 10 publications
“…Finally, the selected clips are tailored together using overlap-add (fade-in fade-out) for both the audio and visual streams. More details are provided in [13].…”
Section: Video Summarization (mentioning)
confidence: 99%
“…Computational models of single-and multimodal saliency [10]- [13] have been applied to emerging multimedia applications such as automatic video abstraction, summarization, indexing and browsing [14], [15]. Summarization refers to producing a shorter, in duration, version of a video that contains essential information for content understanding, without sacrificing much of the original's informative, functional or aesthetical purpose.…”
Section: Introduction (mentioning)
confidence: 99%
“…They use motion, face, and camera attention along with audio attention models (audio saliency and speech/music) as cues to capture salient information and identify the audio and video segments to compose the summary. In a similar fashion, Evangelopoulos et al [42,43] and Rapantzikos et al [44] fuse audio, visual, and text saliency measures into a single attention curve and select prominent parts of this measure to generate a summary.…”
Section: Related Work (mentioning)
confidence: 99%
“…They use motion, face, and camera attention along with audio attention models (audio saliency and speech/music) as cues to capture salient information and identify the audio and video segments to compose the summary. Rapantzikos et al. (Evangelopoulos et al. 2008, 2009) build further on visual, audio, and textual attention models for visual summarization. The authors form a multimodal saliency curve integrating the aural, visual, and textual streams of videos based on efficient audio, image, and language processing and employ it as a metric for video event detection and abstraction.…”
Section: Novelty Detection and Video Summarization (mentioning)
confidence: 99%
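The citation statement above describes fusing the aural, visual, and textual saliency streams into a single attention curve and selecting its prominent parts for the summary. A minimal sketch of one such fusion scheme, assuming per-frame saliency curves and a simple weighted, min-max-normalized linear mix (the weights and top-fraction selection rule are illustrative assumptions, not the paper's exact method):

```python
import numpy as np

def fuse_saliency(audio, visual, text, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Fuse per-frame saliency curves into one multimodal attention curve.

    Each curve is min-max normalized to [0, 1], then linearly mixed.
    """
    def norm(c):
        c = np.asarray(c, dtype=float)
        rng = c.max() - c.min()
        return (c - c.min()) / rng if rng > 0 else np.zeros_like(c)

    wa, wv, wt = weights
    return wa * norm(audio) + wv * norm(visual) + wt * norm(text)

def select_salient(curve, keep=0.2):
    """Return sorted indices of the top `keep` fraction of frames by saliency."""
    k = max(1, int(len(curve) * keep))
    return np.sort(np.argsort(curve)[-k:])
```

In a summarizer, the selected frame indices would be grouped into contiguous segments and stitched with the overlap-add (fade-in/fade-out) transitions mentioned in the first citation statement.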