Multimodal Processing and Interaction 2008
DOI: 10.1007/978-0-387-76316-3_8

Audiovisual Attention Modeling and Salient Event Detection

Abstract: Although human perception appears to be automatic and unconscious, complex sensory mechanisms exist that form the preattentive component of understanding and lead to awareness. Considerable research has been carried out into these preattentive mechanisms and computational models have been developed for similar problems in the fields of computer vision and speech analysis. The focus here is to explore aural and visual information in video streams for modeling attention and detecting salient events. The separate…

Cited by 20 publications (13 citation statements)
References 35 publications
“…where a N (t) is the amplitude,  N (t) is the frequency, and N is the size of the audio sequence. The audio attention model is based on three features: the maximum Teager energy, M_TE; the mean instant amplitude, M_IA; and the mean instant frequency, M_IF [47]. The first, M_TE, captures the joint amplitude-frequency information of the audio activity, which represents the dominant signal modulation energy.…”
Section: Audio Visual Attention Modeling (mentioning)
confidence: 99%
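
The three features named in the statement above can be illustrated with a minimal sketch. It assumes a single audio frame and the discrete Teager-Kaiser energy operator, and it estimates instantaneous amplitude and frequency with the DESA-2 energy separation algorithm. The quoted text does not say how amplitude and frequency are obtained, so DESA-2 is swapped in here as one standard choice, and any band-pass filtering stage the cited work may apply is omitted. The function names `teager_energy` and `frame_features` and the dictionary keys are hypothetical.

```python
import math


def teager_energy(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns values for interior samples only, i.e. aligned with x[1:-1].
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def frame_features(x, eps=1e-12):
    """Return M_TE, M_IA and M_IF for one frame (illustrative, single band).

    Assumption: amplitude and frequency come from DESA-2, which is not
    stated in the quoted passage. Frequency is in radians per sample.
    """
    if len(x) < 5:
        raise ValueError("frame must contain at least 5 samples")

    psi_x = teager_energy(x)  # aligned with x[1:-1]
    # Symmetric difference y[n] = x[n+1] - x[n-1], aligned with x[1:-1].
    y = [x[n + 1] - x[n - 1] for n in range(1, len(x) - 1)]
    psi_y = teager_energy(y)  # aligned with x[2:-2]

    amps, freqs = [], []
    for k, py in enumerate(psi_y):
        px = psi_x[k + 1]  # energy of x at the same sample as psi_y[k]
        if px <= eps or py <= eps:
            continue  # skip samples where the estimate is ill-defined
        c = max(-1.0, min(1.0, 1.0 - py / (2.0 * px)))
        freqs.append(0.5 * math.acos(c))         # DESA-2 frequency
        amps.append(2.0 * px / math.sqrt(py))    # DESA-2 amplitude

    return {
        "M_TE": max(psi_x),
        "M_IA": sum(amps) / len(amps) if amps else 0.0,
        "M_IF": sum(freqs) / len(freqs) if freqs else 0.0,
    }


# Usage: a pure tone with amplitude 0.5 and frequency 0.3 rad/sample.
tone = [0.5 * math.cos(0.3 * n) for n in range(200)]
features = frame_features(tone)
```

For a pure sinusoid the Teager energy is constant and equals the squared amplitude times the squared sine of the frequency, so the sketch should recover the tone's amplitude and frequency closely. Real audio would need framing and, in most modulation-based front ends, a filterbank before this step.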
“…The system is based on a modulation model for speech signals motivated by physical observations during speech production [18], the microproperties of speech signals, and a detection-theoretic optimality criterion. The features involved in the decision process have been previously used with success for speech endpoint detection in isolated words and sentences, VAD in large-scale databases, and audio saliency modeling [19]. Moreover, the developed VAD, based on divergence measures, has been systematically compared in [17] with a recent, high-detection-rate VAD [16], which in turn was evaluated against common standards.…”
Section: Audio Activity Detection (mentioning)
confidence: 99%
“…In addition to classic ones such as indexing and summarization, applications focused more on higher-level video understanding [1,2] have demonstrated significant promise. In the domain of movie content processing, tasks such as narrative act structure characterization, violent scene detection, and saliency prediction [3,4] for regions of potentially greater engagement are some examples of interesting applications. Many of these methods help to analyze movie datasets at scale, making it easier for human experts to perform higher-level analytics and decision making.…”
Section: Introduction (mentioning)
confidence: 99%