IEEE International Conference on Image Processing, 2005
DOI: 10.1109/icip.2005.1530602

Spatio-temporal attention model for video content analysis

Abstract: This paper presents a new model of human attention that allows salient areas to be extracted from video frames. Since automatic understanding of video semantic content is still far from being achieved, attention models aim to mimic the focus of the human visual system. Most existing approaches extract the saliency of images for use in multiple applications, but they are not compared against human perception. The model described here is obtained by the fusion of a static model inspired by the human …
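
The abstract's central mechanism is the fusion of a static, biologically inspired saliency map with a dynamic one. As a rough illustration of that idea only, the Python sketch below substitutes a simple blur-contrast map for the static pathway and frame differencing for the dynamic pathway, fused with a free weight alpha; none of these specific choices are taken from the paper.

```python
# Hedged sketch of spatio-temporal saliency fusion. The paper's static
# pathway is retina/cortex-inspired; a blur-contrast map stands in for it
# here, and frame differencing stands in for the dynamic pathway.
import numpy as np
from scipy import ndimage

def static_saliency(frame: np.ndarray) -> np.ndarray:
    """Crude static map: deviation of each pixel from a blurred background."""
    blurred = ndimage.gaussian_filter(frame.astype(float), sigma=8)
    return np.abs(frame - blurred)

def temporal_saliency(prev: np.ndarray, curr: np.ndarray) -> np.ndarray:
    """Crude dynamic map: absolute frame difference as a motion proxy."""
    return np.abs(curr.astype(float) - prev.astype(float))

def fuse(static: np.ndarray, dynamic: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Linear fusion of the two normalized maps (alpha is a free parameter)."""
    def norm(m: np.ndarray) -> np.ndarray:
        return (m - m.min()) / (np.ptp(m) + 1e-8)
    return alpha * norm(static) + (1 - alpha) * norm(dynamic)

# Usage with two consecutive grayscale frames:
prev, curr = np.random.rand(120, 160), np.random.rand(120, 160)
saliency_map = fuse(static_saliency(curr), temporal_saliency(prev, curr))
```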

Cited by 11 publications (10 citation statements) · References 9 publications
“…You et al. built on the ideas in [4] to propose a human-perception analysis framework for video understanding based on multiple visual cues [5]. In [6], [7], the authors constructed visual attention models and applied them to detecting user focus in video frames. Attention-model-based video content analysis has proved more consistent with human understanding and has lower computational complexity.…”
Section: Fig. 1 (a)(b)(c), Talk Show Video Examples (mentioning)
confidence: 99%
“…Although they have the above advantages over traditional work, current attention-model-based highlight extraction techniques focus mainly on the visual aspect and neglect the aural modality [5][6][7], another important intrinsic information source of video. Moreover, highlights are usually determined simply as the local maxima of a linearly fused attention curve [4], which does not account for asynchronous attention cues such as applause and cheering.…”
Section: Fig. 1 (a)(b)(c), Talk Show Video Examples (mentioning)
confidence: 99%
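
The baseline described in the excerpt above, taking local maxima of a linearly fused attention curve as highlights, can be sketched as follows. The fusion weights and the local-maximum window are illustrative assumptions, not values from [4].

```python
# Sketch of the criticized baseline: linearly fuse per-frame attention
# scores into one curve, then take its local maxima as highlight frames.
import numpy as np
from scipy.signal import argrelextrema

def fuse_curves(curves: list[np.ndarray], weights: list[float]) -> np.ndarray:
    """Weighted sum of per-frame attention curves of equal length."""
    return sum(w * c for w, c in zip(weights, curves))

def highlight_frames(curve: np.ndarray, window: int = 5) -> np.ndarray:
    """Indices of local maxima, each compared against +/- window neighbors."""
    return argrelextrema(curve, np.greater, order=window)[0]

visual = np.random.rand(300)   # placeholder visual attention per frame
motion = np.random.rand(300)   # placeholder motion attention per frame
fused = fuse_curves([visual, motion], [0.6, 0.4])  # assumed weights
print(highlight_frames(fused))
```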
“…Finding regions of interest requires other methods, such as those proposed in [5][6][7][8]. Zhai and Shah [5] and Guironnet et al. [6] used static and motion information as spatial and temporal factors to obtain the attended areas. Liu and Gleicher [7] also analyzed image and motion saliency and applied it to retargeting video to small screens.…”
Section: Introduction (mentioning)
confidence: 99%
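
In the spirit of the region-of-interest methods cited in the excerpt above, one common way to go from a saliency map to an attended area is to threshold the map and keep the dominant blob. The mean-plus-k-standard-deviations threshold below is an arbitrary illustrative rule, not taken from [5]-[8].

```python
# Hedged sketch: extract the attended region from a saliency map by
# thresholding and keeping the largest connected component.
import numpy as np
from scipy import ndimage

def attended_region(saliency: np.ndarray, k: float = 1.0):
    """Bounding box (rmin, rmax, cmin, cmax) of the largest salient blob."""
    mask = saliency > saliency.mean() + k * saliency.std()
    labels, n = ndimage.label(mask)          # connected components
    if n == 0:
        return None                          # nothing above threshold
    sizes = ndimage.sum(mask, labels, range(1, n + 1))
    rows, cols = np.where(labels == int(np.argmax(sizes)) + 1)
    return rows.min(), rows.max(), cols.min(), cols.max()
```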
“…Itti and Koch [8] then defined a visual attention system based on saliency maps to predict visually salient features of a scene. Chauvin et al. [9] and Guironnet et al. [10] proposed models inspired by the functionalities of the retina and the primary visual cortex cells. Corchs and Deco [11] implemented a neurodynamical model of visual attention based on functional, neurophysiological, and psychological findings.…”
Section: Neurodynamical Model of Visual Attention (mentioning)
confidence: 99%
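
The saliency-map models surveyed in this last excerpt share a center-surround mechanism: a feature map is compared at a fine and a coarse scale, and the difference marks locally conspicuous regions. The sketch below is a deliberately loose reduction of that idea; the full Itti-Koch model combines many feature channels, scales, and a normalization operator not reproduced here.

```python
# Loose sketch of the center-surround idea behind saliency-map models:
# the same feature blurred at two scales, with the absolute difference
# highlighting regions that stand out from their surroundings.
import numpy as np
from scipy import ndimage

def center_surround(feature: np.ndarray,
                    center_sigma: float = 2.0,    # assumed fine scale
                    surround_sigma: float = 16.0  # assumed coarse scale
                    ) -> np.ndarray:
    center = ndimage.gaussian_filter(feature.astype(float), center_sigma)
    surround = ndimage.gaussian_filter(feature.astype(float), surround_sigma)
    return np.abs(center - surround)
```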