An enhanced video summarization system using audio features for a personal video recorder

Otsuka, I.; Radhakrishnan, Regunathan; Siracusa, Michael R.; Divakaran, Ajay; Mishima, H.

doi:10.1109/tce.2006.1605043

Cited by 21 publications

(8 citation statements)

References 2 publications

“…[3] 'Pure-Music' is the traditional music program that consists of music and songs mainly and a few interviews with guests.…”

Section: Fig 8 An Evaluation Results Of Developed Prototype Model (mentioning)

confidence: 99%

“…In our previous work [2], we proposed a video browsing system using audio to detect sports highlights by identifying segments with a mixture of the commentator's excited speech and cheering, and also proposed to extend our strategy to music content by identifying music periods. [3] In this paper, we describe a combination of three methods for identifying music/songs as a 'segment' with high accuracy. First, we use audio classification by using the Gaussian Mixture Models.…”

Section: Introduction (mentioning)

confidence: 99%

Detection of music segment boundaries using audio-visual features for a personal video recorder

Otsuka, Suginohara, Kusunoki, et al. 2007

IEEE Trans. Consumer Electron.

Self Cite

We have extended our sports video browsing framework for personal video recorders, such as recordable-DVD recorders, Blu-ray Disc recorders and/or hard disc recorders, to music segment detection. Our extension to Japanese broadcast music video programs consists of detecting audio segment boundaries such as conversations with guests followed by music/song etc. Our proposed system first identifies the music/song scenes using audio analysis, and then adjusts the start/end position by detecting video shot changes, so as to achieve accurate detection of the music segment, thus enabling rapid browsing. Our preliminary results indicate that our audio-only summarization with scene change support works well for music video content. We can therefore integrate the enhancement into our product at a low computational cost.

IEEE Transactions on Consumer Electronics

This work may not be copied or reproduced in whole or in part for any commercial purpose. Permission to copy in whole or in part without payment of fee is granted for nonprofit educational and research purposes provided that all such whole or partial copies include the following: a notice that such copying is by permission of Mitsubishi Electric Research Laboratories, Inc.; an acknowledgment of the authors and individual contributions to the work; and all applicable portions of the copyright notice. Copying, reproduction, or republishing for any other purpose shall require a license with payment of fee to Mitsubishi Electric Research Laboratories, Inc. All rights reserved.
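The two-stage idea in this abstract (audio analysis proposes music segments, then video shot changes adjust the start and end positions) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the nearest-shot-change rule, and the `max_shift` tolerance of 2 seconds are all assumptions made for illustration, and the abstract does not say how the adjustment is actually performed.

```python
from bisect import bisect_left


def snap_to_shot_change(t, shot_changes, max_shift=2.0):
    """Return the shot change nearest to time t (seconds).

    shot_changes must be sorted in ascending order. If no shot change
    lies within max_shift seconds (a hypothetical tolerance), t is
    returned unchanged.
    """
    if not shot_changes:
        return t
    i = bisect_left(shot_changes, t)
    # Only the neighbours on either side of t can be the nearest.
    candidates = shot_changes[max(i - 1, 0):i + 1]
    nearest = min(candidates, key=lambda s: abs(s - t))
    return nearest if abs(nearest - t) <= max_shift else t


def refine_segments(audio_segments, shot_changes, max_shift=2.0):
    """Adjust (start, end) pairs from audio analysis to nearby shot changes."""
    shot_changes = sorted(shot_changes)
    refined = []
    for start, end in audio_segments:
        new_start = snap_to_shot_change(start, shot_changes, max_shift)
        new_end = snap_to_shot_change(end, shot_changes, max_shift)
        # Keep the audio-only boundaries if snapping would collapse the segment.
        if new_end > new_start:
            refined.append((new_start, new_end))
        else:
            refined.append((start, end))
    return refined


# Hypothetical timestamps in seconds, for illustration only.
audio_segments = [(12.4, 95.1), (130.0, 201.7)]
shot_changes = [11.8, 40.2, 96.0, 150.5, 203.0]
print(refine_segments(audio_segments, shot_changes))
```

In this sketch a boundary moves only when a shot change is close by, so a segment whose audio boundary falls far from any cut keeps its audio-derived position.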

“…One possible goal is to have a system that improves the interactive browsing experience by emphasizing important points in the content [11]. Another possible goal is to prepare a non-interactive summary of the content.…”

Section: Processing Objectives (mentioning)

confidence: 99%

“…The most commercially successful application of video summarization to date is the work of Otsuka et al [11]. In this work, the goal is to find highlights in sports content.…”

Section: Existing Algorithms (mentioning)

confidence: 99%

Broadcast Video Content Segmentation by Supervised Learning

Wilson, Divakaran 2008

Multimedia Content Analysis

Today's viewers of broadcast content are presented with huge amounts of content from broadcast networks, cable networks, pay-per-view, and more. Streaming video over the internet is beginning to add to this flow. Viewers do not have enough time to watch all of this content, and in many cases, even after selecting a few programs of interest, they may want to speed up their viewing of the chosen content, either by summarizing it or by providing tools to rapidly navigate to the most important parts. New display devices and new viewing environments, for example using a cell phone to watch content while riding the bus, will also increase the need for new video summarization and management tools.

Video summarization tools can vary substantially in their goals. For example, tools may seek to create a set of still-image keyframes, or they may create a condensed video skim [14]. Even after specifying the format of the summary, there can be different semantic objectives for the summary. A summary meant to best convey the plot of a situation comedy could differ substantially from a summary meant to show the funniest few scenes from the show.

Most of these processing goals remain unachieved despite over a decade of work on video summarization. The fundamental reason for this difficulty is the existence of the "semantic gap", the large separation between computationally easy-to-extract audio and visual features and semantically meaningful items such as spoken words, visual objects, and elements of narrative structure. Because most video summarization goals are stated in semantic terms ("the most informative summary," "the most exciting plays of the match"), while our computational tools are best at extracting simple features like audio energy and color histograms, we must find some way to bridge these two domains.

Springer Book on Content Analysis
