2011 IEEE International Conference on Multimedia and Expo
DOI: 10.1109/icme.2011.6011951

Unsupervised mining of audiovisually consistent segments in videos with application to structure analysis

Abstract: In this paper, a multimodal event mining technique is proposed to discover repeating video segments exhibiting audio and visual consistency in a totally unsupervised manner. The mining strategy first exploits independent audio and visual cluster analysis to provide segments which are consistent in both their visual and audio modalities, thus likely corresponding to a unique underlying event. A subsequent modeling stage using discriminative models enables accurate detection of the underlying event throughout th…

Cited by 8 publications (27 citation statements)
References: 13 publications
“…In this paper, we propose an extension to this method that makes it more robust to deal with variability and easier to extract multiple structural events. Compared to our previous work [1], the main contributions of this paper are summarized as follows:…”
Section: Introduction (mentioning)
confidence: 99%
“…The proposed measure is slightly different from the one used in [1], in that instead of using the original mutual information measure, which has 4 possible states (correlations) for two input binary variables, we use only two positive correlations (see section 3.1 for more details).…”
Section: Introduction (mentioning)
confidence: 99%
“…Unsupervised approaches for program segmentation were also addressed recently, where audiovisual consistency [5] and clustering-based methods [15] are considered. In particular, [5] proposed a multimodal event mining technique to discover repeating video segments exhibiting audio and visual consistency, and [15] clustered the keyframes based on a statistical distance of Pearson's correlation coefficient to detect anchorperson shots. However, these approaches are not enough practical for librarians, because they are either highly supervised or too specific to a particular type of programs.…”
Section: Introduction (mentioning)
confidence: 99%