Proceedings of the 3rd International Workshop on Automated Information Extraction in Media Production 2010
DOI: 10.1145/1877850.1877862

Unsupervised event segmentation of news content with multimodal cues

Cited by 9 publications (8 citation statements)
References 10 publications
“…In the following we first discuss some of the existing content-based multimodal event detection methods (i.e., those combining the analysis of video, audio, and possibly text) for the purpose of multimedia retrieval. In [4], events are extracted from news videos in a multimodal, unsupervised fashion. This method combines information from audio, visual appearance, faces, and mid-level semantic concepts by applying coherence rules.…”
Section: Prior Art
confidence: 99%
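The coherence-rule fusion described in the statement above can be sketched as follows. This is a hypothetical illustration, not the method of [4]: each modality (audio, visual appearance, faces) proposes candidate event boundaries as timestamps, and a boundary is kept only when enough modalities place one within a small tolerance window. The function name, thresholds, and sample timestamps are all made up.

```python
# Hedged sketch of a coherence rule for multimodal boundary fusion:
# accept a candidate boundary only if at least `min_agree` modalities
# place a boundary within `tol` seconds of it.

def fuse_boundaries(modality_boundaries, tol=1.0, min_agree=2):
    """modality_boundaries: list (one entry per modality) of lists of
    boundary timestamps in seconds."""
    candidates = sorted(t for mod in modality_boundaries for t in mod)
    fused = []
    for t in candidates:
        # Count how many modalities support this candidate.
        support = sum(
            any(abs(t - u) <= tol for u in mod)
            for mod in modality_boundaries
        )
        # Keep it if supported and not a near-duplicate of the last kept one.
        if support >= min_agree and (not fused or t - fused[-1] > tol):
            fused.append(t)
    return fused

audio  = [10.2, 55.0, 120.4]   # illustrative timestamps only
visual = [10.5, 80.3, 120.0]
faces  = [55.3, 119.8]
print(fuse_boundaries([audio, visual, faces]))  # -> [10.2, 55.0, 119.8]
```

The unsupported visual boundary at 80.3 s is discarded because only one modality proposes it.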
“…As in [26], in one of our event detection methods we also exploit correlations between different modalities, namely between motion sensor data and audio-content data. However, the works described in [4] and [26] analyze visual data (apart from other types of data) which is usually computationally expensive. The analysis methods that we propose are instead light-weight, as they mainly consider auxiliary sensor data sampled at low rates, as we will describe in the next sections of this paper.…”
Section: Prior Art
confidence: 99%
“…Several approaches have focused on the discovery of near-duplicate repetitions [6,7,8] but cannot deal with the crucial issue of variability across repetitions. Since structurally relevant events are often characterized by their strong visual consistency, the problem of mining repeating structural elements has been addressed using clustering techniques (see, e.g., [9,10,11,12,13,14]). But several problems arise from clustering, such as deciding the optimal number of clusters or dealing with outliers.…”
Section: Introduction
confidence: 99%
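The clustering issues named in the statement above (choosing the number of clusters, handling outliers) can be illustrated with a minimal threshold-based "leader" clustering of shot feature vectors. This is a generic sketch, not any of the cited methods [9-14]: it avoids fixing the cluster count in advance, but shots that match no existing cluster closely enough become singletons, which is exactly the outlier problem mentioned.

```python
# Illustrative sketch only: threshold-based leader clustering of shot
# feature vectors. No cluster count is chosen up front; vectors farther
# than `radius` from every leader start a new (possibly outlier) cluster.
import math

def leader_cluster(shots, radius=0.5):
    leaders, labels = [], []
    for vec in shots:
        dists = [math.dist(vec, leader) for leader in leaders]
        if dists and min(dists) <= radius:
            labels.append(dists.index(min(dists)))   # join nearest cluster
        else:
            leaders.append(vec)                      # new cluster / outlier
            labels.append(len(leaders) - 1)
    return labels

shots = [(0.0, 0.0), (0.1, 0.1), (5.0, 5.0), (5.1, 4.9), (9.0, 0.0)]
print(leader_cluster(shots))  # -> [0, 0, 1, 1, 2]
```

The last shot forms a singleton cluster; deciding whether that is a genuine structural element or an outlier is the open issue the text points to.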
“…Starting from this independent analysis of the individual features of the content, we add a compositional mining step on the similarities found across video shots to detect patterns containing the same speaking person in the same environment. This algorithm builds on our early work on event detection [14], using the same clustering process, but greatly extends its potential by differentiating the roles played by the people in the video and better exploiting the temporal behaviour of the detected anchorpersons. This time analysis, expressed by appearance duration in the video, by the interval between appearances, and by the number of appearances of the anchorpersons, results in an innovative approach to unsupervised differentiation that avoids any training phase.…”
Section: Introduction
confidence: 99%
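The three temporal cues named in the statement above (appearance duration, interval between appearances, number of appearances) can be computed from per-person appearance segments. A minimal sketch, with hypothetical segment data; an anchorperson would typically show many, regularly spaced appearances:

```python
# Hedged sketch: temporal profile of one detected person, built from
# sorted (start, end) appearance segments in seconds.

def temporal_profile(segments):
    """segments: sorted list of (start, end) tuples for one person."""
    total = sum(end - start for start, end in segments)
    gaps = [segments[i + 1][0] - segments[i][1]
            for i in range(len(segments) - 1)]
    return {
        "appearances": len(segments),          # number of appearances
        "total_duration": total,               # on-screen time
        "mean_gap": sum(gaps) / len(gaps) if gaps else 0.0,
    }

anchor = [(0, 20), (120, 140), (300, 330)]     # made-up segments
print(temporal_profile(anchor))
# -> {'appearances': 3, 'total_duration': 70, 'mean_gap': 130.0}
```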
“…The Viola-Jones face detection algorithm is used to find face locations inside the key-frame images [9]. Audio segments, key-frames and detected faces are then analyzed separately using different tuned clustering algorithms based on the method presented in [14]. With each independent analysis phase, it is possible to add a data mining step on the similarities found in the video to obtain patterns containing the same speaking person in the same environment.…”
Section: Introduction
confidence: 99%
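The mining step described above, which combines the independent per-modality clusterings, can be sketched as counting co-occurring cluster labels: shots whose (speaker, environment, face) label combination recurs point to the same speaking person in the same setting. All labels below are invented for illustration and the function is not the cited implementation.

```python
# Minimal sketch: mine recurring label combinations across shots, given
# per-modality cluster labels (audio speaker, key-frame environment,
# detected face) produced by the independent clustering phases.
from collections import Counter

def mine_patterns(shot_labels, min_support=2):
    """shot_labels: list of (speaker, environment, face) label tuples."""
    counts = Counter(shot_labels)
    return {combo: n for combo, n in counts.items() if n >= min_support}

shots = [("spk0", "studio", "faceA"),   # hypothetical labels
         ("spk0", "studio", "faceA"),
         ("spk1", "field",  "faceB"),
         ("spk0", "studio", "faceA")]
print(mine_patterns(shots))  # -> {('spk0', 'studio', 'faceA'): 3}
```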