2011
DOI: 10.1007/s11042-011-0923-x

Multimodal genre classification of TV programs and YouTube videos

Cited by 28 publications (31 citation statements)
References 16 publications
“…4 In contrast to the previous audio sets, the RAI video dataset allows for the development and comparison of approaches using different modalities. Recent works demonstrate a tendency toward multimodal approaches [11,37,38]. The results achieved by our approach indicate the outstanding performance of the selected audio features in terms of F1-score of 95.80% despite the use of single modality only.…”
Section: Comparison To Related Work On Media Classification
mentioning
confidence: 68%
“…The results achieved by our approach indicate the outstanding performance of the selected audio features in terms of F1-score of 95.80% despite the use of single modality only. Additionally, our approach demonstrates strong competitiveness to the top reported performance by Ekenel and Semela [11]. In addition to some acoustic features, Ekenel and Semena consider visual, structural, and cognitive features.…”
Section: Comparison To Related Work On Media Classification
mentioning
confidence: 76%
“…In Ref. 12, for example, the authors present an automatic video genre classification system for classifying the types of TV programs, based on several low level audio-visual features, such as, color, texture, signal energy and mel-frequency cepstral coefficients, as well as cognitive and structural information derived from faces and video shots. The proposed system was evaluated using TV programs from Italian and French TV channels, achieving an overall accuracy as high as 94.5%.…”
Section: Introduction
mentioning
confidence: 99%
“…It is not surprising then that conventional uni-modal human-machine interactions lag in performance, robustness and naturalness when compared with human-human interactions. Recently, there has been increasing research interest in jointly processing information in multiple modalities and mimicking human-human multimodal interactions [2,4,5,9,13,14,16,18,19,21,22]. For example, human speech production and perception are bimodal in nature: visual cues have a broad influence on perceived auditory stimuli [17].…”
mentioning
confidence: 99%