2013
DOI: 10.1186/1687-6180-2013-173

Multi-modal highlight generation for sports videos using an information-theoretic excitability measure

Abstract: The ability to detect and organize 'hot spots' representing areas of excitement within video streams is a challenging research problem when techniques rely exclusively on video content. A generic method for sports video highlight selection is presented in this study which leverages both video/image structure as well as audio/speech properties. Processing begins by partitioning the video into small segments and extracting several multi-modal features from each segment. Excitability is computed based on …
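
The abstract sketches a three-step pipeline: partition the video into short segments, extract multi-modal (audio and visual) features from each segment, and rank segments by an excitability score. The snippet below is a minimal illustrative sketch of that segment-and-rank idea only, not the paper's method: the 150-frame segment length, the two stand-in features (audio RMS energy and motion magnitude), and the Gaussian negative-log-likelihood used as an information-theoretic surprise proxy are all assumptions.

```python
# Illustrative sketch only: rank fixed-length video segments by how
# statistically "surprising" their averaged multi-modal features are.
# Feature choices, segment length, and the Gaussian surprise proxy are
# assumptions, not the method described in the paper.
import numpy as np

def segment_scores(audio_rms, motion, seg_len=150):
    """audio_rms, motion: per-frame feature arrays of equal length.
    seg_len: frames per segment (assumed, e.g. 5 s at 30 fps)."""
    n_seg = len(audio_rms) // seg_len
    # Mean feature value per segment -> (n_seg, 2) feature matrix.
    feats = np.stack([
        audio_rms[:n_seg * seg_len].reshape(n_seg, seg_len).mean(axis=1),
        motion[:n_seg * seg_len].reshape(n_seg, seg_len).mean(axis=1),
    ], axis=1)
    # Surprise proxy: negative log-likelihood under an independent Gaussian
    # fit to the whole video (up to constants), so rare high-energy,
    # high-motion segments score highest.
    mu, sigma = feats.mean(axis=0), feats.std(axis=0) + 1e-8
    z = (feats - mu) / sigma
    return 0.5 * (z ** 2).sum(axis=1)

# Usage with synthetic per-frame features standing in for real extraction:
rng = np.random.default_rng(0)
scores = segment_scores(rng.random(9000), rng.random(9000))
top_segments = np.argsort(scores)[::-1][:5]   # candidate highlight segments
```
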

Cited by 15 publications (4 citation statements)
References 27 publications
“…The field of sports highlights detection or sports video summarization is extensively researched. Prior studies have used audio cues [1-5], visual cues [6], a mixture of audio-visual cues [7,8], audio-textual cues [9], and a variety of features and classifiers to generate highlights [10]. Studies have also used textual cues from social media for generating sports highlights [11].…”
Section: Introduction (mentioning)
confidence: 99%
“…The fundamental frequency of speech (F0) is known to be affected by stress [19,20], emotions [19,21], and talking styles [22]. Different languages may exhibit unique F0 characteristics [23], and the same may also be observed for individual dialects of a language [24].…”
Section: Fundamental Frequency Analysis (mentioning)
confidence: 99%
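
The excerpt above treats the fundamental frequency (F0) of commentary speech as a carrier of stress and emotion cues, one of the audio properties highlight detectors commonly exploit. As a hedged illustration of what F0 analysis involves in practice, the sketch below estimates an F0 contour with librosa's pYIN implementation and summarizes it; the file name, sampling rate, and pitch range are assumptions, and the cited works may use different estimators and statistics.

```python
# Illustrative F0 extraction with librosa's pYIN; not the pipeline used in
# the cited works. The audio file below is a hypothetical placeholder.
import numpy as np
import librosa

y, sr = librosa.load("commentary_clip.wav", sr=16000)   # hypothetical clip
f0, voiced_flag, voiced_prob = librosa.pyin(
    y,
    fmin=librosa.note_to_hz("C2"),   # ~65 Hz lower bound (assumed range)
    fmax=librosa.note_to_hz("C6"),   # ~1047 Hz upper bound
    sr=sr,
)

# Simple contour statistics over voiced frames; elevated F0 mean and spread
# are commonly used correlates of vocal stress or excitement.
voiced_f0 = f0[~np.isnan(f0)]
print(f"F0 mean: {voiced_f0.mean():.1f} Hz, std: {voiced_f0.std():.1f} Hz")
```
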
“…Merler et al. [30] proposed to use semantic model vectors, an intermediate-level semantic representation, as a basis for modeling and detecting complex events in unconstrained real-world videos. Hasan et al. [23] presented a generic video highlight generation scheme based on an information-theoretic measure of user excitability. Liu et al. [29] proposed to use action, scene, and object concepts as semantic attributes for classification of video events.…”
(mentioning)
confidence: 99%