2001
DOI: 10.1016/s0167-8655(00)00119-7

Classification of general audio data for content-based retrieval

Abstract: In this paper, we address the problem of classification of continuous general audio data (GAD) for content-based retrieval, and describe a scheme that is able to classify audio segments into seven categories consisting of silence, single speaker speech, music, environmental noise, multiple speakers' speech, simultaneous speech and music, and speech and noise. We studied a total of 143 classification features for their discrimination capability. Our study shows that cepstral-based features such as the Mel-frequenc…

citations
Cited by 215 publications
(118 citation statements)
references
References 16 publications
“…This audio histogram was computed from a randomly selected 30 minutes segment of each movie's audio stream. [20] Silence ratio Proportion of silence in a time window [24] After removing silence, the remaining audio signals were classified by a SVM with a polynomial kernel, using the LIBSVM toolbox. (http://www.csie.ntu.edu.tw/~cjlin/libsvm/).…”
Section: Audio Features (mentioning)
confidence: 99%
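
The silence-ratio feature quoted above (the proportion of silence in a time window) can be sketched as follows. This is a minimal illustration, not the cited authors' method: the function name `silence_ratio`, the frame length, and the RMS threshold are all assumptions chosen for the example.

```python
import math


def silence_ratio(samples, frame_len=4, threshold=0.05):
    """Fraction of non-overlapping frames whose RMS amplitude is below threshold.

    frame_len and threshold are illustrative defaults, not values from the source.
    """
    frames = [
        samples[i:i + frame_len]
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]
    if not frames:
        return 0.0
    silent = sum(
        1
        for frame in frames
        if math.sqrt(sum(x * x for x in frame) / len(frame)) < threshold
    )
    return silent / len(frames)


# Toy window: two silent frames followed by two loud frames.
signal = [0.0] * 8 + [0.5, -0.5, 0.5, -0.5] * 2
ratio = silence_ratio(signal)
```

In a pipeline like the one quoted, frames flagged as silent would be dropped before the remaining audio is passed to a classifier.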
“…A good overview of common extraction techniques is presented in [7]. Music content-based features may be low-level representations that stem directly from the audio signal, for example zero-crossing rate [18], amplitude envelope [5], bandwidth and band energy ratio [37], or spectral centroid [67]. Alternatively, audio-based features may be derived or aggregated from low-level properties, and therefore represent aspects on a higher level of music understanding.…”
Section: Examples (mentioning)
confidence: 99%
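
Two of the low-level features named in this statement, zero-crossing rate and spectral centroid, can be computed directly from a frame of samples. The sketch below is illustrative only: the function names are hypothetical, and a naive DFT is used so that the example needs nothing beyond the standard library (a real system would use an FFT).

```python
import cmath
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2, computed with a naive O(N^2) DFT."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency of the frame, in Hz."""
    mags = magnitude_spectrum(frame)
    total = sum(mags)
    if total == 0:
        return 0.0
    n = len(frame)
    return sum(k * sample_rate / n * m for k, m in enumerate(mags)) / total


# A 1000 Hz sine sampled at 8000 Hz falls exactly on bin 8 of a 64-point DFT.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(64)]
centroid = spectral_centroid(tone, sample_rate)
```

For a pure tone the centroid sits at the tone's frequency; for broadband noise it moves toward the middle of the spectrum, which is why the feature is often used as a rough measure of "brightness".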
“…Let us assume that this is represented by s(t) and the time duration for which the signal is processed is [t_1, t_2]. We extract spectral shape features that are related to the audio content [7]. The cepstral coefficients (c) are defined as follows:…”
Section: Activity Inference (mentioning)
confidence: 99%
“…These "cepstral coefficients" help in differentiating between different classes of dynamic activities (e.g., "Cooking" and "Hygiene"), or different classes of static activities (e.g., "Meeting" and "Driving"). We use the first 12 cepstral coefficients as they are well known to be the most useful in describing the content of an audio signal [7]. Figure III shows an example of how the spectral shape, computed using cepstral coefficients, is significantly different between "Cooking" and "Driving".…”
Section: Activity Inference (mentioning)
confidence: 99%
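
As a rough illustration of what "the first 12 cepstral coefficients" means, the sketch below computes a plain real cepstrum (the inverse DFT of the log magnitude spectrum) and keeps its first 12 values. This is an assumption made for the example: the citing paper's own definition is elided above, and Mel-frequency cepstral coefficients additionally apply a mel filter bank, which is omitted here. The names `dft` and `real_cepstrum` and the `eps` guard are hypothetical.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(N^2) discrete Fourier transform (inverse scaled by 1/N)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = [
        sum(v * cmath.exp(sign * 2j * math.pi * k * t / n)
            for t, v in enumerate(values))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def real_cepstrum(frame, n_coeffs=12, eps=1e-10):
    """First n_coeffs values of the real cepstrum of one frame.

    eps avoids log(0) for empty spectral bins; it is an illustrative choice.
    Index 0 is included in the count here, which is a convention choice.
    """
    spectrum = dft(frame)
    log_magnitude = [math.log(abs(v) + eps) for v in spectrum]
    cepstrum = dft(log_magnitude, inverse=True)
    return [c.real for c in cepstrum[:n_coeffs]]


# A unit impulse has a flat spectrum, so its cepstrum is essentially zero.
impulse = [1.0] + [0.0] * 31
coeffs = real_cepstrum(impulse)
```

The low-order coefficients summarize the smooth spectral envelope, which is the "spectral shape" the statement refers to when contrasting activities such as "Cooking" and "Driving".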