Proceedings of the 1st ACM International Conference on Multimedia Information Retrieval 2008
DOI: 10.1145/1460096.1460116

On enabling techniques for personal audio content management

Abstract: State-of-the-art automatic analysis tools for personal audio content management are discussed in this paper. Our main target is to create a system that has several co-operating management tools for an audio database, which improve each other's results. A Bayesian-network-based audio classification algorithm provides classification into four main audio classes (silence, speech, music, and noise) and serves as a first step for the subsequent analysis tools. For speech analysis we propose an improved Baye…

Cited by 4 publications
(3 citation statements)
References 10 publications
“…Our audio classification is based on the Mel frequency cepstral coefficient (MFCC) feature, a spectral feature used in various areas of speech analysis and in audio classification, due to its good capability to model fast and low variations of speech signal [22]. As the previous experiences of the authors show [17,23], MFCC allows to distinguish between various audio classes with fairly similar accuracy, while classification accuracy in case of using temporal and spectral audio features (e.g. spectrum centroid, low energy ratio) is often more class-dependent and the overall result is not guaranteed to be much better than with MFCC.…”
Section: Audio-based Event Detection and Audio Classification (mentioning)
confidence: 99%
“…More complex audio features such as identifying who was speaking, speaker emotion, and voiced/unvoiced segments were plausible, and potentially useful, additions to TAFE. However, the required machine learning training and a priori knowledge of different conditions [56] may have made configuring TAFE correctly more difficult for Peter. In turn, SpEx may become inaccurate for Jack and Amy.…”
Section: Architecture of TAFE (mentioning)
confidence: 99%