Proceedings 25th EUROMICRO Conference. Informatics: Theory and Practice for the New Millennium 1999
DOI: 10.1109/eurmic.1999.794763
A framework for audio analysis based on classification and temporal segmentation

Abstract: Existing audio tools handle the increasing amount of computer audio data inadequately. The typical tape-recorder paradigm for audio interfaces is inflexible and time-consuming, especially for large data sets. On the other hand, completely automatic audio analysis and annotation is impossible using current techniques. Alternative solutions are semi-automatic user interfaces that let users interact with sound in flexible ways based on content. This approach offers significant advantages over manual browsing, annot…

Cited by 28 publications (19 citation statements) | References 11 publications
“…Lamel et al. [8] describe an endpoint detector for isolated word segmentation, in which a histogram of the lowest 10 dB of log-energy levels is used to estimate the background noise. Tzanetakis and Cook [9] describe a methodology for temporal segmentation using features such as spectral features, MFCCs, LPC coefficients, and pitch. Jasmine et al. [10] model silence and noise parameters by assuming that the first 200 ms of the speech signal contain noise.…”
Section: Preprocessing
confidence: 99%
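The preprocessing steps quoted above lend themselves to a short illustration. The Python sketch below estimates a noise floor from the first 200 ms of a signal (following the assumption attributed to Jasmine et al. [10]) and locates speech endpoints by thresholding frame log-energies. The frame size, hop size, and 3 dB margin are illustrative choices, not values from the cited papers.

```python
import numpy as np

def detect_endpoints(signal, sr, frame_ms=25, hop_ms=10, margin_db=3.0):
    """Hypothetical energy-based endpoint detector (a sketch, not the cited method)."""
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    n_frames = 1 + max(0, (len(signal) - frame) // hop)
    # Short-time log energy per frame, in dB (small floor avoids log(0)).
    energy_db = np.array([
        10.0 * np.log10(np.sum(signal[i * hop:i * hop + frame] ** 2) + 1e-12)
        for i in range(n_frames)
    ])
    # Assume the first 200 ms contain only background noise and estimate its level.
    noise_frames = energy_db[: max(1, int(0.2 * sr / hop))]
    threshold = noise_frames.mean() + margin_db
    active = energy_db > threshold
    if not active.any():
        return None  # no frame rose above the noise floor
    first = int(np.argmax(active))
    last = len(active) - 1 - int(np.argmax(active[::-1]))
    return first * hop, min(last * hop + frame, len(signal))
```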
“…The linear prediction coefficients can be computed using the Levinson-Durbin recursion to solve the normal equations that arise from the least-squares formulation [8].…”
Section: Previously Used Features
confidence: 99%
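As a companion to the statement above, here is a minimal Python sketch of the Levinson-Durbin recursion applied to linear prediction: it builds the autocorrelation sequence of a frame and solves the resulting Toeplitz normal equations order by order. The function name and interface are illustrative, not taken from [8].

```python
import numpy as np

def lpc_levinson_durbin(x, order):
    """Return prediction-error filter coefficients [1, a1, ..., a_order] and the final error."""
    # Autocorrelation lags r[0..order] of the (already windowed) frame x.
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    error = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient from the correlation of the current residual.
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / error
        # Order-update of the predictor coefficients (a[i] picks up the value k).
        a[1:i + 1] = a[1:i + 1] + k * a[i - 1::-1]
        error *= 1.0 - k * k
    return a, error
```

The returned array defines the prediction-error filter A(z) = 1 + a1·z⁻¹ + … + a_order·z⁻ᵖ; in practice one would also guard against a zero error term on silent frames.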
“…This yields a model for frequency that is structurally equivalent to Shepard's decomposition of pitch, such that frequency is also decomposed as f = 2^(c + h) (2), where we again restrict c ∈ [0, 1) and h to the integers. Alternately, we can calculate chroma from a given frequency using c = log2(f) − ⌊log2(f)⌋ (3), where ⌊·⌋ denotes the greatest integer function. Thus, chroma is simply the fractional part of the base-2 logarithm of frequency.…”
Section: Chroma as a Cyclic Representation of Frequency
confidence: 99%
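The relationship in equation (3) above is easy to verify numerically. The short Python sketch below computes chroma as the fractional part of the base-2 logarithm of frequency, so frequencies an octave apart map to the same chroma value.

```python
import numpy as np

def chroma(freq_hz):
    """Chroma c = log2(f) - floor(log2(f)): the fractional part of log2(f), in [0, 1)."""
    log_f = np.log2(np.asarray(freq_hz, dtype=float))
    return log_f - np.floor(log_f)

# Frequencies related by factors of two share the same chroma value:
print(chroma([220.0, 440.0, 880.0]))  # three identical values (~0.781)
```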
“…We make use of a pattern recognition framework for audio streams, in which the signal is segmented into frames and each frame is described by a set of features. The complexity of the features used varies by application; some commonly used features are described in [3]. This feature-based approach has been applied to general sound classification [4], speech/music discrimination [5], and musical instrument identification [6].…”
confidence: 99%
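The frame-based pipeline described in this last statement can be sketched in a few lines of Python. The two features computed here (RMS energy and spectral centroid) are illustrative examples of the kind surveyed in [3], not the specific feature set of any cited system; the resulting per-frame feature matrix is what a classifier would consume for tasks such as speech/music discrimination.

```python
import numpy as np

def frame_features(signal, sr, frame_ms=25, hop_ms=10):
    """Cut the signal into frames and describe each frame by a small feature vector."""
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    window = np.hanning(frame)
    freqs = np.fft.rfftfreq(frame, d=1.0 / sr)
    rows = []
    for start in range(0, len(signal) - frame + 1, hop):
        x = signal[start:start + frame] * window
        spectrum = np.abs(np.fft.rfft(x))
        rms = np.sqrt(np.mean(x ** 2))                                    # frame energy
        centroid = np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-12)  # spectral centroid
        rows.append((rms, centroid))
    return np.array(rows)  # shape (n_frames, 2): one feature vector per frame
```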