The 2nd IEEE International Workshop on Haptic, Audio and Visual Environments and Their Applications, 2003. HAVE 2003. Proceedings
DOI: 10.1109/have.2003.1244723
Pitch-based feature extraction for audio classification

Abstract: This paper proposes a new algorithm to discriminate between speech and non-speech audio segments. It is intended for security applications as well as talker location identification in audio conferencing systems equipped with microphone arrays. The proposed method is based on splitting the audio segment into small frames and detecting the presence of pitch in each one of them. The ratio of frames with pitch detected to the total number of frames is defined as the pitch ratio and is used as the main feature to cla…
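The abstract describes the core computation: split a segment into short frames, test each frame for pitch, and take the fraction of pitched frames as the feature. A minimal sketch of that idea, assuming a simple autocorrelation-based voicing check (the paper does not specify its pitch detector; `frame_has_pitch`, the pitch range, and the threshold are illustrative assumptions):

```python
import numpy as np

def frame_has_pitch(frame, sr, fmin=60.0, fmax=400.0, threshold=0.3):
    # Illustrative voicing check: a frame is "pitched" if its normalized
    # autocorrelation has a strong peak at a lag inside the pitch range.
    frame = frame - np.mean(frame)
    energy = np.dot(frame, frame)
    if energy == 0.0:
        return False
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    ac = ac / ac[0]  # normalize so the lag-0 value is 1
    lo, hi = int(sr / fmax), int(sr / fmin)  # lag range for fmin..fmax Hz
    hi = min(hi, len(ac) - 1)
    return bool(np.max(ac[lo:hi + 1]) > threshold)

def pitch_ratio(segment, sr, frame_ms=30):
    # Split the segment into non-overlapping frames and count pitched ones;
    # the pitch ratio is pitched frames / total frames, as in the abstract.
    n = int(sr * frame_ms / 1000)
    frames = [segment[i:i + n] for i in range(0, len(segment) - n + 1, n)]
    if not frames:
        return 0.0
    voiced = sum(frame_has_pitch(f, sr) for f in frames)
    return voiced / len(frames)
```

A pure tone yields a pitch ratio near 1, while white noise yields a ratio near 0, which is the separation the feature exploits.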


Paper Sections

Select...
2
2
1

Citation Types

0
5
0

Publication Types

Select...
6
1

Relationship

0
7

Authors

Journals

Cited by 13 publications (5 citation statements)
References 10 publications
“…This is due to the fact that most of the features found in the literature and used here, are quite good in discriminating speech and non-speech material, but not music from garbage. We introduced pitch related features [11], in order to achieve better separation between these two classes. Still, more work has to be done in order to find pitch statistics that can better discriminate between music and garbage.…”
Section: F1 - F10 (mentioning)
confidence: 99%
“…From the figure, we can discover the multi-modal characteristics of the selected features, which could be successfully described by the GMM. Finally, the feature vector of five elements is constructed by supplementing the pitch from the open-loop pitch estimation part in the SMV, which is widely adopted in the conventional speech/music classification methods [10]. Using speech and music models trained on the established feature vector, the input frame is finally classified into either speech or music based on the likelihood ratio (LR) test in Eq. (10) of that work, in which a threshold value and the frame index appear.…”
Section: Features of the Speech/Music Classification Algorithm (mentioning)
confidence: 99%
“…Finally, the feature vector of five elements is constructed by supplementing the pitch from the open-loop pitch estimation part in the SMV, which is widely adopted in the conventional speech/music classification methods [10]. Using speech and music models trained on the established feature vector, the input frame is finally classified into either speech or music based on the likelihood ratio (LR) test in Eq. (10) of that work, in which a threshold value and the frame index appear. In our approach, a smoothed LR is incorporated to prevent abrupt changes in the observed LR on the current frame for robust speech/music classification as follows:…”
Section: Features of the Speech/Music Classification Algorithm (mentioning)
confidence: 99%
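The statements above describe a per-frame likelihood-ratio test with smoothing to avoid abrupt decision flips. A minimal sketch of that general idea, assuming per-frame log-likelihoods from the speech and music models are already available and using an exponential moving average for the smoothing (the `alpha` and `threshold` values are illustrative, not taken from the cited work):

```python
def classify_frames(loglik_speech, loglik_music, threshold=0.0, alpha=0.9):
    # Smoothed log-likelihood-ratio test over a frame sequence.
    # loglik_speech[i], loglik_music[i]: log p(x_i | model) for frame i.
    decisions = []
    smoothed = 0.0
    for ls, lm in zip(loglik_speech, loglik_music):
        llr = ls - lm  # instantaneous log-LR for this frame
        # EMA smoothing damps abrupt changes in the observed LR.
        smoothed = alpha * smoothed + (1 - alpha) * llr
        decisions.append("speech" if smoothed > threshold else "music")
    return decisions
```

Because `smoothed` carries over between frames, a single outlier frame cannot flip the decision on its own; the class changes only when the evidence persists.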
“…The framework can also be used for determining sit-to-stand transition for home monitoring [150,151]. To combat pandemics, including COVID-19, the framework has applications in extracting audio features for detecting dry or wet coughs [152,153].…”
Section: Discussion (mentioning)
confidence: 99%
“…The proposed framework can also find applications in a pandemic such as COVID-19. Besides cough detection [152,153], it can be used for computer-aided diagnosis from medical images (X-ray or CT) [172,173] and lowering the risk of infection for medical professionals [174].…”
Section: Discussion (mentioning)
confidence: 99%