Multimedia content analysis now plays an important role in widely used applications, and it often relies on audio segmentation, one of the many tools used in this area. In this paper, we present optimized audio classification and segmentation algorithms that segment a superimposed audio stream according to its content into 10 main audio types: speech, non-speech, silence, male speech, female speech, music, environmental sounds, and music genres such as classical music, jazz, and electronic music. We tested the KNN, SVM, and GASOM algorithms on two audio classification systems. In the first system, the audio stream is discriminated into speech/non-speech, pure-speech/silence, male speech/female speech, and music/environmental sounds. In the second system, the audio stream is segmented into music/speech, pure-speech/silence, and male speech/female speech. In both systems, pure-speech/silence discrimination is performed by a rule-based classifier, and the music segments are further discriminated into music genres using a decision tree classifier. The first system achieved higher performance than the second. With the GASOM algorithm and leave-one-out validation, its average accuracy reached 99.17% for music/environmental sounds discrimination. In both systems, the GASOM algorithm consistently outperformed the KNN and SVM algorithms. In the first system, the GASOM algorithm also reduced processing time compared with the HMM and MLP methods.
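The leave-one-out validation mentioned above can be illustrated with a minimal sketch. It uses a plain k-nearest-neighbour classifier, since KNN is one of the algorithms tested. The feature vectors, labels, and function names below are hypothetical toy values chosen for illustration; they are not the features or data used in the paper.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Return the majority label among the k training samples closest to `query`."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

def leave_one_out_accuracy(samples, k=3):
    """Hold out each sample once, train on the rest, and report the hit rate."""
    correct = 0
    for i, (features, label) in enumerate(samples):
        train = samples[:i] + samples[i + 1:]  # every sample except the i-th
        if knn_predict(train, features, k) == label:
            correct += 1
    return correct / len(samples)

# Hypothetical 2-D feature vectors (assumption: two well-separated classes).
samples = [
    ((0.10, 0.20), "music"),
    ((0.15, 0.25), "music"),
    ((0.20, 0.15), "music"),
    ((0.12, 0.18), "music"),
    ((0.80, 0.90), "environmental"),
    ((0.85, 0.80), "environmental"),
    ((0.90, 0.85), "environmental"),
    ((0.82, 0.88), "environmental"),
]

accuracy = leave_one_out_accuracy(samples, k=3)
print(f"leave-one-out accuracy: {accuracy:.2%}")
```

Each sample serves as the test set exactly once, so the estimate uses all available data, at the cost of one training run per sample.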
Voice disorders are becoming more common because of poor social habits and misuse of the voice. These pathologies should be treated from the beginning. Indeed, a voice disorder need not progress to the point where it degrades the quality of the voice as heard by a listener. The most useful tool for diagnosing such disorders is acoustic analysis. In this work, we present new parameters that describe the vocal signal more clearly and help to classify unhealthy voices. They essentially describe the fundamental frequency (F0), the Harmonics-to-Noise Ratio (HNR), the Noise-to-Harmonics Ratio (NHR), and Detrended Fluctuation Analysis (DFA). Classification is performed on the Saarbruecken Voice and MEEI pathological databases using HTK classifiers. Two classification tasks are considered: a binary classification between normal and pathological voices, and a four-category classification covering spasmodic, polyp, nodule, and normal voices of female and male speakers. We also studied the effect of these new parameters when combined with the MFCC, delta, delta-delta, and energy coefficients.
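To make the F0, HNR, and NHR parameters more concrete, here is a minimal sketch that estimates them from the autocorrelation peak of a signal. It is an assumption-laden illustration, not the paper's method: the mapping HNR = 10 log10(r / (1 - r)) is the common autocorrelation-based definition, NHR is taken here simply as the linear reciprocal (1 - r) / r (clinical tools may define it over specific frequency bands), and the synthetic 200 Hz tone, sampling rate, and function names are hypothetical.

```python
import math
import random

def autocorr(x, lag):
    """Biased autocorrelation at `lag`, normalised by the zero-lag energy."""
    energy = sum(v * v for v in x)
    return sum(x[n] * x[n + lag] for n in range(len(x) - lag)) / energy

def f0_hnr_nhr(x, fs, f0_min=75.0, f0_max=500.0):
    """Estimate F0 (Hz), HNR (dB), and a simplified linear NHR from one frame."""
    lags = range(int(fs / f0_max), int(fs / f0_min) + 1)
    best_lag = max(lags, key=lambda lag: autocorr(x, lag))
    r = autocorr(x, best_lag)
    r = min(max(r, 1e-6), 1 - 1e-6)        # keep the logarithm well defined
    f0 = fs / best_lag
    hnr_db = 10 * math.log10(r / (1 - r))  # harmonic power over noise power
    nhr = (1 - r) / r                      # simplified reciprocal (assumption)
    return f0, hnr_db, nhr

# Synthetic test signals (assumption): a 200 Hz tone, clean and with added noise.
fs = 8000
clean = [math.sin(2 * math.pi * 200 * n / fs) for n in range(800)]
random.seed(0)
noisy = [v + random.gauss(0, 0.5) for v in clean]

f0_clean, hnr_clean, nhr_clean = f0_hnr_nhr(clean, fs)
f0_noisy, hnr_noisy, nhr_noisy = f0_hnr_nhr(noisy, fs)
print(f"clean: F0={f0_clean:.1f} Hz, HNR={hnr_clean:.1f} dB, NHR={nhr_clean:.3f}")
print(f"noisy: F0={f0_noisy:.1f} Hz, HNR={hnr_noisy:.1f} dB, NHR={nhr_noisy:.3f}")
```

Because the biased autocorrelation shrinks with lag, even the clean tone yields a finite HNR. The useful comparison is relative: added noise lowers HNR and raises NHR, which is the behaviour such parameters exploit when separating healthy from pathological voices.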