At present, pathological voice recognition is mainly based on the classification of pathological voice. However, almost all studies use samples of the single vowel /a/, and few address multiple vowels. In addition, current research on multi-vowel recognition mainly targets normal voices, so its methods are unsuitable for recognizing normal and pathological multi-vowels simultaneously. This paper concentrates on developing an accurate and robust feature, called the enhanced-bark line spectrum pair (E-BLSP), to detect and classify normal and pathological multi-vowels. We explore the impact of the E-BLSP feature on recognition performance and propose an effective method, based on a combination of three features including E-BLSP, for pathological and normal multi-vowels. First, the line spectrum pair (LSP) and difference of adjacent LSP (DAL) features of a vowel are extracted. Then the LSP feature is warped to the Bark domain to obtain the bark line spectrum pair (BLSP). Next, the E-BLSP feature is calculated by adjusting BLSP using the DAL feature. Finally, the E-BLSP feature and two traditional features, linear prediction cepstrum coefficients (LPCC) and mel-frequency cepstrum coefficients (MFCC), are applied to support vector machine (SVM) and deep neural network (DNN) classifiers to explore the classification performance of single features and feature combinations for the pathological and normal vowels /a/, /i/ and /u/. The results show that the highest accuracies achieved by the DNN and SVM classifiers are 98.6190% and 96.2693%, and the largest areas under the curve (AUC) are 0.9925 and 0.9868, respectively, all obtained with the combination of the three features LPCC, MFCC, and E-BLSP.
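The first two steps of the feature pipeline described above (LSP and DAL extraction, then Bark warping to BLSP) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the LPC coefficients are assumed to be already available, the 16 kHz sample rate is a placeholder, and the Traunmüller approximation is used for the Hz-to-Bark mapping because the abstract does not say which Bark formula is used. The final E-BLSP adjustment is left out because the abstract does not give its formula.

```python
import numpy as np

def lpc_to_lsf(a):
    """Convert LPC coefficients [1, a1, ..., ap] to p line spectral frequencies (radians)."""
    a = np.asarray(a, dtype=float)
    # Build P(z) = A(z) + z^-(p+1) A(1/z) and Q(z) = A(z) - z^-(p+1) A(1/z)
    a_ext = np.concatenate([a, [0.0]])
    a_rev = np.concatenate([[0.0], a[::-1]])
    p_poly = a_ext + a_rev
    q_poly = a_ext - a_rev
    angles = np.angle(np.concatenate([np.roots(p_poly), np.roots(q_poly)]))
    eps = 1e-6
    # Keep one root of each conjugate pair; drop the trivial roots at z = 1 and z = -1
    return np.sort(angles[(angles > eps) & (angles < np.pi - eps)])

def hz_to_bark(f_hz):
    """Traunmueller approximation of the Bark scale (an assumed choice of formula)."""
    f_hz = np.asarray(f_hz, dtype=float)
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def lsp_dal_blsp(a, sample_rate):
    """Return LSP (radians), DAL (differences of adjacent LSPs), and BLSP (Bark)."""
    lsf = lpc_to_lsf(a)
    dal = np.diff(lsf)
    blsp = hz_to_bark(lsf * sample_rate / (2.0 * np.pi))
    return lsf, dal, blsp

# Hypothetical 4th-order all-pole model with two resonances, for illustration only
poles = 0.9 * np.exp(1j * np.pi * np.array([0.3, -0.3, 0.6, -0.6]))
a_demo = np.real(np.poly(poles))
lsf_demo, dal_demo, blsp_demo = lsp_dal_blsp(a_demo, sample_rate=16000)
```

For a stable (minimum-phase) LPC filter the LSPs are strictly increasing, so every DAL value is positive; closely spaced LSPs indicate a spectral peak, which is presumably why DAL is a useful signal for adjusting BLSP.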
This paper presents a unified speech enhancement system that removes both background noise and interfering speech in severe noise environments by jointly using a parabolic reflector model and a neural beamformer. First, the amplification property of the paraboloid is discussed, which significantly improves the signal-to-noise ratio (SNR) of a desired signal. An appropriate paraboloid channel is then analyzed and designed using the boundary element method. In addition, a time-frequency masking approach and a mask-based beamforming approach are discussed and incorporated into the enhancement system. It is worth noting that the signals provided by the paraboloid and the beamformer are exactly complementary. Finally, these signals are combined in a learning-based fusion framework to further improve system performance in low-SNR environments. Experiments demonstrate that the system is effective and robust under five different noise conditions (speech interfered with by factory, pink, destroyer engine, Volvo, and babble noise) and at different noise levels. Compared with the original noisy speech, the average improvements in objective metrics are about ΔSTOI = 0.28, ΔPESQ = 1.31, and ΔfwSegSNR = 11.9.
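As an illustration of what mask-based beamforming generally looks like, the sketch below uses a time-frequency speech mask to estimate speech and noise spatial covariance matrices and then forms MVDR weights per frequency bin. It rests on assumptions: the abstract does not specify the beamformer type, so MVDR in its reference-channel form is a swapped-in, commonly used choice; the function name, the diagonal loading value, and the array layout are hypothetical; and the mask, which in a neural beamformer would come from a network, is simply passed in.

```python
import numpy as np

def mask_based_mvdr(Y, speech_mask, ref_channel=0, diag_load=1e-6):
    """Apply a mask-driven MVDR beamformer (illustrative sketch).

    Y: complex multichannel STFT, shape (F, T, C).
    speech_mask: values in [0, 1], shape (F, T); the noise mask is 1 - speech_mask.
    Returns the enhanced single-channel STFT, shape (F, T).
    """
    n_channels = Y.shape[2]
    noise_mask = 1.0 - speech_mask
    # Mask-weighted spatial covariance matrices, one (C, C) matrix per frequency bin
    phi_s = np.einsum('ft,ftc,ftd->fcd', speech_mask, Y, Y.conj())
    phi_n = np.einsum('ft,ftc,ftd->fcd', noise_mask, Y, Y.conj())
    phi_s /= np.maximum(speech_mask.sum(axis=1), 1e-8)[:, None, None]
    phi_n /= np.maximum(noise_mask.sum(axis=1), 1e-8)[:, None, None]
    # Diagonal loading keeps the noise covariance invertible
    phi_n = phi_n + diag_load * np.eye(n_channels)
    # MVDR weights: w = (phi_n^-1 phi_s) u / trace(phi_n^-1 phi_s), u selects ref_channel
    numerator = np.linalg.solve(phi_n, phi_s)
    trace = np.trace(numerator, axis1=1, axis2=2)
    weights = numerator[:, :, ref_channel] / (trace[:, None] + 1e-12)
    # Beamformer output w^H y for every time-frequency bin
    return np.einsum('fc,ftc->ft', weights.conj(), Y)
```

The output of such a beamformer is one of the signals that a fusion stage could combine with the paraboloid channel; how the paper's fusion framework does this is not described in the abstract and is not sketched here.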