This paper investigates a new front-end processing scheme that aims to improve speech recognition performance in noisy mobile environments. The approach combines features based on conventional Mel-frequency cepstral coefficients (MFCCs), Line Spectral Frequencies (LSFs) and formant-like (FL) features into robust multivariate feature vectors. The resulting front-end is an alternative to the DSR-XAFE (XAFE: eXtended Audio FrontEnd) available in GSM mobile communications. Our results show that, for highly noisy speech, combining these spectral cues leads to a significant improvement in recognition accuracy on the Aurora 2 task.
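As a rough illustration of what a multivariate feature vector built from several streams might look like, the sketch below concatenates per-frame MFCC, LSF and FL features into one vector per frame. The abstract does not state how the streams are actually combined, so plain frame-wise concatenation is an assumption here, as are the function name `combine_features`, the stream dimensions (13, 10 and 4) and the placeholder values.

```python
from typing import List, Sequence

Frame = List[float]


def combine_features(*streams: Sequence[Sequence[float]]) -> List[Frame]:
    """Concatenate several per-frame feature streams into one vector per frame.

    Hypothetical helper: frame-wise concatenation is assumed, not taken
    from the paper. Every stream must cover the same number of frames.
    """
    if not streams:
        raise ValueError("at least one feature stream is required")
    n_frames = len(streams[0])
    if any(len(s) != n_frames for s in streams):
        raise ValueError("all streams must have the same number of frames")
    # For frame t, append the values of each stream in the order given.
    return [[float(v) for s in streams for v in s[t]] for t in range(n_frames)]


# Placeholder data for 3 frames; dimensions are illustrative assumptions.
mfcc = [[0.0] * 13 for _ in range(3)]  # assumed 13 cepstral coefficients
lsf = [[0.1] * 10 for _ in range(3)]   # assumed 10 line spectral frequencies
fl = [[0.2] * 4 for _ in range(3)]     # assumed 4 formant-like features

combined = combine_features(mfcc, lsf, fl)
```

Each row of `combined` then holds the MFCC values first, followed by the LSF and FL values for the same frame. In practice the streams would come from real feature extractors aligned to a common frame rate.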
This paper addresses the realization of a Human/Machine (H/M) interface that includes a system for automatic recognition of continuous pathological speech (ARSCPS) and several communication tools. The interface is intended to help frail people with speech problems (dysarthric speech) access services provided by new information and communication technologies (ICT), while making it easier for doctors to reach a first diagnosis of the patient's disease. In addition, an ARSCPS has been developed and improved for normal and pathological voices and linked to our graphical interface, which is based on the Hidden Markov Model Toolkit (HTK) and on Hidden Markov Models (HMMs). We used different feature extraction techniques in the speech recognition system in order to improve dysarthric speech intelligibility while developing an ARSCPS that performs well for both pathological and normal speakers. These techniques are based on the coefficients of the ETSI standard Mel Frequency Cepstral Coefficient Front End (ETSI MFCC FE V2.0); Perceptual Linear Prediction (PLP) coefficients, Mel Frequency Cepstral Coefficients (MFCC) and the recently proposed Power Normalized Cepstral Coefficients (PNCC) have been used as a basis for comparison. In this context we used the Nemours database, which contains 11 speakers representing dysarthric speech and 11 speakers representing normal speech.