-Speech signal is redundant and non-stationary by nature. Because of vocal tract inertness these variations are not very rapid and the signal can be considered as stationary in short segments. It is presumed that in short-time magnitude spectrum the most distinct information of speech is contained. This is the main reason for speech signal analysis in frame-by-frame manner. The analyzed speech signal is segmented into overlapping segments (so-called frames) for this purpose. Segments of 15-25 ms with the overlap of 10-15 ms are used usually.In this paper we present results of our investigation of analysis window length and frame shift influence on speech recognition rate. We have analyzed three different cepstral analysis approaches for this purpose: mel frequency cepstral analysis (MFCC), linear prediction cepstral analysis (LPCC) and perceptual linear prediction cepstral analysis (PLPC). The highest speech recognition rate was obtained using 10 ms length analysis window with the frame shift varying from 7.5 to 10 ms (regardless of analysis type). The highest increase of recognition rate was 2.5 %.
General rightsCopyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights.-Users may download and print one copy of any publication from the public portal for the purpose of private study or research.
Paper presents visual features for speech recognition. Visual speech recognition can be applied as the support for the acoustical recognition process or as a standalone speech recognition approach in the case of absence of the acoustical data. Static geometrical features of the speaker's lips were used for recognition. Statistical analysis of the feature values enabled us to select individual feature for each analyzed phone and the combined feature set for all phones. The discriminative power was tested using Bayes classification rule. 4th order analysis enabled us to recognize phones with the average rate of 60.3 %, the highest recognition rate of 81.3 % was achieved identifying phone [o].
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.