The vast majority of commercially available isolated word recognizers use a filter bank analysis as the front-end processing for recognition. It is not well understood how the parameters of different filter banks (e.g., number of filters, types of filters, filter spacing, etc.) affect recognizer performance. In this paper we present results of a performance evaluation of several types of filter bank analyzers in a speaker-trained isolated word recognition test using dialed-up telephone line recordings. We have studied both DFT (discrete Fourier transform) and direct form implementations of the filter banks. We have also considered uniform and nonuniform filter spacings. The results indicate that the best performance (highest word accuracy) is obtained by both a 15-channel uniform filter bank and a 13-channel nonuniform filter bank (with channels spaced along a critical band scale). The performance of a 7-channel critical band filter bank is almost as good as that of the two best filter banks. In comparison to a conventional linear predictive coding (LPC) word recognizer, the performance of the best filter bank recognizers was, on average, several percent worse than that of an eighth-order LPC-based recognizer. A discussion as to why some filter banks performed better than others, and why the LPC-based system did the best, is given in this paper.

I. INTRODUCTION

Since the early 1970's, researchers have been working on building machines that have the ability to communicate with man in his natural method of communication. One research area that has developed from this work is that of speech recognition. The general goal of speech recognition is to understand normal human speech and then to be able to perform some task based on this understanding. This is a very natural goal in that it requires machines to adapt to humans rather than vice versa.
In this way speech recognition would provide a convenient method of communication with machines (e.g., computers) via terminals and ordinary telephone handsets. Progress has been made toward the general goal of speech recognition by imposing some restrictions on the speech input. These restrictions are usually in the form of limits placed on the vocabulary, the set of allowable users, or the mode of the input. The purpose of this last limitation (probably the most severe one) is to restrict the form of input speech to a set of isolated word commands, instead of continuous speech, in order to achieve reliable recognition. With these restrictions, speech recognition has made major strides forward in the past decade and several commercial systems have appeared [1]-[6]. These systems are predominantly isolated word, speaker-trained systems. The availability of these systems has led to an increased interest in the possibility of producing terminal equipment that uses this new technology.
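To make the distinction between uniform and critical band channel spacing concrete, the following is a minimal sketch of a DFT-based filter bank front end. It is an illustration only, not the filter designs evaluated in the paper: the 8000 Hz sampling rate, the 200 to 3200 Hz analysis band, the 256-sample frame, the Hamming window, the rectangular band summation, and the use of Traunmüller's approximation for the critical band (Bark) scale are all assumptions, and the function names are hypothetical.

```python
import numpy as np

def hz_to_bark(f_hz):
    """Traunmueller's approximation of the critical band (Bark) scale (assumed here)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def bark_to_hz(z):
    """Algebraic inverse of hz_to_bark."""
    return 1960.0 * (z + 0.53) / (26.28 - z)

def band_edges(n_channels, f_lo, f_hi, spacing="uniform"):
    """Return n_channels + 1 band edges in Hz between f_lo and f_hi."""
    if spacing == "uniform":
        # Equal bandwidth in Hz for every channel.
        return np.linspace(f_lo, f_hi, n_channels + 1)
    if spacing == "critical":
        # Equal bandwidth in Bark, so channels widen with frequency.
        z = np.linspace(hz_to_bark(f_lo), hz_to_bark(f_hi), n_channels + 1)
        return bark_to_hz(z)
    raise ValueError("spacing must be 'uniform' or 'critical'")

def filter_bank_log_energies(frame, sample_rate, edges):
    """Log energy per channel, obtained by summing DFT power inside each band."""
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    energies = np.empty(len(edges) - 1)
    for k in range(len(edges) - 1):
        in_band = (freqs >= edges[k]) & (freqs < edges[k + 1])
        energies[k] = power[in_band].sum()
    # Small floor avoids log of zero for silent bands.
    return 10.0 * np.log10(energies + 1e-12)

# Usage: one frame of a synthetic 1100 Hz tone (all values are assumed).
SAMPLE_RATE = 8000
FRAME_LEN = 256
t = np.arange(FRAME_LEN) / SAMPLE_RATE
tone = np.sin(2.0 * np.pi * 1100.0 * t)

uniform_edges = band_edges(15, 200.0, 3200.0, spacing="uniform")
critical_edges = band_edges(13, 200.0, 3200.0, spacing="critical")

uniform_features = filter_bank_log_energies(tone, SAMPLE_RATE, uniform_edges)
critical_features = filter_bank_log_energies(tone, SAMPLE_RATE, critical_edges)
print(uniform_features.shape, critical_features.shape)
```

Each frame is reduced to one log energy per channel, so the 15-channel uniform and 13-channel critical band configurations yield feature vectors of length 15 and 13 respectively. A direct form implementation would instead pass the signal through a set of bandpass filters in the time domain; that variant is not sketched here.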