This paper proposes a rapid environment adaptation algorithm based on spectrum equalization (REALISE). In practical speech recognition applications, differences between the training and testing environments often seriously degrade recognition accuracy. These environmental differences can be classified into two types: differences in additive noise and differences in multiplicative noise in the spectral domain. The proposed method computes a time alignment between a testing utterance and the reference pattern closest to it, and then calculates the noise differences between the two according to that time alignment. All reference patterns are then adapted to the testing environment using these differences. Finally, the testing utterance is recognized using the adapted reference patterns. In a 250-Japanese-word recognition task in which the training and testing microphones were of two different types, REALISE improved recognition accuracy from 87% to 96%.
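The steps described above (align, estimate the additive and multiplicative differences, adapt the reference patterns) can be illustrated with a minimal sketch. It is not the formulation from the paper. It assumes a simple linear model in the spectral domain, where each frame is a gain times the clean speech spectrum plus an additive noise spectrum, and it assumes that every reference frame carries a speech/noise label. The function names `dtw_path`, `estimate_environment`, and `adapt_reference` are hypothetical, and plain dynamic time warping stands in for whatever alignment the recognizer actually uses.

```python
import numpy as np


def dtw_path(test, ref):
    """Align two (frames x bins) spectrograms with plain DTW.

    Returns a list of (test_frame, ref_frame) index pairs.
    """
    n, m = len(test), len(ref)
    # Pairwise Euclidean distances between all frames.
    dist = np.linalg.norm(test[:, None, :] - ref[None, :, :], axis=2)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i, j] = dist[i - 1, j - 1] + min(
                cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
    # Backtrack from the last cell to recover the alignment.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [(cost[i - 1, j - 1], i - 1, j - 1),
                 (cost[i - 1, j], i - 1, j),
                 (cost[i, j - 1], i, j - 1)]
        _, i, j = min(steps, key=lambda s: s[0])
    return path[::-1]


def estimate_environment(test, ref, path, ref_is_speech):
    """Estimate per-bin gain and noise spectra from aligned frames.

    Assumed model (an illustration, not the paper's equations):
    frame = gain * clean_speech + noise, with noise-only frames
    containing just the additive noise.
    """
    t_idx = np.array([t for t, _ in path])
    r_idx = np.array([r for _, r in path])
    speech = ref_is_speech[r_idx]
    # Mean spectra over speech-aligned and noise-aligned frames.
    s_test = test[t_idx[speech]].mean(axis=0)
    n_test = test[t_idx[~speech]].mean(axis=0)
    s_ref = ref[r_idx[speech]].mean(axis=0)
    n_ref = ref[r_idx[~speech]].mean(axis=0)
    # Multiplicative difference: ratio of noise-subtracted speech means.
    gain = (s_test - n_test) / np.maximum(s_ref - n_ref, 1e-8)
    return gain, n_test, n_ref


def adapt_reference(pattern, is_speech, gain, n_test, n_ref):
    """Map one reference pattern into the estimated testing environment."""
    return np.where(is_speech[:, None],
                    gain * (pattern - n_ref) + n_test,  # speech frames
                    n_test)                             # noise frames
```

In this sketch, the environment is estimated once from the closest reference pattern, and `adapt_reference` is then applied to every reference pattern with the same `gain`, `n_test`, and `n_ref` before recognition is rerun.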
A new speaker adaptation method is described. In practical applications of speaker adaptation, adaptation and testing environments change significantly and are unknown beforehand. In such cases, since the speaker adaptation adapts a reference pattern to the adaptation utterances in regard to differences in both environment and speaker at the same time, performance in speaker adaptation would be degraded. To cope with this problem, our proposed method first eliminates the environmental differences between each input utterance and a reference pattern by using a rapid environment adaptation algorithm based on spectrum equalization (REALISE) [2]. Then we apply an unsupervised and incremental speaker adaptation with autonomous control using tree structure pdfs (ACTS) [1] to the environmentally adapted reference pattern. By combining these two methods, the resulting system is expected to perform well under adverse environmental conditions and to show a stable improvement regardless of the amount of adaptation data. Evaluation experiments were carried out for utterances under three vehicle speed conditions. Recognition rates for a 100-Japanese-word recognition task after 100-word adaptation were improved from 92% (ACTS alone) to 95% (proposed method).