This paper focuses on the adaptation of Automatic Speech Recognition systems using Hybrid models combining Artificial Neural Networks (ANN) with Hidden Markov Models (HMM). Most adaptation techniques for ANNs reported in the literature consist of adding a linear transformation network connected to the input of the ANN. This paper describes the application of linear transformations not only to the input features, but also to the outputs of the internal layers. The motivation is that the outputs of an internal layer represent discriminative features of the input pattern that are suitable for the classification performed at the output of the ANN. To reduce the effect of the lack of adaptation samples for some phonetic units, we propose a new solution, called Conservative Training. Supervised adaptation experiments with different corpora and for different types of adaptation are described. The results show that the proposed approach always outperforms the use of transformations in the feature space and yields even better results when combined with linear input transformations.
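As a rough illustration of the idea of adapting via a linear transformation on an internal layer, here is a minimal PyTorch sketch. The class, variable names, and network split are assumptions for illustration, not the authors' implementation: a frozen speaker-independent MLP is wrapped, a square linear layer initialized to the identity is inserted on the outputs of one internal layer, and only that layer is trained on the adaptation data.

```python
import torch
import torch.nn as nn

class LinearHiddenAdapter(nn.Module):
    """Wraps a frozen hybrid-model MLP and inserts a trainable linear
    transformation (initialized to the identity) on the outputs of one
    internal layer; only the adapter parameters are updated."""

    def __init__(self, lower_layers, upper_layers, hidden_dim):
        super().__init__()
        self.lower_layers = lower_layers        # frozen part below the insertion point
        self.upper_layers = upper_layers        # frozen part above the insertion point
        self.adapter = nn.Linear(hidden_dim, hidden_dim)
        nn.init.eye_(self.adapter.weight)       # start as the identity mapping
        nn.init.zeros_(self.adapter.bias)
        for p in list(lower_layers.parameters()) + list(upper_layers.parameters()):
            p.requires_grad = False             # adapt only the linear transformation

    def forward(self, x):
        h = self.lower_layers(x)                # internal-layer activations
        h = self.adapter(h)                     # linear transformation of hidden outputs
        return self.upper_layers(h)             # frame-level phonetic-unit posteriors
```

The same wrapper with the adapter placed before `lower_layers` would correspond to the usual linear input transformation, which is the baseline the abstract compares against.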
A technique is proposed for the adaptation of automatic speech recognition systems using Hybrid models combining Artificial Neural Networks with Hidden Markov Models. The application of linear transformations not only to the input features, but also to the outputs of the internal layers is investigated. The motivation is that the outputs of an internal layer represent a projection of the input pattern into a space where it should be easier to learn the classification or transformation expected at the output of the network. A new solution, called Conservative Training, is proposed that compensates for the lack of adaptation samples in certain classes. Supervised adaptation experiments with different corpora and for different adaptation types are described. The results show that the proposed approach always outperforms the use of transformations in the feature space and yields even better results when combined with linear input transformations.
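To make the Conservative Training idea concrete, the following sketch shows one plausible way to build adaptation targets; the exact target-assignment rule is an assumption for illustration, not taken from the paper. The intent is that classes missing from the adaptation data are not forced toward zero posteriors, which would otherwise erode the original model's knowledge of them.

```python
import torch

def conservative_targets(original_posteriors, label, missing_units):
    """Build Conservative Training targets for one adaptation frame.

    original_posteriors: (num_units,) outputs of the *unadapted* network
    label:               index of the correct unit for this frame
    missing_units:       boolean mask (num_units,) of units absent from
                         the adaptation data

    Units missing from the adaptation set keep the posterior assigned by
    the original network instead of a hard 0, and the correct unit
    receives the remaining probability mass. (Assumed rule, shown only
    to illustrate the principle.)
    """
    targets = torch.zeros_like(original_posteriors)
    targets[missing_units] = original_posteriors[missing_units]
    targets[label] = 1.0 - targets[missing_units].sum()
    return targets
```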
In this paper, we apply Semi-Supervised Learning (SSL) along with Data Augmentation (DA) to improve the accuracy of End-to-End ASR. We focus on the consistency regularization principle, which has been successfully applied to image classification tasks, and present sequence-to-sequence (seq2seq) versions of the FixMatch and Noisy Student algorithms. Specifically, we generate the pseudo labels for the unlabeled data on-the-fly with a seq2seq model after perturbing the input features with DA. We also propose soft label variants of both algorithms to cope with pseudo label errors, showing further performance improvements. We conduct SSL experiments on a conversational speech data set (doctor-patient conversations) with 1.9 kh of manually transcribed training data, using only 25 % of the original labels (475 h of labeled data). As a result, the Noisy Student algorithm with soft labels and consistency regularization achieves a 10.4 % word error rate (WER) reduction when adding 475 h of unlabeled data, corresponding to a recovery rate of 92 %. Furthermore, when iteratively adding 950 h more unlabeled data, our best SSL performance is within a 5 % WER increase compared to using the full labeled training set (recovery rate: 78 %).
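A schematic sketch of the pseudo-labeling loop described above; the model interface, the simplified `spec_augment` stand-in, and the loss choices are placeholders, not the authors' code. A teacher model decodes the unlabeled utterance to obtain a pseudo label, the input features are perturbed with DA, and the student is trained on the augmented features against either the hard pseudo label or the teacher's soft token posteriors.

```python
import torch
import torch.nn.functional as F

def spec_augment(feats, max_mask=10):
    """Very simplified time/frequency masking as a stand-in for real DA."""
    feats = feats.clone()
    t = torch.randint(0, max(1, feats.size(0) - max_mask), (1,)).item()
    feats[t:t + max_mask, :] = 0.0            # time mask
    f = torch.randint(0, max(1, feats.size(1) - max_mask), (1,)).item()
    feats[:, f:f + max_mask] = 0.0            # frequency mask
    return feats

def ssl_step(teacher, student, feats, optimizer, use_soft_labels=True):
    """One semi-supervised update on a single unlabeled utterance.

    teacher, student: seq2seq ASR models where .decode(feats) returns a
    token sequence and model(feats, tokens) returns per-token logits.
    This interface is hypothetical; real seq2seq toolkits differ.
    """
    with torch.no_grad():
        pseudo_tokens = teacher.decode(feats)          # on-the-fly pseudo label
        teacher_logits = teacher(feats, pseudo_tokens)  # for soft targets

    aug_feats = spec_augment(feats)                     # perturb input features (DA)
    student_logits = student(aug_feats, pseudo_tokens)

    if use_soft_labels:
        # soft-label variant: match the teacher's token posteriors
        loss = F.kl_div(F.log_softmax(student_logits, dim=-1),
                        F.softmax(teacher_logits, dim=-1),
                        reduction="batchmean")
    else:
        # hard-label variant: cross-entropy against the pseudo tokens
        loss = F.cross_entropy(student_logits.view(-1, student_logits.size(-1)),
                               pseudo_tokens.view(-1))

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Consistency regularization enters through the combination of augmented inputs and unperturbed pseudo labels: the student is pushed to produce the same output under perturbation that the teacher produced without it.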
This paper presents a front-end consisting of an Artificial Neural Network (ANN) architecture trained with multilingual corpora. The idea is to train an ANN front-end able to integrate the acoustic variations included in databases collected for different languages, through different channels, or even for specific tasks. This ANN front-end produces discriminant features that can be used as observation vectors for language- or task-dependent recognizers. The approach has been evaluated on three difficult tasks: recognition of non-native speaker sentences, training of a new language with a limited amount of speech data, and training of a model for the car environment using a clean-microphone corpus of the target language and data collected in the car environment in another language.
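A minimal sketch of this front-end idea, with layer sizes, the compact feature layer, and the training targets chosen as assumptions for illustration rather than taken from the paper: an ANN trained on pooled multilingual data, whose internal representation is then reused as observation vectors for a language- or task-dependent back-end.

```python
import torch
import torch.nn as nn

class MultilingualFrontEnd(nn.Module):
    """ANN trained on pooled multilingual corpora; its internal
    representation is reused as discriminant features for a
    language/task-dependent recognizer (all shapes illustrative)."""

    def __init__(self, feat_dim=39, hidden_dim=512, feature_dim=40, num_targets=1000):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.Sigmoid(),
            nn.Linear(hidden_dim, feature_dim), nn.Sigmoid(),
        )
        # multilingual phone targets used only while training the front-end
        self.classifier = nn.Linear(feature_dim, num_targets)

    def features(self, x):
        """Discriminant features used as observation vectors downstream."""
        return self.encoder(x)

    def forward(self, x):
        return self.classifier(self.encoder(x))
```

After training, only `features()` is used: the language- or task-dependent recognizer is trained on these vectors, which is what allows a new language or a mismatched acoustic condition to be covered with limited target-domain data.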