Learning hidden unit contributions for unsupervised speaker adaptation of neural network acoustic models

Swietojanski, Paweł; Renals, Steve

doi:10.1109/slt.2014.7078569

Cited by 217 publications

(204 citation statements)

References 34 publications

Mentioning: 202


“…In most corpora, the training speakers differ from the test speakers. This is widely recognized as good practice and many solutions are available to improve robustness to this mismatch (Gales, 1998; Shinoda, 2011; Karafiát et al., 2011; Swietojanski and Renals, 2014). By contrast, the acoustic conditions of the training data often match (or cover) those of the test data.…”

Section: Introduction (mentioning)

confidence: 99%

An analysis of environment, microphone and data simulation mismatches in robust speech recognition

Vincent, Watanabe, Nugraha, et al. 2017

Computer Speech & Language



“…Learning hidden unit contribution (LHUC) is a method that linearly re-combines hidden units in a speaker- or environment-dependent manner [14,25]. Given adaptation data, LHUC rescales the contributions (amplitudes) of the hidden units in the model without actually modifying their feature receptors.…”

Section: Learning Hidden Unit Contribution (mentioning)

confidence: 99%
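The rescaling described in the statement above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the sigmoid hidden layer, and the toy weights are all hypothetical, and the amplitude function `2 * sigmoid(r)` (which keeps each amplitude in the range (0, 2)) is an assumed parametrization that the quoted text does not specify.

```python
import math


def sigmoid(x):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def hidden_layer(x, W, b):
    """Speaker-independent hidden layer: sigmoid(W x + b).

    W and b play the role of the "feature receptors"; they are never
    modified during adaptation in this sketch.
    """
    return [
        sigmoid(sum(w * x_j for w, x_j in zip(row, x)) + b_k)
        for row, b_k in zip(W, b)
    ]


def lhuc_amplitudes(r):
    """Map unconstrained per-unit parameters r to amplitudes in (0, 2).

    Assumed parametrization: a_k = 2 * sigmoid(r_k), so r_k = 0 gives
    an amplitude of exactly 1 (the unadapted model).
    """
    return [2.0 * sigmoid(r_k) for r_k in r]


def lhuc_layer(x, W, b, r):
    """Rescale each hidden unit's output by its speaker-dependent amplitude."""
    h = hidden_layer(x, W, b)
    a = lhuc_amplitudes(r)
    return [a_k * h_k for a_k, h_k in zip(a, h)]


# Toy example (hypothetical values): 3 inputs, 2 hidden units.
x = [0.5, -1.0, 2.0]
W = [[0.1, 0.2, -0.3], [0.4, -0.5, 0.6]]
b = [0.0, 0.1]

unadapted = lhuc_layer(x, W, b, r=[0.0, 0.0])   # amplitudes are all 1
adapted = lhuc_layer(x, W, b, r=[-3.0, 3.0])    # damp unit 0, boost unit 1
```

In this sketch, adapting to a speaker would mean updating only the vector `r` (one value per hidden unit) on that speaker's adaptation data while `W` and `b` stay fixed, which is why the number of speaker-dependent parameters stays small.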

“…Similar approaches have been proposed independently in [12] and [13]. Researchers have also introduced learning hidden unit contribution (LHUC) to weight hidden unit activations in a speaker- or environment-dependent manner [14]. It was shown that LHUC results in consistent WER reductions for speaker and environment adaptation [15].…”

Section: Introduction (mentioning)

confidence: 99%

An Investigation of Deep Neural Networks for Multilingual Speech Recognition Training and Adaptation

Tong, Garner, Bourlard 2017

Interspeech 2017


Different training and adaptation techniques for multilingual Automatic Speech Recognition (ASR) are explored in the context of hybrid systems, exploiting Deep Neural Networks (DNN) and Hidden Markov Models (HMM). In multilingual DNN training, the hidden layers (possibly extracting bottleneck features) are usually shared across languages, and the output layer can either model multiple sets of language-specific senones or one single universal IPA-based multilingual senone set. Both architectures are investigated, exploiting and comparing different language adaptive training (LAT) techniques originating from successful DNN-based speaker adaptation. More specifically, speaker adaptive training methods such as Cluster Adaptive Training (CAT) and Learning Hidden Unit Contribution (LHUC) are considered. In addition, a language adaptive output architecture for the IPA-based universal DNN is also studied and tested. Experiments show that LAT improves performance, and that adaptation on the top layer further improves accuracy. By combining state-level minimum Bayes risk (sMBR) sequence training with LAT, we show that a language adaptively trained IPA-based universal DNN outperforms a monolingually sequence trained model.


“…In this paper, a deep multilayer neural network is used for the object recognition [12][13][14]. Its learning rule is to apply the steepest descent method to adjust the weights and thresholds of the neural network according to the minimum sum of the square error.…”

Section: Object Recognition (mentioning)

confidence: 99%

An assembly system based on industrial robot with binocular stereo vision

Tang 2016

Proceedings of the 2016 International Conference on Advanced Electronic Science and Technology (AEST 2016)


Abstract. This paper proposes an electronic part and component assembly system based on an industrial robot with binocular stereo vision. Firstly, binocular stereo vision with a visual attention mechanism model is used to quickly locate the image regions that contain the electronic parts and components. Secondly, a deep neural network is adopted to recognize the features of the electronic parts and components. Thirdly, in order to control the end-effector of the industrial robot to grasp the electronic parts and components, a genetic algorithm (GA) is proposed to compute the transition matrix and the inverse kinematics of the industrial robot (end-effector), which plays a key role in bridging the binocular stereo vision and the industrial robot. Finally, the proposed assembly system is tested in LED component assembly experiments, and the results indicate that it has high efficiency and good applicability.

