Conditional Teacher-Student Learning

Meng, Zhong; Li, Jinyu; Zhao, Yong; Gong, Yifan

doi:10.1109/icassp.2019.8683438

Cited by 76 publications

(58 citation statements)

References 38 publications

“…Compared to the one-hot labels, the soft posteriors accurately models the inherent statistical relationships among different token classes in addition to the token identity encoded by a one-hot vector. It proves to be a more powerful target for the student to learn from which is consistent with what was observed in [18,19,20,21,22].…”

Section: Unsupervised Domain Adaptation With T/S Learning (supporting)

confidence: 83%

“…To address this issue, conditional T/S learning (CT/S) [21] was proposed recently in which the student selectively chooses to learn from either the teacher AED or the ground truth conditioned on whether the teacher AED can correctly predict the ground-truth labels. CT/S have shown significant WER improvements over T/S and IT/S for both domain and speaker adaptation on CHiME-3 dataset.…”

Section: Adaptive T/S (AT/S) Learning For Supervised Domain Adaptation (mentioning)

confidence: 99%

“…In T/S learning, the Kullback-Leibler (KL) divergence between the output senone distributions of teacher and student acoustic models given parallel source and target domain data at the input is minimized by updating only the student model parameters. T/S training was shown to outperform the cross entropy training directly using the hard label in the target domain [18,19,20,21,22].…”

Section: Introduction (mentioning)

confidence: 99%
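
The T/S objective described in the statement above can be sketched as a frame-averaged KL divergence between teacher and student output distributions. This is a minimal illustration in plain Python, not the cited authors' implementation: the function names `kl_divergence` and `ts_loss`, the smoothing constant `eps`, and the toy posteriors are all assumptions made for the example. In practice only the student's parameters would be updated to reduce this value.

```python
import math

def kl_divergence(teacher, student, eps=1e-12):
    """KL(teacher || student) for one frame of class posteriors.

    `eps` is an illustrative smoothing constant (an assumption) that
    guards against log(0); classes with zero teacher mass contribute 0.
    """
    return sum(
        t * math.log((t + eps) / (s + eps))
        for t, s in zip(teacher, student)
        if t > 0
    )

def ts_loss(teacher_posteriors, student_posteriors):
    """Average frame-level KL divergence over paired frames.

    teacher_posteriors[i] comes from the source-domain input and
    student_posteriors[i] from the parallel target-domain input.
    """
    frames = list(zip(teacher_posteriors, student_posteriors))
    return sum(kl_divergence(t, s) for t, s in frames) / len(frames)

# Toy example: two frames, three output classes (made-up numbers).
teacher = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
student = [[0.5, 0.3, 0.2], [0.2, 0.6, 0.2]]
print(ts_loss(teacher, student))  # positive; zero only when the two match
```

Because the teacher's entropy does not depend on the student, minimizing this KL divergence with respect to the student is equivalent to minimizing the cross entropy between the teacher's soft posteriors and the student's outputs.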

“…However, the optimal weights are data-dependent and can only be determined through careful tuning on a dev set. More recently, conditional T/S (CT/S) learning was proposed in [21] where the student model selectively chooses to learn from either the teacher or the ground truth depending on whether the teacher's prediction is correct or not. CT/S does not disturb the statistical relationships among classes naturally embedded in the class posteriors and achieves significant word error rate (WER) improvement over T/S for domain adaptation on CHiME-3 dataset [24].…”

Section: Introduction (mentioning)

confidence: 99%
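
The selection rule described in the CT/S statement above can be sketched as follows. This is a hedged illustration rather than the paper's implementation: it assumes "the teacher's prediction is correct" means the arg-max of the teacher posterior equals the ground-truth class, and the names `conditional_targets` and `cross_entropy`, the constant `eps`, and the toy data are hypothetical.

```python
import math

def conditional_targets(teacher_posteriors, labels):
    """Pick a training target per frame.

    Assumption: the teacher counts as correct when its arg-max class
    equals the ground-truth label. Correct frames keep the teacher's
    soft posterior unchanged; incorrect frames fall back to the one-hot
    ground-truth vector.
    """
    targets = []
    for post, label in zip(teacher_posteriors, labels):
        predicted = max(range(len(post)), key=post.__getitem__)
        if predicted == label:
            targets.append(list(post))
        else:
            targets.append([1.0 if k == label else 0.0 for k in range(len(post))])
    return targets

def cross_entropy(targets, student_posteriors, eps=1e-12):
    """Frame-averaged cross entropy between targets and student outputs."""
    total = 0.0
    for target, student in zip(targets, student_posteriors):
        total -= sum(t * math.log(s + eps) for t, s in zip(target, student))
    return total / len(targets)

# Toy example: the teacher is right on frame 0 and wrong on frame 1.
teacher = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
labels = [0, 2]
student = [[0.5, 0.3, 0.2], [0.2, 0.6, 0.2]]

targets = conditional_targets(teacher, labels)
print(targets)
print(cross_entropy(targets, student))
```

Since each frame uses either the untouched soft posterior or the one-hot label, never a blend of the two, the class relationships encoded in the teacher's posteriors are left undisturbed on the frames where the teacher is used.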

Domain Adaptation via Teacher-Student Learning for End-to-End Speech Recognition

Meng, Gaur, et al. 2019

2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)

Self Cite

Teacher-student (T/S) has shown to be effective for domain adaptation of deep neural network acoustic models in hybrid speech recognition systems. In this work, we extend the T/S learning to large-scale unsupervised domain adaptation of an attention-based end-to-end (E2E) model through two levels of knowledge transfer: teacher's token posteriors as soft labels and one-best predictions as decoder guidance. To further improve T/S learning with the help of ground-truth labels, we propose adaptive T/S (AT/S) learning. Instead of conditionally choosing from either the teacher's soft token posteriors or the one-hot ground-truth label, in AT/S, the student always learns from both the teacher and the ground truth with a pair of adaptive weights assigned to the soft and one-hot labels quantifying the confidence on each of the knowledge sources. The confidence scores are dynamically estimated at each decoder step as a function of the soft and one-hot labels. With 3400 hours parallel close-talk and far-field Microsoft Cortana data for domain adaptation, T/S and AT/S achieve 6.3% and 10.3% relative word error rate improvement over a strong E2E model trained with the same amount of far-field data.

“…As a result of their experiments they observed a significant improvement in accuracy. Meng Z. et al (2019) used a "smart" Teacher-Student model for domain adaption and speaker adaption in automatic speech recognition. Their model selectively chooses to learn from either the teacher model or the gold standard labels conditioned on whether the teacher can correctly predict the gold standard.…”

Section: Related Work (mentioning)

confidence: 99%

Relevance-Based Data Masking: A Model-Agnostic Transfer Learning Approach for Facial Expression Recognition

Schiller, Huber, Dietz, et al. 2020

Front. Comput. Sci.

Deep learning approaches are now a popular choice in the field of automatic emotion recognition (AER) across various modalities. Due to the high costs of manually labeling human emotions however, the amount of available training data is relatively scarce in comparison to other tasks. To facilitate the learning process and reduce the necessary amount of training-data, modern approaches therefore often rely on leveraging knowledge from models that have already been trained on related tasks where data is available abundantly. In this work we introduce a novel approach to transfer learning, which addresses two shortcomings of traditional methods: The (partial) inheritance of the original models structure and the restriction to other neural network models as an input source. To this end we identify the parts in the input that have been relevant for the decision of the model we want to transfer knowledge from, and directly encode those relevant regions in the data on which we train our new model. To validate our approach we performed experiments on well-established datasets for the task of automatic facial expression recognition. The results of those experiments are suggesting that our approach helps to accelerate the learning process.