Recently, speech enhancement methods based on Generative Adversarial Networks have achieved good performance on noisy speech signals in the time domain. However, the training of Generative Adversarial Networks suffers from problems such as difficult convergence and mode collapse. In this work, an end-to-end speech enhancement model based on the Wasserstein Generative Adversarial Network is proposed, with several improvements designed to achieve faster convergence and better quality of the generated speech. Specifically, in the encoder of the generator, each convolutional layer applies kernels of different sizes so that speech coding information is captured at multiple scales; gated linear units are introduced to alleviate the vanishing-gradient problem as the network deepens; the gradient penalty of the discriminator is replaced with spectral normalization to accelerate convergence; and a hybrid penalty term composed of L1 regularization and a scale-invariant signal-to-distortion ratio (SI-SDR) is added to the generator loss to improve the quality of the generated speech. Experimental results on both the TIMIT corpus and a Tibetan corpus show that the proposed model significantly improves speech quality and converges faster.
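
As an illustration of the hybrid generator penalty mentioned above, the sketch below shows one way to combine an L1 term with a negative SI-SDR term. It assumes a PyTorch implementation; the function names and the weights `alpha` and `beta` are hypothetical and not taken from the paper.

```python
# Minimal sketch of an L1 + negative SI-SDR penalty for the generator.
# Assumption: waveforms are tensors of shape (batch, samples); alpha/beta
# are illustrative trade-off weights, not values reported in the paper.
import torch


def si_sdr(estimate: torch.Tensor, target: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Scale-invariant signal-to-distortion ratio in dB, per batch item."""
    # Remove the mean so the scaling projection is well defined.
    estimate = estimate - estimate.mean(dim=-1, keepdim=True)
    target = target - target.mean(dim=-1, keepdim=True)
    # Project the estimate onto the target to obtain the scaled reference.
    dot = torch.sum(estimate * target, dim=-1, keepdim=True)
    target_energy = torch.sum(target ** 2, dim=-1, keepdim=True) + eps
    s_target = (dot / target_energy) * target
    e_noise = estimate - s_target
    ratio = torch.sum(s_target ** 2, dim=-1) / (torch.sum(e_noise ** 2, dim=-1) + eps)
    return 10.0 * torch.log10(ratio + eps)


def hybrid_penalty(enhanced: torch.Tensor, clean: torch.Tensor,
                   alpha: float = 100.0, beta: float = 1.0) -> torch.Tensor:
    """L1 regularization plus a negative SI-SDR term (so minimizing the
    penalty maximizes SI-SDR of the enhanced speech)."""
    l1 = torch.mean(torch.abs(enhanced - clean))
    sisdr = si_sdr(enhanced, clean).mean()
    return alpha * l1 - beta * sisdr
```

In a training loop, this penalty would be added to the adversarial term of the generator objective; the relative weights control the trade-off between waveform fidelity (L1) and distortion-ratio improvement (SI-SDR).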