2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
DOI: 10.1109/asru46091.2019.9003769
Domain Expansion in DNN-Based Acoustic Models for Robust Speech Recognition

Abstract: Training acoustic models with sequentially incoming data, while both leveraging the new data and avoiding the forgetting effect, is an essential obstacle to achieving human intelligence level in speech recognition. An obvious approach to leverage data from a new domain (e.g., new accented speech) is to first generate a comprehensive dataset of all domains, by combining all available data, and then use this dataset to retrain the acoustic models. However, as the amount of training data grows, storing and retraining o…

Cited by 11 publications (7 citation statements)
References 53 publications (78 reference statements)
“…This can be done through KL-divergence [9] between the adapted model and the baseline model, elastic weight consolidation (EWC), and weight constraint adaptation (WCA). Ghorbani et al. [16] investigate several of these methods for a single transfer step, finding a hybrid of soft KL-divergence and EWC to be the best, while Houston [17] looks at performance over multiple transfer steps, finding that EWC covers 65% of the gap between a fine-tuned and a pooled (best-case) model across all accents. An advantage of these regularization techniques is that the original data need not be present in order to adapt and improve the model.…”
Section: Domain Expansion (mentioning)
confidence: 99%
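To make the regularization idea in this excerpt concrete, the following is a minimal sketch (not the cited papers' implementation) of an adaptation loss that combines a KL-divergence term toward a frozen baseline model with an EWC weight penalty. All names (`fisher_diag`, `old_params`, the `lambda_*` weights) are illustrative assumptions.

```python
# Sketch only: cross-entropy on new-domain data, plus KL to the frozen baseline
# model's outputs, plus an EWC quadratic penalty on drifting weights.
import torch
import torch.nn.functional as F

def adaptation_loss(model, baseline_model, fisher_diag, old_params,
                    feats, targets, lambda_kl=0.5, lambda_ewc=100.0):
    logits = model(feats)                         # adapted model's scores
    ce = F.cross_entropy(logits, targets)         # loss on new-domain frames

    with torch.no_grad():
        base_logits = baseline_model(feats)       # frozen source-domain model
    kl = F.kl_div(F.log_softmax(logits, dim=-1),
                  F.softmax(base_logits, dim=-1),
                  reduction="batchmean")          # stay close to baseline outputs

    ewc = 0.0
    for name, p in model.named_parameters():      # quadratic penalty weighted by
        ewc = ewc + (fisher_diag[name] *          # a diagonal Fisher estimate
                     (p - old_params[name]) ** 2).sum()

    return ce + lambda_kl * kl + lambda_ewc * ewc
```

As the excerpt notes, neither term needs the original training data: the baseline model and the stored Fisher/weight statistics stand in for it.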
“…Convolutional neural networks (CNNs) are among the many machine learning strategies adapted for acoustic modeling to handle long-term dependencies in ASR. Like RNNs, CNNs have shown significant improvements in ASR performance over FC-DNNs [13,14,15,16]. Recent research also shows that residual connections allow deeper CNN architectures to be trained more efficiently than RNNs [17].…”
Section: Related Work (mentioning)
confidence: 99%
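For the residual-connection point above, here is a minimal sketch of a residual 1-D convolutional block of the kind such deep CNN acoustic models stack; the channel count and kernel size are illustrative assumptions, not values from the cited work.

```python
# Sketch of one residual conv block over frame sequences (batch, channels, frames).
import torch
import torch.nn as nn

class ResidualConvBlock(nn.Module):
    def __init__(self, channels=256, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2                    # keep the time dimension fixed
        self.conv1 = nn.Conv1d(channels, channels, kernel_size, padding=pad)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size, padding=pad)
        self.norm1 = nn.BatchNorm1d(channels)
        self.norm2 = nn.BatchNorm1d(channels)
        self.act = nn.ReLU()

    def forward(self, x):
        y = self.act(self.norm1(self.conv1(x)))
        y = self.norm2(self.conv2(y))
        return self.act(x + y)                    # skip connection eases training depth
```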
“…Architecture-based methods [6,8] dynamically expand a model architecture for new data. In the field of ASR, various studies [9,10,11] have demonstrated the benefits of the incremental learning approach. Fu et al. [12] proposed an incremental learning algorithm for end-to-end ASR that uses attention distillation and knowledge distillation [13] to prevent catastrophic forgetting.…”
Section: Introduction (mentioning)
confidence: 99%
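The knowledge-distillation term mentioned in this excerpt can be sketched as follows: soften the previous model's output distribution with a temperature and train the new model to match it alongside the usual task loss. The temperature and mixing weight are assumed for illustration, not taken from [12] or [13].

```python
# Sketch of a knowledge-distillation loss against the previous model's outputs.
import torch
import torch.nn.functional as F

def distillation_loss(new_logits, old_logits, targets, T=2.0, alpha=0.5):
    ce = F.cross_entropy(new_logits, targets)            # task loss on new data
    soft_old = F.softmax(old_logits / T, dim=-1)         # teacher: previous model
    log_soft_new = F.log_softmax(new_logits / T, dim=-1)
    kd = F.kl_div(log_soft_new, soft_old,
                  reduction="batchmean") * (T * T)       # scale back the T^2 factor
    return (1.0 - alpha) * ce + alpha * kd
```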
“…Fu et al. [12] proposed an incremental learning algorithm for end-to-end ASR that uses attention distillation and knowledge distillation [13] to prevent catastrophic forgetting. Such methods require retaining the previous model [6,12] or adding extra parameters of the same size as the model for optimization [8,9]. Therefore, adapting recent end-to-end ASR models with a huge number of parameters is computationally expensive.…”
Section: Introduction (mentioning)
confidence: 99%