This study investigated large-scale semi-supervised training (SST) to improve acoustic models for automatic speech recognition. The conventional self-training, the recently proposed committee-based SST using heterogeneous neural networks and the lattice-based SST were examined and compared. The large-scale SST was studied in deep neural network acoustic modeling with respect to the automatic transcription quality, the importance data filtering, the training data quantity and other data attributes of a large quantity of multi-genre unsupervised live data. We found that the SST behavior on large-scale ASR tasks was very different from the behavior obtained on small-scale SST: 1) big data can tolerate a certain degree of mislabeling in the automatic transcription for SST. It is possible to achieve further performance gains with more unsupervised fresh data, and even the automatic transcriptions have a certain degree of errors; 2) the audio attributes, transcription quality and importance of the fresh data are more important than the increased data quantity for large-scale SST; and 3) there are large differences in performance gains on different recognition tasks, such that the benefits highly depend on the selected data attributes of unsupervised data and the data scale of the baseline ASR system. Furthermore, we proposed a novel utterance filtering approach based on active learning to improve the data selection in large-scale SST. The experimental results showed that the SST with the proposed data filtering yields a 2-11% relative word error rate reduction on five multi-genre recognition tasks, even with the baseline acoustic model that was already well trained on a 10000-hr supervised dataset.INDEX TERMS Semi-supervised learning, data preprocessing, acoustic modeling, speech recognition.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.