This paper describes the development of the automatic speech recognition (ASR) system for the submission to the VOiCES from a Distance Challenge 2019. In this challenge, we focused on the fixed condition, where the task is to recognize reverberant and noisy speech based on a limited amount of clean training data. In our system, the mismatch between the training and testing conditions was reduced by using multi-style training where the training data was artificially contaminated with different reverberation and noise sources. Also, the Weighted Prediction Error (WPE) algorithm was used to reduce the reverberant effect in the evaluation data. To boost the system performance, acoustic models of different neural network architectures were trained and the respective systems were fused to give the final output. Moreover, an LSTM language model was used to rescore the lattice to compensate the weak n-gram model trained from only the transcription text. Evaluated on the development set, our system showed an average word error rate (WER) of 27.04%.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.