Weighted finite-state transducer (WFST) decoding in speech recognition can be accelerated by using graphics processing units (GPUs). To obtain a high recognition accuracy in a WFSTbased speech recognition system, a very large language model (LM) represented as a WFST with more than 10 GB of data is required. Since a GPU typically has only several GB of memory, it is impossible to store such a large LM in GPU memory. In this paper, we propose a new method for WFST decoding on a GPU. The method utilizes the on-the-fly rescoring algorithm, which performs the Viterbi search on a WFST with a small LM and rescores hypotheses using a large LM during decoding. We solve the problem of insufficient GPU memory by storing most of the large LM in a memory on the host and copying the data from the host memory to the GPU memory as needed during runtime. Our evaluation of the proposed method on the Lib-riSpeech test sets using an NVIDIA Tesla V100 GPU shows that it achieves a ten times faster decoding than an equivalent CPU implementation without recognition accuracy degradation.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.