Interspeech 2018
DOI: 10.21437/interspeech.2018-1339

A GPU-based WFST Decoder with Exact Lattice Generation

Abstract: We describe initial work on an extension of the Kaldi toolkit that supports weighted finite-state transducer (WFST) decoding on Graphics Processing Units (GPUs). We implement token recombination as an atomic GPU operation in order to fully parallelize the Viterbi beam search, and propose a dynamic load balancing strategy for more efficient token passing scheduling among GPU threads. We also redesign the exact lattice generation and lattice pruning algorithms for better utilization of the GPUs. Experiments on t…
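The atomic token recombination mentioned in the abstract can be illustrated with a short device kernel. The following is a minimal sketch, not the authors' implementation: it assumes costs are non-negative (negative log-probabilities), packs the cost into the upper 32 bits of a 64-bit word so that integer ordering matches cost ordering, and performs one atomicMin per arc so that only the cheapest token entering each state survives. All array and kernel names are hypothetical.

```cuda
// Illustrative sketch of atomic token recombination for a GPU Viterbi
// beam search. Not the paper's code; names and layout are hypothetical.
#include <cstdint>

// Pack (cost, arc index) so that 64-bit integer comparison orders by cost
// first. Valid because the IEEE-754 bit pattern of a non-negative float is
// monotonic in its value. The arc index doubles as the back-pointer of the
// surviving token.
__device__ __forceinline__ unsigned long long pack_token(float cost,
                                                          uint32_t arc_idx) {
  return (static_cast<unsigned long long>(__float_as_uint(cost)) << 32) | arc_idx;
}

// Recover the cost of a packed token (used e.g. when computing the beam
// cutoff for the next frame).
__device__ __forceinline__ float unpack_cost(unsigned long long packed) {
  return __uint_as_float(static_cast<uint32_t>(packed >> 32));
}

// One thread per outgoing arc. All arcs entering the same destination state
// are recombined through a single atomicMin, so only the best token survives
// even when many threads reach that state in the same frame.
__global__ void expand_arcs(const uint32_t* arc_dest,   // destination state of each arc
                            const float* arc_cost,      // graph + acoustic cost of each arc
                            const float* src_cost,      // cost of the token feeding each arc
                            unsigned long long* state_tokens,  // per-state best packed token
                            int num_arcs, float beam_cutoff) {
  int arc = blockIdx.x * blockDim.x + threadIdx.x;
  if (arc >= num_arcs) return;

  float new_cost = src_cost[arc] + arc_cost[arc];
  if (new_cost >= beam_cutoff) return;  // beam pruning

  // Atomic recombination: keep the minimum-cost token for this state.
  atomicMin(&state_tokens[arc_dest[arc]],
            pack_token(new_cost, static_cast<uint32_t>(arc)));
}
```

In this sketch, state_tokens would be reset to 0xFFFFFFFFFFFFFFFF before each frame; deciding which arcs each thread processes is the separate scheduling problem that the abstract addresses with dynamic load balancing.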

Cited by 13 publications (11 citation statements) | References 31 publications

“…In practice, rapid execution of ASR decoding is essential for better user experience. Reduction of sequence length [6,5,7] and parallel computing [8,9,10] are mainly investigated for rapid computation of likelihoods and efficient traversal of search space.…”
Section: Introduction (mentioning)
confidence: 99%
“…al., [9] and Chen, et. al., [10] further extended the search algorithm by executing graph traversal on GPU. These studies focused on efficient computation of WFST (Weighted Finite-State Transducer) based decoding.…”
Section: Introduction (mentioning)
confidence: 99%
“…The proposed work is most closely related to and improves upon the first fully GPU-accelerated lattice decoder [20], which maps token passing constructs [13] to GPU. Starting from the single-threaded CPU decoder, we tailored the algorithm to the strengths of the hardware, including avoiding unnecessary synchronization and atomics, and using flat, compact memory structures.…”
Section: Related Work (mentioning)
confidence: 99%
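The "flat, compact memory structures" mentioned in the statement above could take the form of a structure-of-arrays token store. The sketch below is purely illustrative, taken from neither decoder's source: each frame's surviving tokens occupy one contiguous slice of flat device buffers, so back-pointers are plain indices and lattice traceback needs no per-token heap allocation. All names are hypothetical.

```cuda
// Hypothetical flat structure-of-arrays token store for lattice traceback.
#include <cstdint>

struct FlatTokenStore {
  uint32_t* state;        // device array: WFST state reached by each token
  float*    cost;         // device array: accumulated Viterbi cost
  uint32_t* prev_token;   // device array: index of the predecessor token
  uint32_t* frame_begin;  // frame f owns tokens [frame_begin[f], frame_begin[f+1])
  int       num_tokens;   // tokens stored so far across all frames
  int       num_frames;   // frames processed so far
};
```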
“…Across the tested configurations, the GPU decoder outperforms the multithreaded CPU implementation within Kaldi, with a relative speedup ranging between 14x and 18x when compared to a full 20-core Xeon processor. When compared with the current state-of-the art parallel decoder [20], the proposed algorithm decodes between 11x and 41x faster. Table 2.…”
Section: Speed Improvements (mentioning)
confidence: 99%
“…In the inference stage of the E2E speech recognition, prior work such as [20,21,22,23,10,24] uses n-gram LM or NNLM to bias search 2 We also force hypotheses to end in the end of WFST. Fig.…”
Section: Relation To Prior Work (mentioning)
confidence: 99%