ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp40776.2020.9054099

GPU-Accelerated Viterbi Exact Lattice Decoder for Batched Online and Offline Speech Recognition

Abstract: We present an optimized weighted finite-state transducer (WFST) decoder capable of online streaming and offline batch processing of audio using Graphics Processing Units (GPUs). The decoder is efficient in memory utilization, input/output bandwidth, and uses a novel Viterbi implementation designed to maximize parallelism. Memory savings enable the decoder to process larger graphs than previously possible while simultaneously supporting larger numbers of consecutive streams. GPU preprocessing of lattice segment…
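The paper's decoder operates on full WFST decoding graphs; as a much-simplified illustration of the batched-Viterbi idea the abstract describes (many streams advancing in lockstep, one frame per step, with the per-frame work vectorized), here is a sketch over a plain HMM using NumPy. All names, shapes, and the HMM simplification are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def batched_viterbi(log_A, log_pi, log_obs):
    """Viterbi decoding over a batch of streams at once.

    log_A:   (S, S)    log transition scores between states
    log_pi:  (S,)      log initial-state scores
    log_obs: (B, T, S) per-frame log observation scores for B streams
    Returns the best state sequence per stream, shape (B, T).
    """
    B, T, S = log_obs.shape
    # Forward pass: all B streams advance one frame per iteration;
    # the (B, S, S) score tensor is the work a GPU would parallelize.
    delta = log_pi[None, :] + log_obs[:, 0, :]            # (B, S)
    backptr = np.zeros((B, T, S), dtype=np.int64)
    for t in range(1, T):
        # scores[b, i, j] = delta[b, i] + log_A[i, j]
        scores = delta[:, :, None] + log_A[None, :, :]    # (B, S, S)
        backptr[:, t, :] = scores.argmax(axis=1)          # best predecessor
        delta = scores.max(axis=1) + log_obs[:, t, :]
    # Backtrace each stream independently from its best final state.
    path = np.zeros((B, T), dtype=np.int64)
    path[:, -1] = delta.argmax(axis=1)
    for t in range(T - 2, -1, -1):
        path[:, t] = backptr[np.arange(B), t + 1, path[:, t + 1]]
    return path
```

The key design point mirrored here is that the time loop is inherently sequential, so parallelism must come from the batch and state dimensions, which is what batching many concurrent streams exploits.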

Cited by 8 publications (6 citation statements)

References 20 publications
“…To increase the throughput of RNN-based speech recognizers, Amodei et al, Braun et al, and Oh et al [33][34][35] used several batch processing approaches. In particular, Braun et al and Oh et al [34,35] aimed to accelerate GPU parallelization. Seki et al [36,37] proposed a multiple-utterance multiple-hypothesis vectorized beam search in CTC-attention-based end-to-end speech recognition using a VGG-RNN-based encoder-decoder and showed the decoding throughput increased using a GPU.…”
Section: Literature Review
confidence: 99%
“…• Transcribing the audio file with a pre-trained deep neural network acoustic model and n-gram language model via Kaldi's GPU-based decoder [16]. The output "hypothesis" transcript contains word-level timestamps.…”
Section: Forced Alignment
confidence: 99%
“…Prior work has shown that once the acoustic model is accelerated on a GPU, roughly 90% of the run time will be spent in external language model decoding on the CPU [16]. Therefore, we were concerned that simply accelerating the acoustic model on a GPU would not give us meaningful overall speed up.…”
Section: System Implementation
confidence: 99%
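The 90% figure in the snippet above explains the cited authors' concern directly: by Amdahl's law, accelerating only the acoustic model leaves the overall speedup capped by the CPU decoding fraction. A quick worked check (the numbers below are an illustration of the argument, not measurements from either paper):

```python
def amdahl_speedup(accelerated_fraction, factor):
    """Overall speedup when only `accelerated_fraction` of total
    runtime is sped up by `factor` (Amdahl's law)."""
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / factor)

# If CPU-side language-model decoding is 90% of runtime, then the
# acoustic model is only 10%: even an infinitely fast acoustic model
# yields at most 1 / 0.9 ~= 1.11x overall.
print(round(amdahl_speedup(0.10, 1e9), 3))

# Accelerating the decoding side as well (say, 10x on 90% of runtime)
# is what unlocks a meaningful end-to-end gain.
print(round(amdahl_speedup(0.90, 10.0), 3))
```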