Weighted finite-state transducer (WFST) decoding in speech recognition can be accelerated by using graphics processing units (GPUs). To obtain high recognition accuracy, a WFST-based speech recognition system requires a very large language model (LM), represented as a WFST of more than 10 GB. Since a GPU typically has only several GB of memory, it is impossible to store such a large LM in GPU memory. In this paper, we propose a new method for WFST decoding on a GPU. The method utilizes the on-the-fly rescoring algorithm, which performs the Viterbi search on a WFST built with a small LM and rescores hypotheses using a large LM during decoding. We solve the problem of insufficient GPU memory by storing most of the large LM in host memory and copying the data from the host memory to the GPU memory on demand at runtime. Our evaluation of the proposed method on the LibriSpeech test sets using an NVIDIA Tesla V100 GPU shows that it decodes ten times faster than an equivalent CPU implementation without degrading recognition accuracy.
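The core idea of on-the-fly rescoring can be illustrated with a minimal sketch, not the authors' implementation: a hypothesis scored against a small LM during the Viterbi search has its LM contribution replaced by the corresponding large-LM score. The toy unigram tables and the `rescore` function below are assumptions for illustration; a real system uses n-gram LMs represented as WFSTs and applies the correction incrementally at word boundaries.

```python
# Hypothetical sketch of on-the-fly LM rescoring (illustrative only).
# At each word boundary, the small-LM score is subtracted and the
# large-LM score is added:
#   new_score = old_score - small_lm(word) + large_lm(word)

# Toy unigram "LMs" as log10 probabilities (assumed values);
# real systems use large n-gram LMs compiled into WFSTs.
SMALL_LM = {"the": -1.0, "cat": -2.5, "sat": -2.8}
LARGE_LM = {"the": -0.9, "cat": -2.2, "sat": -2.6}

def rescore(hypothesis_words, acoustic_score):
    """Score a hypothesis with the small LM, then swap in large-LM scores."""
    # Score as produced by the Viterbi search over the small-LM WFST.
    score = acoustic_score + sum(SMALL_LM[w] for w in hypothesis_words)
    # On-the-fly correction: replace each small-LM term with the large-LM term.
    for w in hypothesis_words:
        score += LARGE_LM[w] - SMALL_LM[w]
    return score

if __name__ == "__main__":
    hyp = ["the", "cat", "sat"]
    print(round(rescore(hyp, acoustic_score=-10.0), 2))
```

Because only the LM correction terms are needed at runtime, the large LM can stay in host memory and its entries can be fetched into GPU memory on demand, which is the memory-saving strategy the paper builds on.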