Interspeech 2015
DOI: 10.21437/interspeech.2015-475
Applying GPGPU to recurrent neural network language model based fast network search in the real-time LVCSR

Cited by 5 publications (7 citation statements); references 0 publications.
“…Several attempts have been made to utilize RNNLMs for online decoding in real-time ASR systems [5,6,7]. However, they either simulate only some aspects of RNNLMs into the traditional architectures [5,6], or perform a 2-pass decoding [7] which innately could not be applied before the end of the utterance was reached. There have also been attempts to apply RNNLM directly to online ASR without approximation by eliminating redundant computations [8,9,10]. In our previous research [9], we were successful in applying moderate size RNNLMs directly to CPU-GPU hybrid online ASR systems with a cache strategy [10].…”
Section: Introduction (mentioning)
confidence: 91%
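The cache strategy the quote refers to amounts to memoizing RNN hidden states keyed by word history, so partial hypotheses that share a prefix reuse one forward computation instead of recomputing it. A minimal sketch of that idea — the class name and the toy step function are illustrative, not taken from the cited papers:

```python
class RNNLMStateCache:
    """Memoize RNN hidden states by word history (a tuple of word ids)
    so hypotheses sharing a prefix reuse one forward computation."""

    def __init__(self, step_fn, init_state):
        self.step_fn = step_fn          # (state, word) -> new state
        self.cache = {(): init_state}   # empty history -> initial state
        self.steps = 0                  # number of real forward steps taken

    def state(self, history):
        history = tuple(history)
        if history not in self.cache:
            prev = self.state(history[:-1])  # recurse on the shared prefix
            self.cache[history] = self.step_fn(prev, history[-1])
            self.steps += 1
        return self.cache[history]

# toy "RNN" whose state is simply the number of steps taken so far
cache = RNNLMStateCache(step_fn=lambda s, w: s + 1, init_state=0)
cache.state([1, 2, 3])   # three forward steps
cache.state([1, 2, 4])   # one new step: the prefix (1, 2) is reused
```

With the cache, scoring the two hypotheses above costs four forward steps instead of six, and the saving grows with the number of hypotheses sharing prefixes in the beam.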
“…However, in order to speed up on-the-fly rescoring based on RNNLMs, we needed to reduce redundant computations as much as possible. In this section, we briefly outline the architecture of our baseline CPU-GPU hybrid RNNLM rescoring proposed in [9]. The main highlights of our baseline architecture are the use of gated recurrent unit (GRU) [14] based RNNLM, noise contrastive estimation (NCE) [15] at the output layer, n-gram based maximum entropy (MaxEnt) bypass [16] from input to output layers, and cache based on-the-fly rescoring.…”
Section: Architecture of Our Baseline CPU-GPU Hybrid RNNLM Rescoring (mentioning)
confidence: 99%
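The baseline described above is built around a gated recurrent unit (GRU) RNNLM. For orientation, here is a minimal NumPy sketch of a single GRU forward step — toy dimensions, random weights, biases omitted; this is the generic GRU recurrence [14], not the papers' exact implementation:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU forward step (biases omitted for brevity)."""
    z = sigmoid(Wz @ x + Uz @ h)              # update gate
    r = sigmoid(Wr @ x + Ur @ h)              # reset gate
    h_cand = np.tanh(Wh @ x + Uh @ (r * h))   # candidate state
    return (1.0 - z) * h + z * h_cand         # interpolate old and candidate

# toy dimensions and random weights
rng = np.random.default_rng(0)
d_in, d_h = 4, 3
Wz, Wr, Wh = (rng.standard_normal((d_h, d_in)) for _ in range(3))
Uz, Ur, Uh = (rng.standard_normal((d_h, d_h)) for _ in range(3))

h = np.zeros(d_h)
h = gru_step(rng.standard_normal(d_in), h, Wz, Uz, Wr, Ur, Wh, Uh)
```

Compared with an LSTM, the GRU has no separate cell state and one fewer gate, which reduces the per-step matrix arithmetic — one reason a GRU-based RNNLM is attractive for real-time rescoring.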
“…In [22], an on-the-fly rescoring approach to integrate LSTM-LMs into 1-st pass decoding is presented. The authors of [23] use a hybrid CPU/GPGPU architecture for real time decoding. The HCL transducer is composed with a small n-gram model and is expanded on the GPU while rescoring with an LSTM LM happens on CPU.…”
Section: Related Work (mentioning)
confidence: 99%
“…All three papers use hierarchical softmax / word classes to reduce the number of computations in the output layer [24] and, with the exception of [21], interpolate the LSTM-LM with a Max-Entropy LM [25]. The works of [23] are extended in [26]. The LSTM units are replaced with GRUs, NCE replaces the hierarchical softmax, and GRU states are quantized to reduce the number of necessary computations.…”
Section: Related Work (mentioning)
confidence: 99%
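The hierarchical (class-factored) softmax mentioned above factors P(w | x) as P(class(w) | x) · P(w | class(w), x), so the output layer scores only the C classes plus the words of one class instead of the full vocabulary V. A toy NumPy sketch — the vocabulary split, weights, and dimensions are all illustrative:

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

# toy vocabulary of 6 words split into 2 classes of 3 words each
V, C, d = 6, 2, 4
rng = np.random.default_rng(1)
Wc = rng.standard_normal((C, d))           # class-layer weights
Ww = rng.standard_normal((V, d))           # word-layer weights
class_of = np.array([0, 0, 0, 1, 1, 1])    # word id -> class id
members = {c: np.where(class_of == c)[0] for c in range(C)}

def word_prob(x, w):
    """P(w | x) = P(class(w) | x) * P(w | class(w), x).
    Scores C classes plus one class's words, not all V words."""
    c = class_of[w]
    p_class = softmax(Wc @ x)[c]
    m = members[c]
    p_in_class = softmax(Ww[m] @ x)[list(m).index(w)]
    return p_class * p_in_class

x = rng.standard_normal(d)
total = sum(word_prob(x, w) for w in range(V))  # still a valid distribution
```

With classes of roughly sqrt(V) words each, the per-word scoring cost drops from O(V) to O(sqrt(V)) output-layer dot products, while the factored probabilities still sum to one over the vocabulary.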