Interspeech 2015
DOI: 10.21437/interspeech.2015-475
Applying GPGPU to recurrent neural network language model based fast network search in the real-time LVCSR

Cited by 5 publications (7 citation statements); references 0 publications.
“…Several attempts have been made to utilize RNNLMs for online decoding in real-time ASR systems [5,6,7]. However, they either simulate only some aspects of RNNLMs into the traditional architectures [5,6], or perform a 2-pass decoding [7] which innately could not be applied before the end of the utterance was reached. There have also been attempts to apply RNNLM directly to online ASR without approximation by eliminating redundant computations [8,9,10]. In our previous research [9], we were successful in applying moderate size RNNLMs directly to CPU-GPU hybrid online ASR systems with a cache strategy [10].…”
Section: Introduction (mentioning)
confidence: 91%
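The cache strategy the quote refers to amounts to memoizing RNN hidden states keyed by word history, so partial hypotheses that share a prefix reuse one forward computation instead of recomputing it. A minimal sketch of that idea — the class name and the toy step function are illustrative, not taken from the cited papers:

```python
class RNNLMStateCache:
    """Memoize RNN hidden states by word history (a tuple of word ids)
    so hypotheses sharing a prefix reuse one forward computation."""

    def __init__(self, step_fn, init_state):
        self.step_fn = step_fn          # (state, word) -> new state
        self.cache = {(): init_state}   # empty history -> initial state
        self.steps = 0                  # number of real forward steps taken

    def state(self, history):
        history = tuple(history)
        if history not in self.cache:
            prev = self.state(history[:-1])  # recurse on the shared prefix
            self.cache[history] = self.step_fn(prev, history[-1])
            self.steps += 1
        return self.cache[history]

# toy "RNN" whose state is simply the number of steps taken so far
cache = RNNLMStateCache(step_fn=lambda s, w: s + 1, init_state=0)
cache.state([1, 2, 3])   # three forward steps
cache.state([1, 2, 4])   # one new step: the prefix (1, 2) is reused
```

With the cache, scoring the two hypotheses above costs four forward steps instead of six, and the saving grows with the number of hypotheses sharing prefixes in the beam.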
“…However, in order to speed up on-the-fly rescoring based on RNNLMs, we needed to reduce redundant computations as much as possible. In this section, we briefly outline the architecture of our baseline CPU-GPU hybrid RNNLM rescoring proposed in [9]. The main highlights of our baseline architecture are the use of gated recurrent unit (GRU) [14] based RNNLM, noise contrastive estimation (NCE) [15] at the output layer, n-gram based maximum entropy (MaxEnt) bypass [16] from input to output layers, and cache based on-the-fly rescoring.…”
Section: Architecture of Our Baseline CPU-GPU Hybrid RNNLM Rescoring (mentioning)
confidence: 99%
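The baseline described above is built around a gated recurrent unit (GRU) RNNLM. For orientation, here is a minimal NumPy sketch of a single GRU forward step — toy dimensions, random weights, biases omitted; this is the generic GRU recurrence [14], not the papers' exact implementation:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU forward step (biases omitted for brevity)."""
    z = sigmoid(Wz @ x + Uz @ h)              # update gate
    r = sigmoid(Wr @ x + Ur @ h)              # reset gate
    h_cand = np.tanh(Wh @ x + Uh @ (r * h))   # candidate state
    return (1.0 - z) * h + z * h_cand         # interpolate old and candidate

# toy dimensions and random weights
rng = np.random.default_rng(0)
d_in, d_h = 4, 3
Wz, Wr, Wh = (rng.standard_normal((d_h, d_in)) for _ in range(3))
Uz, Ur, Uh = (rng.standard_normal((d_h, d_h)) for _ in range(3))

h = np.zeros(d_h)
h = gru_step(rng.standard_normal(d_in), h, Wz, Uz, Wr, Ur, Wh, Uh)
```

Compared with an LSTM, the GRU has no separate cell state and one fewer gate, which reduces the per-step matrix arithmetic — one reason a GRU-based RNNLM is attractive for real-time rescoring.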
“…In [22], an on-the-fly rescoring approach to integrate LSTM-LMs into 1-st pass decoding is presented. The authors of [23] use a hybrid CPU/GPGPU architecture for real time decoding. The HCL transducer is composed with a small n-gram model and is expanded on the GPU while rescoring with an LSTM LM happens on CPU.…”
Section: Related Work (mentioning)
confidence: 99%
“…All three papers use hierarchical softmax / word classes to reduce the number of computations in the output layer [24] and, with the exception of [21], interpolate the LSTM-LM with a Max-Entropy LM [25]. The works of [23] are extended in [26]. The LSTM units are replaced with GRUs, NCE replaces the hierarchical softmax, and GRU states are quantized to reduce the number of necessary computations.…”
Section: Related Work (mentioning)
confidence: 99%
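The hierarchical (class-factored) softmax mentioned above factors P(w | x) as P(class(w) | x) · P(w | class(w), x), so the output layer scores only the C classes plus the words of one class instead of the full vocabulary V. A toy NumPy sketch — the vocabulary split, weights, and dimensions are all illustrative:

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

# toy vocabulary of 6 words split into 2 classes of 3 words each
V, C, d = 6, 2, 4
rng = np.random.default_rng(1)
Wc = rng.standard_normal((C, d))           # class-layer weights
Ww = rng.standard_normal((V, d))           # word-layer weights
class_of = np.array([0, 0, 0, 1, 1, 1])    # word id -> class id
members = {c: np.where(class_of == c)[0] for c in range(C)}

def word_prob(x, w):
    """P(w | x) = P(class(w) | x) * P(w | class(w), x).
    Scores C classes plus one class's words, not all V words."""
    c = class_of[w]
    p_class = softmax(Wc @ x)[c]
    m = members[c]
    p_in_class = softmax(Ww[m] @ x)[list(m).index(w)]
    return p_class * p_in_class

x = rng.standard_normal(d)
total = sum(word_prob(x, w) for w in range(V))  # still a valid distribution
```

With classes of roughly sqrt(V) words each, the per-word scoring cost drops from O(V) to O(sqrt(V)) output-layer dot products, while the factored probabilities still sum to one over the vocabulary.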