2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2016
DOI: 10.1109/icassp.2016.7472084

Simplifying long short-term memory acoustic models for fast training and decoding

Cited by 83 publications (54 citation statements)
References 10 publications
“…The target senone label is delayed by 50 ms, similarly to [16]. We applied frame skipping by a factor of 2 [33] to reduce the runtime cost, which corresponds to 20 ms per frame. Runtime decoding is performed using a 5-gram language model with around 100 M ngrams.…”
Section: Methods (mentioning)
confidence: 99%
“…cltLSTM-12 further reduces the WER by 3.3% relative over cltLSTM-6, while cltLSTM-24 does not yield further gains. The frame skipping [33] used during runtime means that every frame spans 20 ms. Therefore, ltLSTM, cltLSTM-6, cltLSTM-12, and cltLSTM-24 respectively have 0, 120, 240, and 480 ms greater latencies than an LSTM.…”
Section: Models With Different Look-ahead Frames (mentioning)
confidence: 99%
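
The latency figures quoted in the statement above follow from simple arithmetic, sketched here in Python. Two things are assumptions rather than facts from the excerpt: the 10 ms base frame shift (the excerpt only gives the resulting 20 ms per frame after skipping by a factor of 2), and the reading of the numeric suffix in cltLSTM-6, cltLSTM-12, and cltLSTM-24 as the number of look-ahead frames, which is inferred from the quoted latencies. The function names are hypothetical.

```python
# Assumption: a 10 ms base frame shift, which is not stated in the excerpt.
BASE_FRAME_SHIFT_MS = 10


def effective_frame_ms(skip_factor, base_shift_ms=BASE_FRAME_SHIFT_MS):
    """Duration covered by one processed frame after frame skipping."""
    return skip_factor * base_shift_ms


def lookahead_latency_ms(lookahead_frames, skip_factor=2):
    """Extra latency from waiting for a given number of future frames."""
    return lookahead_frames * effective_frame_ms(skip_factor)


# Assumption: the numeric suffix is the look-ahead frame count.
lookahead_frames = {"ltLSTM": 0, "cltLSTM-6": 6, "cltLSTM-12": 12, "cltLSTM-24": 24}
latencies = {name: lookahead_latency_ms(n) for name, n in lookahead_frames.items()}

for name, ms in latencies.items():
    print(f"{name}: {ms} ms additional latency")
```

Under these assumptions the sketch yields the same 0, 120, 240, and 480 ms values that the citing paper reports.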
“…In experiments shown in Table 1, the frame rate is reduced at the input feature level, which is more similar to the setup from SC-based systems [60,61]. By sampling one in every three frames, the performance of the model improves despite two-thirds of training data not being used.…”
Section: Improvements On Multi-task Trained Baseline Systems (mentioning)
confidence: 99%
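
As a rough illustration of "sampling one in every three frames" at the input feature level, the following Python sketch keeps every third frame of an utterance. It is not the citing paper's implementation, which the excerpt does not show; the function name, the `offset` parameter, and the toy data are hypothetical.

```python
def subsample_frames(frames, factor, offset=0):
    """Keep one frame out of every `factor`, starting at index `offset`."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    return frames[offset::factor]


# Toy utterance: nine frames, each represented here by its index only.
utterance = list(range(9))
kept = subsample_frames(utterance, factor=3)

print(kept)                        # indices of the frames that survive
print(len(kept) / len(utterance))  # fraction of frames retained
```

With a factor of 3, one third of the frames are retained, which matches the excerpt's remark that two-thirds of the training frames go unused.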
“…The input feature is 80-dimension log Mel filter bank. We applied frame skipping [7] to reduce the runtime cost. Note that in this study, we only compare the baseline full-rank cross-entropy models.…”
Section: Methods (mentioning)
confidence: 99%