2017
DOI: 10.48550/arxiv.1703.07090
Preprint
Deep LSTM for Large Vocabulary Continuous Speech Recognition

Abstract: Recurrent neural networks (RNNs), especially long short-term memory (LSTM) RNNs, are effective networks for sequential tasks like speech recognition. Deeper LSTM models perform well on large vocabulary continuous speech recognition because of their impressive learning ability. However, deeper networks are more difficult to train. We introduce a training framework with layer-wise training and exponential moving average methods for deeper LSTM models. It is a competitive framework in which LSTM models of more than …
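To make the exponential moving average (EMA) idea from the abstract concrete, below is a minimal sketch (not the authors' code) of keeping a smoothed shadow copy of the model weights that is updated after every training step and used for evaluation. The decay value of 0.999, the NumPy-dict representation of parameters, and the dummy update loop are illustrative assumptions.

```python
# Minimal EMA-of-weights sketch (illustrative, not the paper's implementation).
import numpy as np

class EMA:
    """Maintains shadow parameters: p_ema = decay * p_ema + (1 - decay) * p."""
    def __init__(self, params, decay=0.999):
        self.decay = decay
        self.shadow = {name: p.copy() for name, p in params.items()}

    def update(self, params):
        for name, p in params.items():
            self.shadow[name] = self.decay * self.shadow[name] + (1.0 - self.decay) * p

# Usage: after each optimizer step on the LSTM weights, call ema.update(weights);
# decode/evaluate with ema.shadow instead of the raw weights.
weights = {"W_lstm_layer1": np.random.randn(4, 4)}   # stand-in for real LSTM weights
ema = EMA(weights, decay=0.999)
for step in range(100):
    weights["W_lstm_layer1"] -= 0.01 * np.random.randn(4, 4)   # stand-in for a gradient step
    ema.update(weights)
```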

Cited by 6 publications (6 citation statements)
References 16 publications
“…Non-pruning methods. In addition to pruning, other approaches also make significant contribution to LSTMs compression, including distillation (Tian et al, 2017), matrix factorization (Kuchaiev & Ginsburg, 2017), parameter sharing (Lu et al, 2016), group Lasso regularization (Wen et al, 2017), weight quantization (Zen et al, 2016), etc.…”
Section: LSTM Compression (mentioning)
confidence: 99%
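As one concrete illustration of the non-pruning methods listed in the statement above, here is a minimal sketch (mine, not code from Kuchaiev & Ginsburg, 2017) of low-rank matrix factorization applied to a single LSTM weight matrix via truncated SVD; the matrix shape and the rank of 64 are arbitrary assumptions.

```python
# Low-rank factorization sketch: replace W (m x n) with thin factors U_r @ V_r.
import numpy as np

def low_rank_factorize(W, rank):
    """Return U_r (m x r) and V_r (r x n) whose product approximates W."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * s[:rank]          # absorb singular values into U
    V_r = Vt[:rank, :]
    return U_r, V_r

W = np.random.randn(1024, 512)            # e.g. one recurrent weight matrix
U_r, V_r = low_rank_factorize(W, rank=64)
compression = W.size / (U_r.size + V_r.size)
rel_error = np.linalg.norm(W - U_r @ V_r) / np.linalg.norm(W)
print(f"relative error: {rel_error:.3f}, compression: {compression:.1f}x")
```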
“…Different from other neural networks, LSTMs are relatively more challenging to be compressed due to the complicated architecture that the information gained from one cell will be shared across all the time steps (Wen et al, 2017). Despite this challenge, researchers already proposed many effective methods to address this problem, including Sparse Variational Dropout (Sparse VD) (Lobacheva et al, 2017), sparse regularization (Wen et al, 2017), distillation (Tian et al, 2017), low-rank factorizations and parameter sharing (Lu et al, 2016) and pruning (Han et al, 2017;Narang et al, 2017;Lee et al, 2018), etc. All of them can achieve promising compression rates with negligible performance loss.…”
Section: Introduction (mentioning)
confidence: 99%
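To complement the compression methods enumerated above, the following is a minimal sketch (not taken from Han et al., 2017 or Narang et al., 2017) of magnitude-based pruning for one weight matrix: the smallest-magnitude entries are zeroed until a target sparsity is reached. The 90% sparsity target is an assumption for illustration.

```python
# Magnitude pruning sketch: keep only the largest-|w| entries of a weight matrix.
import numpy as np

def magnitude_prune(W, sparsity=0.9):
    """Zero the smallest-|W| entries so that `sparsity` fraction are removed."""
    threshold = np.quantile(np.abs(W), sparsity)
    mask = np.abs(W) >= threshold
    return W * mask, mask

W = np.random.randn(1024, 512)
W_pruned, mask = magnitude_prune(W, sparsity=0.9)
print(f"non-zero weights kept: {mask.mean():.1%}")
```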
“…Several methods have been proposed in the literature to approach online ASR with RNNs. A popular choice is to feed the RNN with a context window that embeds some future frames [14,15]. Attempts have also been done to build low-latency bidirectional RNNs [16][17][18][19].…”
Section: Related Work (mentioning)
confidence: 99%
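The context-window approach mentioned in this citation can be illustrated with a short sketch (an illustration under my own assumptions, not code from the cited works [14, 15]): each acoustic frame is spliced with a few left and right neighbour frames before being fed to the RNN, so the `right` future frames are the added look-ahead latency. The frame count and feature dimension are made up.

```python
# Frame-splicing sketch: concatenate each frame with its left/right context.
import numpy as np

def splice_frames(features, left=5, right=5):
    """features: (T, D) acoustic frames -> (T, (left + right + 1) * D)."""
    T, D = features.shape
    padded = np.pad(features, ((left, right), (0, 0)), mode="edge")
    return np.stack([padded[t:t + left + right + 1].reshape(-1) for t in range(T)])

frames = np.random.randn(100, 40)          # 100 frames of 40-dim filterbank features
spliced = splice_frames(frames, left=5, right=5)
print(spliced.shape)                        # (100, 440)
```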
“…In particular, the use of feed-forward Deep Neural Networks (DNNs), including both fully-connected and convolutional architectures, has been largely investigated in the literature [7,8], especially in the context of online ASR performed on small-footprint devices [9][10][11][12][13]. Attempts have also been made to develop robust online speech recognizers based on RNNs, exploiting both the traditional RNN-HMM framework [14][15][16][17][18][19] and, more recently, end-to-end ASR technology [20,21].…”
Section: Introduction (mentioning)
confidence: 99%
“…LSTMs are explicitly designed to learn long-term dependencies of time-dependent data by remembering information for long periods. LSTM performs faithful learning in applications such as speech recognition (Tian et al, 2017;Kim et al, 2017) and text processing (Shih et al, 2018;Simistira et al, 2015). Moreover, LSTM is also suitable for complex data sequences such as stock time series extracted from financial markets because it has internal memory, has capability of customization, and is free from gradient-related issues.…”
Section: Introduction (mentioning)
confidence: 99%