2017
DOI: 10.48550/arxiv.1703.07090
Preprint
Deep LSTM for Large Vocabulary Continuous Speech Recognition

Abstract: Recurrent neural networks (RNNs), especially long short-term memory (LSTM) RNNs, are effective networks for sequential tasks like speech recognition. Deeper LSTM models perform well on large vocabulary continuous speech recognition because of their impressive learning ability. However, deeper networks are more difficult to train. We introduce a training framework with layer-wise training and exponential moving average methods for deeper LSTM models. It is a competitive framework in which LSTM models of more than …
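To make the exponential moving average (EMA) idea from the abstract concrete, below is a minimal sketch (not the authors' code) of keeping a smoothed shadow copy of the model weights that is updated after every training step and used for evaluation. The decay value of 0.999, the NumPy-dict representation of parameters, and the dummy update loop are illustrative assumptions.

```python
# Minimal EMA-of-weights sketch (illustrative, not the paper's implementation).
import numpy as np

class EMA:
    """Maintains shadow parameters: p_ema = decay * p_ema + (1 - decay) * p."""
    def __init__(self, params, decay=0.999):
        self.decay = decay
        self.shadow = {name: p.copy() for name, p in params.items()}

    def update(self, params):
        for name, p in params.items():
            self.shadow[name] = self.decay * self.shadow[name] + (1.0 - self.decay) * p

# Usage: after each optimizer step on the LSTM weights, call ema.update(weights);
# decode/evaluate with ema.shadow instead of the raw weights.
weights = {"W_lstm_layer1": np.random.randn(4, 4)}   # stand-in for real LSTM weights
ema = EMA(weights, decay=0.999)
for step in range(100):
    weights["W_lstm_layer1"] -= 0.01 * np.random.randn(4, 4)   # stand-in for a gradient step
    ema.update(weights)
```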

Cited by 6 publications (6 citation statements)
References 16 publications
“…Non-pruning methods. In addition to pruning, other approaches also make significant contribution to LSTMs compression, including distillation (Tian et al, 2017), matrix factorization (Kuchaiev & Ginsburg, 2017), parameter sharing (Lu et al, 2016), group Lasso regularization (Wen et al, 2017), weight quantization (Zen et al, 2016), etc.…”
Section: LSTM Compression (mentioning)
confidence: 99%
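As one concrete illustration of the non-pruning methods listed in the statement above, here is a minimal sketch (mine, not code from Kuchaiev & Ginsburg, 2017) of low-rank matrix factorization applied to a single LSTM weight matrix via truncated SVD; the matrix shape and the rank of 64 are arbitrary assumptions.

```python
# Low-rank factorization sketch: replace W (m x n) with thin factors U_r @ V_r.
import numpy as np

def low_rank_factorize(W, rank):
    """Return U_r (m x r) and V_r (r x n) whose product approximates W."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * s[:rank]          # absorb singular values into U
    V_r = Vt[:rank, :]
    return U_r, V_r

W = np.random.randn(1024, 512)            # e.g. one recurrent weight matrix
U_r, V_r = low_rank_factorize(W, rank=64)
compression = W.size / (U_r.size + V_r.size)
rel_error = np.linalg.norm(W - U_r @ V_r) / np.linalg.norm(W)
print(f"relative error: {rel_error:.3f}, compression: {compression:.1f}x")
```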
“…Different from other neural networks, LSTMs are relatively more challenging to be compressed due to the complicated architecture that the information gained from one cell will be shared across all the time steps (Wen et al, 2017). Despite this challenge, researchers already proposed many effective methods to address this problem, including Sparse Variational Dropout (Sparse VD) (Lobacheva et al, 2017), sparse regularization (Wen et al, 2017), distillation (Tian et al, 2017), low-rank factorizations and parameter sharing (Lu et al, 2016) and pruning (Han et al, 2017;Narang et al, 2017;Lee et al, 2018), etc. All of them can achieve promising compression rates with negligible performance loss.…”
Section: Introduction (mentioning)
confidence: 99%
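To complement the compression methods enumerated above, the following is a minimal sketch (not taken from Han et al., 2017 or Narang et al., 2017) of magnitude-based pruning for one weight matrix: the smallest-magnitude entries are zeroed until a target sparsity is reached. The 90% sparsity target is an assumption for illustration.

```python
# Magnitude pruning sketch: keep only the largest-|w| entries of a weight matrix.
import numpy as np

def magnitude_prune(W, sparsity=0.9):
    """Zero the smallest-|W| entries so that `sparsity` fraction are removed."""
    threshold = np.quantile(np.abs(W), sparsity)
    mask = np.abs(W) >= threshold
    return W * mask, mask

W = np.random.randn(1024, 512)
W_pruned, mask = magnitude_prune(W, sparsity=0.9)
print(f"non-zero weights kept: {mask.mean():.1%}")
```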
“…Several methods have been proposed in the literature to approach online ASR with RNNs. A popular choice is to feed the RNN with a context window that embeds some future frames [14,15]. Attempts have also been done to build low-latency bidirectional RNNs [16][17][18][19].…”
Section: Related Work (mentioning)
confidence: 99%
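The context-window approach mentioned in this citation can be illustrated with a short sketch (an illustration under my own assumptions, not code from the cited works [14, 15]): each acoustic frame is spliced with a few left and right neighbour frames before being fed to the RNN, so the `right` future frames are the added look-ahead latency. The frame count and feature dimension are made up.

```python
# Frame-splicing sketch: concatenate each frame with its left/right context.
import numpy as np

def splice_frames(features, left=5, right=5):
    """features: (T, D) acoustic frames -> (T, (left + right + 1) * D)."""
    T, D = features.shape
    padded = np.pad(features, ((left, right), (0, 0)), mode="edge")
    return np.stack([padded[t:t + left + right + 1].reshape(-1) for t in range(T)])

frames = np.random.randn(100, 40)          # 100 frames of 40-dim filterbank features
spliced = splice_frames(frames, left=5, right=5)
print(spliced.shape)                        # (100, 440)
```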
“…In particular, the use of feed-forward Deep Neural Networks (DNNs), including both fully-connected and convolutional architectures, has been largely investigated in the literature [7,8], especially in the context of online ASR performed on small-footprint devices [9][10][11][12][13]. Attempts have also been made to develop robust online speech recognizers based on RNNs, exploiting both the traditional RNN-HMM framework [14][15][16][17][18][19] and, more recently, end-to-end ASR technology [20,21].…”
Section: Introduction (mentioning)
confidence: 99%
“…LSTMs are explicitly designed to learn long-term dependencies of time-dependent data by remembering information for long periods. LSTM performs faithful learning in applications such as speech recognition (Tian et al, 2017;Kim et al, 2017) and text processing (Shih et al, 2018;Simistira et al, 2015). Moreover, LSTM is also suitable for complex data sequences such as stock time series extracted from financial markets because it has internal memory, has capability of customization, and is free from gradient-related issues.…”
Section: Introduction (mentioning)
confidence: 99%