2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2014.6853582

Mean-normalized stochastic gradient for large-scale deep learning

Abstract: Deep neural networks are typically optimized with stochastic gradient descent (SGD). In this work, we propose a novel second-order stochastic optimization algorithm. The algorithm is based on analytic results showing that a non-zero mean of features is harmful for the optimization. We prove convergence of our algorithm in a convex setting. In our experiments we show that our proposed algorithm converges faster than SGD. Further, in contrast to earlier work, our algorithm allows for training models with a facto…
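The abstract states the central idea: a non-zero mean of the features hurts optimization, so updates are applied in a mean-normalized parametrization. The sketch below is a minimal NumPy illustration of how a layer could center its inputs with a running mean before a plain SGD step; the class name, momentum constant, and update rule are assumptions made for illustration and do not reproduce the paper's actual second-order algorithm.

```python
import numpy as np

class MeanNormalizedLinear:
    """Linear layer that keeps a running mean of its inputs and applies SGD
    in the mean-centered parametrization. Illustrative sketch only; the
    paper's second-order update is not reproduced here."""

    def __init__(self, n_in, n_out, momentum=0.99, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=1.0 / np.sqrt(n_in), size=(n_in, n_out))
        self.b = np.zeros(n_out)
        self.mu = np.zeros(n_in)          # running mean of layer inputs
        self.momentum = momentum

    def forward(self, x):
        # Track the feature mean so the weights always see centered inputs.
        self.mu = self.momentum * self.mu + (1.0 - self.momentum) * x.mean(axis=0)
        self.x_centered = x - self.mu
        return self.x_centered @ self.W + self.b

    def backward(self, grad_out, lr=0.1):
        # Plain SGD step on the mean-normalized parameters.
        grad_W = self.x_centered.T @ grad_out / grad_out.shape[0]
        grad_b = grad_out.mean(axis=0)
        self.W -= lr * grad_W
        self.b -= lr * grad_b
        return grad_out @ self.W.T        # gradient w.r.t. the layer input
```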

Cited by 58 publications (58 citation statements)
References 13 publications

“…Neural networks with multiple linear bottlenecks cannot be trained well with SGD. In our previous work [15], we derived a stochastic algorithm called mean-normalized stochastic gradient descent (MN-SGD) and showed that it is capable of optimizing such bottleneck networks from scratch. For sequence training, the advantages of stochastic optimization are less compelling. First, because of their frequent model updates, the lattices deviate quickly from the search space of the current model.…”
Section: Optimization (mentioning)
confidence: 99%
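For readers unfamiliar with the term, a "linear bottleneck" in the excerpt above means a low-rank factorization of a weight matrix with no nonlinearity between the two factors; stacking several of these is what is reported as hard to train with plain SGD. A small illustrative sketch, with all dimensions made up:

```python
import numpy as np

def linear_bottleneck(n_in, n_out, rank, seed=0):
    """Replace one n_in x n_out weight matrix by two linear maps through a
    low-rank bottleneck of size `rank`, with no nonlinearity in between."""
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=1.0 / np.sqrt(n_in), size=(n_in, rank))
    V = rng.normal(scale=1.0 / np.sqrt(rank), size=(rank, n_out))
    return U, V

# Example: a 2000 x 2000 layer (~4M weights) replaced by a rank-256
# bottleneck (~1M weights); dimensions are hypothetical.
U, V = linear_bottleneck(2000, 2000, 256)
x = np.random.default_rng(1).normal(size=(8, 2000))
y = (x @ U) @ V   # same map as x @ (U @ V), but with far fewer parameters
```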
“…In this paper, we present experiments on an English broadcast conversations recognition task with our implementation of sequence training in the RASR toolkit [15]. We compare different training criteria and optimization algorithms.…”
Section: Introduction (mentioning)
confidence: 99%
“…This simplifies the implementation of alternative optimization algorithms, which is currently an active research area for neural networks [13,14]. Currently, the supported estimators include basic SGD, SGD with momentum, and a new stochastic second-order algorithm developed in our group [15]. In addition, batch estimation with gradient descent and Rprop [16] is possible.…”
Section: Training (mentioning)
confidence: 99%
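The estimators named in the excerpt (plain SGD, SGD with momentum, Rprop) have standard textbook update rules; the NumPy sketch below shows the momentum and Rprop steps as a reference for what "estimator" means here. The hyperparameter values are conventional defaults, not RASR's, and the code is not drawn from the toolkit.

```python
import numpy as np

def sgd_momentum_step(w, grad, velocity, lr=0.01, mu=0.9):
    """One textbook SGD-with-momentum update."""
    velocity = mu * velocity - lr * grad
    return w + velocity, velocity

def rprop_step(w, grad, prev_grad, step, eta_plus=1.2, eta_minus=0.5,
               step_min=1e-6, step_max=50.0):
    """One textbook Rprop update: per-parameter step sizes grown while the
    gradient sign is stable and shrunk when it flips (Riedmiller & Braun)."""
    sign_change = grad * prev_grad
    step = np.where(sign_change > 0, np.minimum(step * eta_plus, step_max), step)
    step = np.where(sign_change < 0, np.maximum(step * eta_minus, step_min), step)
    grad = np.where(sign_change < 0, 0.0, grad)   # skip update where sign flipped
    w = w - np.sign(grad) * step
    return w, grad, step
```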
“…RASR also supports a power schedule that decays with every mini-batch and has been reported to perform slightly better than Newbob [18]. Mostly, we use a modification of Newbob which decays the learning rate less aggressively than the original Newbob [15].…”
Section: Training (mentioning)
confidence: 99%
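As a rough illustration of the two schedule families contrasted in the excerpt, the sketch below shows a generic per-mini-batch power decay and a Newbob-style validation-driven decay. The function names and constants are assumptions for illustration; the exact parametrizations used in RASR may differ.

```python
def power_schedule(lr0, t, decay=1e-6, power=1.0):
    """Per-mini-batch power decay: lr_t = lr0 / (1 + decay * t) ** power."""
    return lr0 / (1.0 + decay * t) ** power

def newbob_schedule(lr, prev_val_error, curr_val_error,
                    factor=0.5, threshold=0.005):
    """Newbob-style annealing: shrink the learning rate once the validation
    improvement drops below a threshold. A less aggressive variant, as in
    the excerpt, would use a milder `factor` (e.g. 0.8 instead of 0.5)."""
    if prev_val_error - curr_val_error < threshold:
        lr *= factor
    return lr
```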
“…The initial technique used for batch normalisation [28] was based on modifications of the inputs at each layer to achieve a constant distribution of features. This modification can be perceived as either creating a dependency on the activation values for the optimiser or a direct change in the overall network structure [30,31].…”
Section: Normalisation (mentioning)
confidence: 99%
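The batch-normalization transform the excerpt refers to is the standard one: normalize each feature over the mini-batch, then apply a learnable scale and shift. A minimal training-time sketch is shown below; inference-time running statistics are omitted.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Training-time batch normalization as in Ioffe & Szegedy (cited as [28]):
    normalize each feature over the mini-batch, then scale and shift."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)   # zero mean, unit variance per feature
    return gamma * x_hat + beta               # learnable scale and shift
```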