2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2013.6639349

Advances in optimizing recurrent networks

Abstract: After a more than decade-long period of relatively little research activity in the area of recurrent neural networks, several new developments will be reviewed here that have allowed substantial progress both in understanding and in technical solutions towards more efficient training of recurrent networks. These advances have been motivated by and related to the optimization issues surrounding deep learning. Although recurrent networks are extremely powerful in what they can in principle represent in terms of …

Cited by 380 publications (254 citation statements)
References 13 publications
“…To train the models we used minibatch gradient descent with a batch size of 16 and Nesterov momentum (Bengio et al. 2013) with coefficient µ = 0.9. Nesterov momentum is a method for accelerating gradient descent by accumulating gradients over time in directions that consistently decrease the objective function value.…”
Section: Training
confidence: 99%
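For readers unfamiliar with the update this excerpt refers to, the following is a minimal sketch of a Nesterov-momentum step, not code from the cited papers; the quadratic test objective, learning rate, and function names are illustrative assumptions, while the momentum coefficient 0.9 matches the quoted setup.

```python
import numpy as np

def nesterov_sgd_step(theta, velocity, grad_fn, lr=0.01, mu=0.9):
    """One Nesterov-momentum update (illustrative sketch).

    theta    : current parameter vector
    velocity : accumulated velocity from previous steps
    grad_fn  : callable returning the gradient of the objective
    lr       : learning rate (assumed value, for demonstration only)
    mu       : momentum coefficient, 0.9 as in the quoted training setup
    """
    # The gradient is evaluated at the "look-ahead" point theta + mu * velocity,
    # which is what distinguishes Nesterov momentum from classical momentum.
    lookahead_grad = grad_fn(theta + mu * velocity)
    velocity = mu * velocity - lr * lookahead_grad
    theta = theta + velocity
    return theta, velocity

# Tiny usage example on the quadratic objective f(x) = 0.5 * ||x||^2,
# whose gradient is simply x.
theta = np.array([5.0, -3.0])
velocity = np.zeros_like(theta)
for _ in range(100):
    theta, velocity = nesterov_sgd_step(theta, velocity, grad_fn=lambda x: x)
print(theta)  # moves toward [0, 0]
```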
“…To evaluate the feature extraction and fusion capability of the proposed CNN, we compare it with some widely cited methods including support vector machine (SVM) [20], Adaptive Boosting technique [21], polynomial kernel PCA (KPCA) [22], sparse representation based on monogenic signal (MSRC) [23] and Iterative graph thickening model of image representations (IGT) [24]. The recognition accuracies of these algorithms are shown in Table VI.…”
Section: 4. Comparison With Previous Methods
confidence: 99%
“…Accuracy (%): SVM [20] 86.73, Adaptive Boosting [21] 92.70, KPCA [22] 92.67, MSRC [23] 93.66, IGT [24] 95.00, CNN with data augmentation [25] (value omitted in the excerpt). It is desirable that the feature fusion CNNs show the ability of classifying ten-class targets regardless of the existence of variants. Meanwhile, without increasing either the width of the network or the complexity of the architecture, the combination of high-level feature representations and learning of potentially invariant features achieves particularly good performance even when the labelled training data is limited.…”
Section: Methods
confidence: 99%
“…The proposed deep learning method is expected to solve the problem of automatic feature design. [13][14][15][16][17] Deep learning is a bio-inspired architecture that describes data such as images, voice and text by mimicking the human brain's mechanisms of learning and analysis. Through deep learning, features are transformed from the original space of a lower layer to a new space in a higher layer.…”
Section: Introduction
confidence: 99%