2016
DOI: 10.1016/j.neucom.2015.12.076

An optimized second order stochastic learning algorithm for neural network training

Cited by 36 publications (15 citation statements)
References 31 publications
“…Method | Target layer | Pre-training | Performance
Layer decomposition [47] | Convolutional layers | required | 2.5× speedup with no loss in accuracy
Layer decomposition [48] | Convolutional layers | required | 2× speedup with < 1% accuracy drop
Layer decomposition [52] | Whole network | required | 1.09× reduction in weights & 4.93× speedup in VGG-16
Layer decomposition [56] | Convolutional layers | not required | 76% reduction in weights in VGG-11
Pruning [150] | Whole network | required | prunes 90% of the convolutional kernel parameters
Pruning [151] | Whole network | required | prunes 13× parameters in VGG-16
Pruning [64] | Whole network | not required | 5.1× (CPU) & 3.1× (GPU) speedup in convolutional layers
Pruning [67] | Whole network | required | 34% inference FLOP reduction in VGG-16
…utilize the second-order information, which makes it prohibitive in practice for large deep neural networks. Therefore, more emphasis has been put on how to approximate the Hessian matrices, which consist of the second-order derivatives, for simplicity [153].…”
Section: Target Layer (mentioning, confidence: 99%)
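To make the excerpt's point concrete, the following is a minimal sketch (not the cited paper's actual algorithm) of a second-order-style stochastic update that avoids forming the full Hessian: the Hessian diagonal is approximated by a running average of squared gradients (a Gauss-Newton/Fisher-style surrogate), and the gradient step is preconditioned by it. The function name, hyperparameter values, and the toy problem are illustrative assumptions.

import numpy as np

def second_order_sgd_step(theta, grad, h_diag, lr=0.1, beta=0.95, eps=1e-8):
    # Running estimate of the Hessian diagonal from squared gradients
    # (a curvature surrogate; no n-by-n matrix is ever formed).
    h_diag = beta * h_diag + (1.0 - beta) * grad ** 2
    # Precondition the stochastic gradient by the approximate inverse curvature.
    theta = theta - lr * grad / (np.sqrt(h_diag) + eps)
    return theta, h_diag

# Toy usage: least-squares objective with noisy (stochastic) gradients.
rng = np.random.default_rng(0)
A, b = rng.normal(size=(50, 10)), rng.normal(size=50)
theta, h_diag = np.zeros(10), np.zeros(10)
for _ in range(500):
    grad = A.T @ (A @ theta - b) / len(b) + 0.01 * rng.normal(size=10)
    theta, h_diag = second_order_sgd_step(theta, grad, h_diag)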
“…After that, the estimated outputs are compared with the real outputs. For an efficient ANN model, the weights and biases of each layer are updated so that the estimated outputs match the real outputs with minimum error [20][21][22].…”
Section: Artificial Neural Network and Its Application In Drilling Op… (mentioning, confidence: 99%)
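As a concrete illustration of the compare-and-update step described in the excerpt above, here is a minimal sketch of one training step for a single-hidden-layer network with a mean-squared-error loss. The architecture, sigmoid activation, learning rate, and all variable names are assumptions made for illustration, not taken from the cited works.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(x, y, W1, b1, W2, b2, lr=0.1):
    # Forward pass: the network's estimated outputs.
    h = sigmoid(x @ W1 + b1)
    y_hat = h @ W2 + b2
    # Compare the estimated outputs with the real outputs.
    err = y_hat - y
    # Backward pass: gradients of the squared error for each layer.
    dW2, db2 = h.T @ err / len(x), err.mean(axis=0)
    dh = (err @ W2.T) * h * (1.0 - h)
    dW1, db1 = x.T @ dh / len(x), dh.mean(axis=0)
    # Update every layer's weights and biases to reduce the error.
    return (W1 - lr * dW1, b1 - lr * db1,
            W2 - lr * dW2, b2 - lr * db2,
            0.5 * np.mean(err ** 2))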
“…This procedure continues until the error is reduced to a certain acceptable limit, as shown in Fig. 1 (Liew et al. 2016; Naganawa et al. 2014; Razi et al. 2013).…”
Section: Artificial Neural Network (ANN) (mentioning, confidence: 99%)
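The stopping rule in this excerpt can be sketched as a simple training loop that repeats the update until the error drops below an acceptable limit or an iteration cap is reached. The linear model, tolerance, and iteration cap below are illustrative assumptions, not values from the cited works.

import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(200, 3))
y = x @ np.array([1.5, -2.0, 0.5]) + 0.3           # synthetic "real" outputs

w, b, lr = np.zeros(3), 0.0, 0.05
tolerance, max_iters = 1e-4, 10_000
for it in range(max_iters):
    err = (x @ w + b) - y                          # estimated minus real outputs
    mse = 0.5 * np.mean(err ** 2)
    if mse < tolerance:                            # error within the acceptable limit
        break
    w -= lr * (x.T @ err) / len(y)                 # gradient step on the weights
    b -= lr * err.mean()                           # gradient step on the bias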