“…Previous literature finds that Adam is more vulnerable to sharp minima than SGD [65], which results in worse generalization [22,28,68]. Several subsequent works [10,52,69,76] propose more generalizable optimizers to address this problem. However, there can be a trade-off between generalization ability and convergence speed [19,38,48,69,76].…”