1998
DOI: 10.1007/3-540-49430-8_6

Adaptive Regularization in Neural Network Modeling

Cited by 33 publications (21 citation statements)
References 30 publications
“…In order to make the search more efficient, previous work proposed to alternate the optimization of Λ and Θ, either between consecutive full training runs [5,20] or on the fly [22,26]. Compared to grid search, where Λ is fixed during a full training run, the on-the-fly adaptive methods in [22,26] adjust Λ according to performance on validation sets at every training step.…”
Section: Methods
confidence: 99%
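The on-the-fly adjustment of Λ described in this excerpt can be illustrated with a small hypergradient loop. The sketch below is an assumed, minimal example (linear regression with a single weight-decay parameter lam, a one-step unrolled hypergradient, plain NumPy), not the exact scheme of references [22,26] or of the paper under discussion:

```python
# Minimal sketch of on-the-fly adaptation of one regularization hyperparameter
# (weight decay lam): after each training step, lam is nudged in the direction
# that reduces the validation loss. The one-step unrolling
#   w_new = w - eta * (grad_train(w) + lam * w)  =>  d w_new / d lam = -eta * w
# is an assumption of this sketch, not the exact method of the cited papers.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear-regression data split into training and validation sets.
d = 20
w_true = rng.normal(size=d)
X_tr, X_val = rng.normal(size=(100, d)), rng.normal(size=(50, d))
y_tr = X_tr @ w_true + 0.5 * rng.normal(size=100)
y_val = X_val @ w_true + 0.5 * rng.normal(size=50)

def grad_mse(X, y, w):
    # Gradient of the mean squared error (1/n)||Xw - y||^2.
    return 2.0 * X.T @ (X @ w - y) / len(y)

w = np.zeros(d)
lam = 1.0                   # initial regularization strength
eta, eta_lam = 1e-2, 1e-3   # learning rates for weights and for lam

for step in range(2000):
    # Parameter step on the regularized training loss.
    g_train = grad_mse(X_tr, y_tr, w) + lam * w
    w_new = w - eta * g_train

    # Hypergradient step on the validation loss (one-step unrolling).
    hypergrad = grad_mse(X_val, y_val, w_new) @ (-eta * w)
    lam = max(lam - eta_lam * hypergrad, 0.0)   # keep lam non-negative

    w = w_new

print(f"adapted lam = {lam:.4f}, "
      f"val MSE = {np.mean((X_val @ w - y_val) ** 2):.4f}")
```

The design choice mirrored here is the one the excerpt emphasizes: lam is updated after every parameter step from validation-set performance, rather than being held fixed for a full training run as in grid search.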
“…To overcome this challenge, some methods have been proposed, such as criteria-based model selection, early stopping, Bayesian regularization, and stacked generalization. In the present study, an ANN with Bayesian regularization was chosen as the base ANN for surrogate modelling, given its stability.…”
Section: Fundamentals and Improvements
confidence: 99%
“…However, if this ratio is not selected properly, it causes under-fitting or over-fitting. To set the ratio automatically, Bayesian analysis is combined with the regularization of the ANN. Bayesian optimization of the regularization parameters, using the Gauss-Newton approximation to the Hessian matrix, is iterated to maximize the posterior density (denoted by the subscript and superscript MP in the cited work's equations) until convergence.…”
Section: Fundamentals and Improvements
confidence: 99%
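One common reading of the procedure this excerpt describes is a MacKay-style evidence framework: the objective is F = beta*E_D + alpha*E_W, the Hessian is approximated as H ≈ 2*beta*JᵀJ + 2*alpha*I, and alpha, beta are re-estimated from the effective number of parameters until convergence. The sketch below assumes a linear model (where the Gauss-Newton approximation is exact) and is an illustration of that reading, not code from the cited work:

```python
# Assumed sketch of evidence-framework re-estimation of the regularization
# parameters alpha (weight penalty) and beta (data misfit) for a linear model.
import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 10
X = rng.normal(size=(n, k))                  # design matrix (J in Gauss-Newton)
w_true = rng.normal(size=k)
y = X @ w_true + 0.3 * rng.normal(size=n)

alpha, beta = 1.0, 1.0                       # initial regularization parameters
for _ in range(50):
    # Most-probable weights for the current alpha, beta (ridge solution).
    H = 2.0 * beta * X.T @ X + 2.0 * alpha * np.eye(k)
    w_mp = np.linalg.solve(H, 2.0 * beta * X.T @ y)

    E_D = np.sum((X @ w_mp - y) ** 2)        # data error
    E_W = np.sum(w_mp ** 2)                  # weight error
    gamma = k - 2.0 * alpha * np.trace(np.linalg.inv(H))  # effective parameters

    alpha_new = gamma / (2.0 * E_W)
    beta_new = (n - gamma) / (2.0 * E_D)
    if abs(alpha_new - alpha) < 1e-8 and abs(beta_new - beta) < 1e-8:
        break
    alpha, beta = alpha_new, beta_new

print(f"alpha/beta ratio = {alpha / beta:.4f}, gamma = {gamma:.2f}")
```

The ratio alpha/beta printed at the end plays the role of the regularization ratio that the excerpt says must be set automatically rather than by hand.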
“…Gradient-based hyperparameter learning algorithms have been proposed for a variety of supervised learning models, including neural networks (Larsen et al., 1996a; Andersen et al., 1997; Goutte & Larsen, 1998; Larsen et al., 1996b), support vector machines (Glasmachers & Igel, 2005; Keerthi et al., 2007; Chapelle et al., 2002), and, more recently, conditional log-linear models (Do et al., 2008). However, these algorithms typically require complicated computations, making them cumbersome to implement.…”
Section: Introduction
confidence: 99%
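To make the "complicated computations" concrete, the sketch below shows one form a gradient-based hyperparameter update can take: for ridge regression the implicit function theorem gives the hypergradient in closed form, at the cost of a linear solve against the regularized Hessian. This is an assumed illustration of the general idea, not an implementation of any specific cited algorithm:

```python
# Closed-form hypergradient for ridge regression via the implicit function theorem:
#   w*(lam)   = argmin_w ||X_tr w - y_tr||^2 + lam ||w||^2
#   dw*/dlam  = -(X_tr^T X_tr + lam I)^{-1} w*
#   dL_val/dlam = grad_w L_val(w*) . dw*/dlam
# The linear solve in dw*/dlam is what makes such methods heavy for large models.
import numpy as np

def ridge_hypergradient(X_tr, y_tr, X_val, y_val, lam):
    k = X_tr.shape[1]
    A = X_tr.T @ X_tr + lam * np.eye(k)
    w_star = np.linalg.solve(A, X_tr.T @ y_tr)          # inner (ridge) solution
    dw_dlam = -np.linalg.solve(A, w_star)                # implicit derivative
    g_val = 2.0 * X_val.T @ (X_val @ w_star - y_val) / len(y_val)
    return g_val @ dw_dlam                               # dL_val / dlam

# Usage: pick lam by gradient descent on the validation loss.
rng = np.random.default_rng(2)
X_tr, X_val = rng.normal(size=(80, 15)), rng.normal(size=(40, 15))
w_true = rng.normal(size=15)
y_tr = X_tr @ w_true + rng.normal(size=80)
y_val = X_val @ w_true + rng.normal(size=40)

lam = 1.0
for _ in range(200):
    lam = max(lam - 0.1 * ridge_hypergradient(X_tr, y_tr, X_val, y_val, lam), 1e-6)
print(f"lam selected by hypergradient descent: {lam:.4f}")
```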