Grammars for Games: A Gradient-Based, Game-Theoretic Framework for Optimization in Deep Learning

Balduzzi, David

doi:10.3389/frobt.2015.00039

Cited by 6 publications

(2 citation statements)

References 58 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…An interesting consequence of the main result is corollary 2 which provides a compact description of the weights learned by a neural network via the signal underlying correlated equilibrium. More generally, neural nets are a basic example of a game with a structured communication protocol (the path-sums) which determines how players interact [44]. It may be fruitful to investigate broader classes of structured games.…”

Section: Summary Of Contributionmentioning

confidence: 99%

Deep Online Convex Optimization with Gated Games

Balduzzi¹

2016

Preprint

Self Cite

View full text Add to dashboard Cite

Methods from convex optimization are widely used as building blocks for deep learning algorithms. However, the reasons for their empirical success are unclear, since modern convolutional networks (convnets), incorporating rectifier units and max-pooling, are neither smooth nor convex. Standard guarantees therefore do not apply. This paper provides the first convergence rates for gradient descent on rectifier convnets. The proof utilizes the particular structure of rectifier networks which consists in binary active/inactive gates applied on top of an underlying linear network. The approach generalizes to max-pooling, dropout and maxout. In other words, to precisely the neural networks that perform best empirically. The key step is to introduce gated games, an extension of convex games with similar convergence properties that capture the gating function of rectifiers. The main result is that rectifier convnets converge to a critical point at a rate controlled by the gated-regret of the units in the network. Corollaries of the main result include: (i) a game-theoretic description of the representations learned by a neural network; (ii) a logarithmic-regret algorithm for training neural nets; and (iii) a formal setting for analyzing conditional computation in neural nets that can be applied to recently developed models of attention.

show abstract

Section: Summary Of Contributionmentioning

confidence: 99%

Deep Online Convex Optimization with Gated Games

Balduzzi¹

2016

Preprint

Self Cite

View full text Add to dashboard Cite

show abstract

“…There has been a rising interest in game-theoretic analysis since the landmark Generative Adversarial Network [22]. By framing networks into a two-player competing game, prevalent efforts have been spent on studying its convergence dynamics [23] and effective optimizers to find stable saddle points [24,25], Notably, our layer-as-player formulation has appeared in Balduzzi [26] to study the signal communication implied in the Back-propagation, yet without any practical algorithm being made.…”

Section: Introductionmentioning

confidence: 99%

A Differential Game Theoretic Neural Optimizer for Training Residual Networks

Liu¹,

Chen²,

Theodorou³

2020

Preprint

View full text Add to dashboard Cite

Connections between Deep Neural Networks (DNNs) training and optimal control theory has attracted considerable attention as a principled tool of algorithmic design. Differential Dynamic Programming (DDP) neural optimizer [1] is a recently proposed method along this line. Despite its empirical success, the applicability has been limited to feedforward networks and whether such a trajectory-optimization inspired framework can be extended to modern architectures remains unclear. In this work, we derive a generalized DDP optimizer that accepts both residual connections and convolution layers. The resulting optimal control representation admits a game theoretic perspective, in which training residual networks can be interpreted as cooperative trajectory optimization on state-augmented dynamical systems. This Game Theoretic DDP (GT-DDP) optimizer enjoys the same theoretic connection in previous work, yet generates a much complex update rule that better leverages available information during network propagation. Evaluation on image classification datasets (e.g. MNIST and CIFAR100) shows an improvement in training convergence and variance reduction over existing methods. Our approach highlights the benefit gained from architecture-aware optimization.Preprint.

show abstract