2020
DOI: 10.48550/arxiv.2011.12946
Preprint

Exploratory LQG Mean Field Games with Entropy Regularization

Abstract: We study a general class of entropy-regularized multi-variate LQG mean field games (MFGs) in continuous time with K distinct subpopulations of agents. We extend the notion of actions to action distributions (exploratory actions), and explicitly derive the optimal action distributions for individual agents in the limiting MFG. We demonstrate that the optimal set of action distributions yields an ε-Nash equilibrium for the finite-population entropy-regularized MFG. Furthermore, we compare the resulting solutions …
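In entropy-regularized LQ exploratory control of this kind, the optimal action distribution is typically Gaussian, with mean given by a linear state feedback and spread controlled by the regularization temperature. The following is a minimal sketch of simulating one agent under such a Gaussian exploratory policy; the feedback gain and all parameter values are illustrative placeholders, not the paper's explicit solution.

```python
import numpy as np

# Minimal sketch (not the paper's derivation): one agent under a Gaussian
# exploratory policy for a scalar LQ problem
#   dx_t = (a*x_t + b*u_t) dt + sigma dW_t,  running cost q*x^2 + r*u^2,
# with entropy regularization at temperature lam.
rng = np.random.default_rng(0)

a, b, sigma = -0.5, 1.0, 0.3      # dynamics (illustrative values)
q, r, lam = 1.0, 1.0, 0.1         # cost weights and exploration temperature
k = 0.8                           # hypothetical linear feedback gain
dt, T = 0.01, 5.0
n = int(T / dt)

x = 1.0
xs = np.empty(n)
for i in range(n):
    mean_u = -k * x                       # mean: linear state feedback
    std_u = np.sqrt(lam / (2.0 * r))      # spread: set by the temperature
    u = rng.normal(mean_u, std_u)         # exploratory (sampled) action
    x += (a * x + b * u) * dt + sigma * np.sqrt(dt) * rng.normal()
    xs[i] = x

print("terminal state:", xs[-1])
```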

Cited by 3 publications (5 citation statements)
References 29 publications (44 reference statements)
“…Recently, the approach initiated in [61] has been extended to mean field games. In [62] and [36], the authors study the impact of an entropic regularization on the shape of the equilibria. In both papers, the models under study are linear-quadratic and subject only to idiosyncratic noise (i.e., there is no common noise).…”
Section: Black Box Output
confidence: 99%
“…In both papers, the models under study are linear-quadratic and subject only to idiosyncratic noise (i.e., there is no common noise). However, they differ on the following important point: in [36], the intensity of the idiosyncratic noise is constant, whilst it depends on the standard deviation of the control in [62]; in this sense, [36] is closer to the set-up that we investigate here. Accordingly, the presence of the entropic regularization leads to different consequences: in [62], the effective intensity of the idiosyncratic noise grows under the action of the entropy, and this is shown to help numerically in some learning methods (of a quite different spirit from ours); in [36], the entropy plays no role in the structure of the equilibria, which demonstrates, if needed, that our approach here is substantially different.…”
Section: Black Box Output
confidence: 99%
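To make the contrast drawn above concrete, the exploratory (relaxed-control) dynamics in this strand of the literature average the coefficients over the action distribution; the display below is a schematic, with notation that is illustrative rather than taken verbatim from [36] or [62].

```latex
% Schematic exploratory (relaxed-control) dynamics: the drift and squared
% diffusion are averaged over the action distribution \pi_t.
% Illustrative notation only; not the exact model of [36] or [62].
\[
  dX_t \;=\; \Big(\int b(X_t,u)\,\pi_t(du)\Big)\,dt
        \;+\; \Big(\int \sigma^2(X_t,u)\,\pi_t(du)\Big)^{1/2}\,dW_t .
\]
```

When σ is constant in u, widening π_t through the entropy term leaves the diffusion unchanged; when σ depends on u, the effective noise intensity inherits the variance of π_t, which is exactly the distinction between the two models discussed above.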
“…In Section 3.3, we provide a policy gradient formula when the agent controls not the action itself, but rather its distribution. Such actions are also referred to as relaxed controls; see [14,7,8].…”
Section: Example 3 Robust Dynamic Trading Strategy
confidence: 99%
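For intuition on gradients with respect to an action distribution rather than an action, the sketch below uses a generic likelihood-ratio ("score function") estimator for a Gaussian policy over a toy one-step cost; it is not the policy-gradient formula derived in the cited work, and the cost and parameters are made up for illustration.

```python
import numpy as np

# Hedged illustration: gradient of E[c(U)] with respect to the mean of the
# action distribution U ~ N(theta, s^2), via the score-function estimator.
rng = np.random.default_rng(1)

def cost(u):
    return (u - 2.0) ** 2          # toy one-step cost, minimized at u = 2

theta, s, lr = 0.0, 0.5, 0.05      # policy mean, fixed std, learning rate
for _ in range(500):
    u = rng.normal(theta, s, size=64)            # sample actions from the policy
    grad_log = (u - theta) / s**2                # d/dtheta of log N(u; theta, s^2)
    grad = np.mean(cost(u) * grad_log)           # Monte Carlo gradient of E[c(U)]
    theta -= lr * grad                           # descend the expected cost

print("learned mean:", round(theta, 3))          # should approach 2.0
```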
“…A (risk-neutral) distributionally robust RL approach for Markov decision processes, where robustness is induced by considering all transition probabilities (from a given state) whose relative entropy with respect to (wrt) a reference probability is less than a given epsilon, is developed in [12]. [1] develops a (risk-neutral) robust RL paradigm where policies are randomised with a distribution that depends on the current state; see [14] for a continuous-time version of randomised policies with entropy regularisation and [8,7] for its generalisation to mean-field game settings. In [1], the uncertainty is placed on the conditional transition probability from the old state and action to the new state, and the set of distributions consists of those that lie within an "average" 2-Wasserstein ball around a benchmark model's distribution.…”
confidence: 99%
“…This exploratory formulation has been extended to other settings and used to solve applied problems; see, e.g., Guo et al. (2020) and Firoozi and Jaimungal (2020) for extensions to mean-field games and to Markowitz mean-variance portfolio optimization. Gao et al. (2020) apply the same formulation to temperature control of Langevin diffusions arising from simulated annealing for non-convex optimization.…”
Section: Introduction
confidence: 99%