2020
DOI: 10.48550/arxiv.2003.12151
Preprint

Q-Learning in Regularized Mean-field Games

Abstract: In this paper, we introduce a regularized mean-field game and study learning of this game under an infinite-horizon discounted reward function. The game is defined by adding a regularization function to the one-stage reward function in the classical mean-field game model. We establish a value-iteration-based learning algorithm for this regularized mean-field game using fitted Q-learning. In general, this regularization term makes the reinforcement learning algorithm more robust, with improved exploration. Moreover, i…
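To make the construction concrete, here is a minimal tabular sketch of the regularized Bellman update, assuming a negative-entropy regularizer so that the regularized maximization over actions reduces to a log-sum-exp (softmax). The paper itself works with fitted Q-learning and function approximation, so the arrays, the temperature `lam`, and the exact update below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def regularized_value_iteration(R, T, beta=0.95, lam=0.1, iters=500):
    """Tabular sketch of entropy-regularized value iteration at a FIXED
    mean-field term mu (already folded into R and T).

    R    : (S, A) one-stage rewards r(s, a, mu)
    T    : (S, A, S) transition kernel T(s' | s, a, mu)
    beta : discount factor
    lam  : regularization weight (entropy temperature) -- an assumption here
    """
    S, A = R.shape
    Q = np.zeros((S, A))
    for _ in range(iters):
        # Regularized state value: V(s) = lam * log sum_a exp(Q(s, a) / lam),
        # computed with a max-shift for numerical stability.
        m = Q.max(axis=1)
        V = m + lam * np.log(np.exp((Q - m[:, None]) / lam).sum(axis=1))
        Q = R + beta * T @ V  # (S, A, S) @ (S,) -> (S, A)
    # The regularized optimal policy is the softmax of Q.
    policy = np.exp((Q - Q.max(axis=1, keepdims=True)) / lam)
    policy /= policy.sum(axis=1, keepdims=True)
    return Q, policy
```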

Citations: cited by 9 publications (18 citation statements)
References: 17 publications
“…With a general dynamics T, the propagation of the distribution of μ_t^N could be computationally expensive. On the other hand, under the identical policy π used by all agents, the trajectory of the mean field is deterministic [21]. Furthermore, μ_t follows a simple propagation rule:…”
Section: Mean Field Approximation
Citation type: mentioning (confidence: 99%)
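The "simple propagation rule" referenced in this excerpt is, in the standard discrete setting, the pushforward of the current state distribution through the common policy and the transition kernel, μ_{t+1}(s') = Σ_{s,a} μ_t(s) π(a|s) T(s'|s,a,μ_t). A minimal sketch follows; the array names and shapes are assumptions, not the citing paper's notation.

```python
import numpy as np

def propagate_mean_field(mu, policy, T):
    """One step of the deterministic mean-field flow when every agent uses
    the same policy.

    mu     : (S,)      current state distribution mu_t
    policy : (S, A)    pi(a | s)
    T      : (S, A, S) transition kernel, assumed already evaluated at mu_t
                       if the dynamics depend on the mean field
    returns: (S,)      next distribution mu_{t+1}
    """
    # mu_{t+1}(s') = sum_{s, a} mu_t(s) * pi(a | s) * T(s' | s, a)
    joint = mu[:, None] * policy            # (S, A): mass placed on each (s, a)
    return np.einsum("sa,sap->p", joint, T)
```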
“…More recent works directly introduced a relative entropy term to the reward structure to provide regularity conditions. The existence of stationary entropy-regularized MFE was examined in [21]. The authors in [19] studied transient MFGs with finite horizon.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
“…To approximate stationary MFG solutions, (Guo et al, 2019) uses fixed point iterations on the distribution combined with Q-learning to learn the best response at each iteration. (Anahtarci et al, 2020) combines this kind of learning scheme together with an entropic regularization. Convergence of an actor-critic method for linear-quadratic MFG has been studied in (Fu et al, 2019).…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
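The scheme described in this excerpt alternates a (regularized) Q-learning best-response step with an update of the mean-field distribution. Below is a hedged sketch of that outer fixed-point loop in the tabular case, reusing the helpers `regularized_value_iteration` and `propagate_mean_field` from the earlier sketches. The callables `R_of_mu` and `T_of_mu`, the frozen-kernel flow, and all iteration counts are assumptions for illustration; the cited algorithms learn the best response from samples rather than solving it exactly.

```python
import numpy as np

def mfg_fixed_point(R_of_mu, T_of_mu, mu0, beta=0.95, lam=0.1,
                    outer_iters=50, flow_iters=200):
    """Alternating scheme: (i) compute an entropy-regularized best response at
    the current mean field, (ii) update the mean field to the stationary
    distribution induced by that response.

    R_of_mu(mu) -> (S, A) and T_of_mu(mu) -> (S, A, S) are assumed callables
    returning the reward table and kernel at a given mean field.
    """
    mu = np.asarray(mu0, dtype=float)
    policy = None
    for _ in range(outer_iters):
        R, T = R_of_mu(mu), T_of_mu(mu)
        # (i) regularized best response at the frozen mean field mu
        _, policy = regularized_value_iteration(R, T, beta=beta, lam=lam)
        # (ii) mean field induced by that policy: roll the flow forward
        #      (keeping T frozen at the current mu for simplicity)
        for _ in range(flow_iters):
            mu = propagate_mean_field(mu, policy, T)
    return mu, policy
```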
“…However, there is virtually no theoretical study on the role of entropy regularization in multiagent RL (MARL), with the exception of [2]. Indeed, most existing studies are empirical, demonstrating convergence improvement and variance reduction when entropy regularization is added.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
“…For instance, [12] showed via empirical analysis that policy features can be learned directly from pure observations of other agents and that the non-stationarity of the environment can be reduced with the addition of the cross-entropy; [11] applied cross-entropy regularization to demonstrate the convergence of fictitious play in a discrete-time model with a finite number of agents, while [22] used the cross-entropy loss to train the prediction of other agents' actions from observations of their behavior. The only theoretical work so far can be found in [2], in an infinite-horizon setting, where a regularized Q-learning algorithm for stationary discrete-time mean field games was proposed along with its convergence analysis. Still, the problem remains open in the finite-time-horizon case, which arises often in many applications.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)