In this paper, we introduce a regularized mean-field game and study learning of this game under an infinite-horizon discounted reward criterion. The game is defined by adding a strongly concave regularization function to the one-stage reward function in the classical mean-field game model. We establish a value-iteration-based learning algorithm for this regularized mean-field game using fitted Q-learning. In general, the regularization term makes the reinforcement learning algorithm more robust and improves its exploration. Moreover, it enables us to carry out an error analysis of the learning algorithm without imposing the restrictive convexity assumptions on the system components that would be needed in the absence of regularization.
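To make this concrete, the display below is a minimal sketch of one standard form of such a regularized Q-update for a fixed mean-field term \(\mu\), under our own illustrative assumptions (a finite action set \(\mathsf{A}\), a strongly concave regularizer \(\Omega\) on the set of action distributions \(\mathcal{P}(\mathsf{A})\), discount factor \(\beta\), state space \(\mathsf{X}\), and mean-field-dependent transition kernel \(p\)); it is not claimed to be the exact operator of the paper:

\[
(T_{\mu} Q)(x,a) \;=\; r(x,a,\mu) \;+\; \beta \int_{\mathsf{X}} \max_{\pi \in \mathcal{P}(\mathsf{A})} \Big( \sum_{b \in \mathsf{A}} \pi(b)\, Q(y,b) + \Omega(\pi) \Big)\, p(\mathrm{d}y \mid x,a,\mu).
\]

With the entropy choice \(\Omega(\pi) = -\tfrac{1}{\lambda} \sum_{b} \pi(b) \log \pi(b)\), the inner maximum has the closed form \(\tfrac{1}{\lambda} \log \sum_{b} \exp(\lambda\, Q(y,b))\), attained by the Boltzmann policy \(\pi(b) \propto \exp(\lambda\, Q(y,b))\). Strong concavity of \(\Omega\) makes this maximizer unique and Lipschitz in \(Q\), which is the kind of structural property an error analysis can exploit in place of convexity assumptions on the model.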
In the literature, the existence of mean-field equilibria has been established for discrete-time mean-field games under both the discounted-cost and the average-cost optimality criteria. However, no algorithm with a convergence guarantee is available for computing mean-field equilibria for a general class of models. In this paper, we provide a value iteration algorithm for computing mean-field equilibria under both the discounted-cost and the average-cost criteria. We establish that the value iteration algorithm converges to the fixed point of a mean-field equilibrium operator, and we then use this fixed point to construct a mean-field equilibrium. In our value iteration algorithm, we use Q-functions instead of value functions to allow for a possible extension of this work to the model-free setting.
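As a rough illustration of the two coupled updates this describes, a Q-function backup for the current mean-field term followed by an update of the mean-field term under the induced policy, the following sketch runs the iteration on a toy finite model in the discounted (reward-maximization) case. The model and all names (base_r, p, reward, the greedy tie-breaking) are illustrative assumptions, not the paper's construction, and convergence of the joint iteration is only guaranteed under contraction conditions of the kind established in the paper.

import numpy as np

# A minimal sketch of the coupled value iteration described above, on a toy
# finite mean-field game. The model (base_r, p) and all names here are
# illustrative assumptions, not the paper's notation.

rng = np.random.default_rng(0)
n_x, n_a = 5, 3          # number of states and actions
beta = 0.9               # discount factor

p = rng.random((n_x, n_a, n_x))
p /= p.sum(axis=2, keepdims=True)     # transition kernel p(y | x, a)
base_r = rng.random((n_x, n_a))

def reward(mu):
    # Toy mean-field coupling: reward drops as the current state gets crowded.
    return base_r - mu[:, None]

mu = np.full(n_x, 1.0 / n_x)          # initial mean-field (state marginal)
Q = np.zeros((n_x, n_a))

for _ in range(500):
    # (i) Q-update: Bellman backup under the current mean-field term.
    Q = reward(mu) + beta * (p @ Q.max(axis=1))
    # (ii) Mean-field update: push the state marginal through a greedy policy.
    greedy = Q.argmax(axis=1)
    mu = mu @ p[np.arange(n_x), greedy]

print("approximate equilibrium state marginal:", np.round(mu, 3))

In the model-free extension alluded to at the end of the abstract, step (i) would be replaced by a regression fit of Q from sampled transitions rather than an exact backup over a known kernel.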
Keywords: Mean-field games

This article is part of the topical collection "Multi-agent Dynamic Decision Making and Learning".