2011
DOI: 10.1613/jair.3384

Policy Invariance under Reward Transformations for General-Sum Stochastic Games

Abstract: We extend the potential-based shaping method from Markov decision processes to multi-player general-sum stochastic games. We prove that the Nash equilibria of a stochastic game remain unchanged after potential-based shaping is applied to the environment. This policy invariance property offers a possible way of speeding up convergence when learning to play a stochastic game.
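
As a brief sketch of the transformation the abstract refers to (notation assumed here, not copied from the paper): potential-based shaping adds to each player's reward a term derived from a real-valued potential function over states, in the form introduced by Ng, Harada and Russell for single-agent MDPs.

```latex
% Potential-based shaping (sketch, assumed notation): every player i's reward
% is augmented with the same potential difference F, where Phi is any
% real-valued function of the state and gamma is the discount factor.
\[
  r_i'(s, a, s') \;=\; r_i(s, a, s') + F(s, s'),
  \qquad
  F(s, s') \;=\; \gamma\,\Phi(s') - \Phi(s),
  \qquad
  \Phi : S \to \mathbb{R}.
\]
```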

Cited by 8 publications (7 citation statements)
References 13 publications

“…Asmuth et al [3] show that R-max, a popular model-based RL method, is still PAC-MDP [31] when combined with an admissible potential function Φ(x, a) ≥ Q * (x, a). In multiagent RL, potential-based shaping provably preserves the Nash equilibrium in stochastic games [11,44]. Preservation of optimal policy and Nash equilibrium have also been shown to hold for potential functions that change while the agent is learning [12].…”
Section: Potential-based Shaping (mentioning)
confidence: 99%
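
Since the surrounding statements concern guarantees that shaping does not alter what is learned, a minimal tabular sketch may help. The toy chain environment, the potential function `phi`, and all hyper-parameters below are illustrative assumptions, not taken from the cited works.

```python
import numpy as np

# Minimal sketch: tabular Q-learning with potential-based reward shaping
# on a toy deterministic chain MDP (assumed example, not from the paper).
n_states, n_actions = 5, 2          # states 0..4; action 0 = left, 1 = right
goal, gamma, alpha, eps = 4, 0.95, 0.1, 0.1
rng = np.random.default_rng(0)

def step(s, a):
    """Deterministic chain dynamics: reward 1 only on reaching the goal."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s_next == goal else 0.0
    return s_next, reward, s_next == goal

def phi(s):
    """Assumed potential: closeness to the goal (any real-valued Phi is admissible here)."""
    return -abs(goal - s)

Q = np.zeros((n_states, n_actions))
for episode in range(500):
    s, done = 0, False
    while not done:
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(Q[s]))
        s_next, r, done = step(s, a)
        # Shaping term F(s, s') = gamma * Phi(s') - Phi(s); adding it leaves
        # the optimal policy (and, per the paper, the Nash equilibria of the
        # multi-player generalisation) unchanged.
        f = gamma * phi(s_next) - phi(s)
        target = r + f + (0.0 if done else gamma * np.max(Q[s_next]))
        Q[s, a] += alpha * (target - Q[s, a])
        s = s_next

print(np.argmax(Q, axis=1))  # greedy policy should move right toward the goal
```
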
“…An alternative approach to improve credit assignment is potential-based reward shaping. Although this requires prior knowledge of the problem domain, potential-based techniques have been shown to offer guarantees on optimality and convergence of policies in both single [13] and multi-agent [10], [36], [37] cases. The aforementioned works had focused on the use of potential-based methods in environments with discrete action spaces.…”
Section: Related Work (mentioning)
confidence: 99%
“…An alternative approach is potential-based reward shaping. Although this requires prior knowledge of the problem domain, potential-based techniques have been shown to offer guarantees on optimality and convergence of policies in both single [14] and multi-agent [11], [28], [29] cases. These works had focused on the use of potential-based methods in environments with discrete action spaces.…”
Section: Related Work (mentioning)
confidence: 99%
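
The optimality and convergence guarantees mentioned above rest on the fact that, for a state-based potential, shaping shifts every optimal action value by the same state-dependent constant. The single-agent identity is sketched below (assumed standard notation); the cited paper proves the analogous statement for Nash equilibria of general-sum stochastic games.

```latex
% Sketch of the standard invariance argument (single-agent form):
\[
  \hat{Q}^{*}(s, a) \;=\; Q^{*}(s, a) - \Phi(s)
  \quad\Longrightarrow\quad
  \arg\max_{a}\,\hat{Q}^{*}(s, a) \;=\; \arg\max_{a}\,Q^{*}(s, a).
\]
```

Because the shift does not depend on the action, greedy behaviour, and hence the equilibrium policies, are unchanged by the shaping term.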