2011
DOI: 10.1613/jair.3384

Policy Invariance under Reward Transformations for General-Sum Stochastic Games

Abstract: We extend the potential-based shaping method from Markov decision processes to multi-player general-sum stochastic games. We prove that the Nash equilibria of a stochastic game remain unchanged after potential-based shaping is applied to the environment. This policy invariance property offers a possible way of speeding up convergence when learning to play a stochastic game.
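
As a brief sketch of the transformation the abstract refers to (notation assumed here, not copied from the paper): potential-based shaping adds to each player's reward a term derived from a real-valued potential function over states, in the form introduced by Ng, Harada and Russell for single-agent MDPs.

```latex
% Potential-based shaping (sketch, assumed notation): every player i's reward
% is augmented with the same potential difference F, where Phi is any
% real-valued function of the state and gamma is the discount factor.
\[
  r_i'(s, a, s') \;=\; r_i(s, a, s') + F(s, s'),
  \qquad
  F(s, s') \;=\; \gamma\,\Phi(s') - \Phi(s),
  \qquad
  \Phi : S \to \mathbb{R}.
\]
```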

Cited by 8 publications (7 citation statements)
References 13 publications

“…Asmuth et al [3] show that R-max, a popular model-based RL method, is still PAC-MDP [31] when combined with an admissible potential function Φ(x, a) ≥ Q * (x, a). In multiagent RL, potential-based shaping provably preserves the Nash equilibrium in stochastic games [11,44]. Preservation of optimal policy and Nash equilibrium have also been shown to hold for potential functions that change while the agent is learning [12].…”
Section: Potential-based Shaping (mentioning)
confidence: 99%
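
Since the surrounding statements concern guarantees that shaping does not alter what is learned, a minimal tabular sketch may help. The toy chain environment, the potential function `phi`, and all hyper-parameters below are illustrative assumptions, not taken from the cited works.

```python
import numpy as np

# Minimal sketch: tabular Q-learning with potential-based reward shaping
# on a toy deterministic chain MDP (assumed example, not from the paper).
n_states, n_actions = 5, 2          # states 0..4; action 0 = left, 1 = right
goal, gamma, alpha, eps = 4, 0.95, 0.1, 0.1
rng = np.random.default_rng(0)

def step(s, a):
    """Deterministic chain dynamics: reward 1 only on reaching the goal."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s_next == goal else 0.0
    return s_next, reward, s_next == goal

def phi(s):
    """Assumed potential: closeness to the goal (any real-valued Phi is admissible here)."""
    return -abs(goal - s)

Q = np.zeros((n_states, n_actions))
for episode in range(500):
    s, done = 0, False
    while not done:
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(Q[s]))
        s_next, r, done = step(s, a)
        # Shaping term F(s, s') = gamma * Phi(s') - Phi(s); adding it leaves
        # the optimal policy (and, per the paper, the Nash equilibria of the
        # multi-player generalisation) unchanged.
        f = gamma * phi(s_next) - phi(s)
        target = r + f + (0.0 if done else gamma * np.max(Q[s_next]))
        Q[s, a] += alpha * (target - Q[s, a])
        s = s_next

print(np.argmax(Q, axis=1))  # greedy policy should move right toward the goal
```
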
“…An alternative approach to improve credit assignment is potential-based reward shaping. Although this requires prior knowledge of the problem domain, potential-based techniques have been shown to offer guarantees on optimality and convergence of policies in both single [13] and multi-agent [10], [36], [37] cases. The aforementioned works had focused on the use of potential-based methods in environments with discrete action spaces.…”
Section: Related Work (mentioning)
confidence: 99%
“…An alternative approach is potential-based reward shaping. Although this requires prior knowledge of the problem domain, potential-based techniques have been shown to offer guarantees on optimality and convergence of policies in both single [14] and multi-agent [11], [28], [29] cases. These works had focused on the use of potential-based methods in environments with discrete action spaces.…”
Section: Related Work (mentioning)
confidence: 99%
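
The optimality and convergence guarantees mentioned above rest on the fact that, for a state-based potential, shaping shifts every optimal action value by the same state-dependent constant. The single-agent identity is sketched below (assumed standard notation); the cited paper proves the analogous statement for Nash equilibria of general-sum stochastic games.

```latex
% Sketch of the standard invariance argument (single-agent form):
\[
  \hat{Q}^{*}(s, a) \;=\; Q^{*}(s, a) - \Phi(s)
  \quad\Longrightarrow\quad
  \arg\max_{a}\,\hat{Q}^{*}(s, a) \;=\; \arg\max_{a}\,Q^{*}(s, a).
\]
```

Because the shift does not depend on the action, greedy behaviour, and hence the equilibrium policies, are unchanged by the shaping term.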