2019 IEEE 58th Conference on Decision and Control (CDC)
DOI: 10.1109/cdc40024.2019.9030194

Potential-Based Advice for Stochastic Policy Learning

Abstract: This paper augments the reward received by a reinforcement learning agent with potential functions in order to help the agent learn (possibly stochastic) optimal policies. We show that a potential-based reward shaping scheme is able to preserve optimality of stochastic policies, and demonstrate that the ability of an agent to learn an optimal policy is not affected when this scheme is augmented to soft Q-learning. We propose a method to impart potential-based advice schemes to policy gradient algorithms. An al…
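As background for the citation statements below, here is a minimal sketch of the potential-based reward shaping scheme the abstract refers to, in the classic form of Ng et al. (1999). The grid world, the potential function `phi`, the goal location, and the discount factor are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of potential-based reward shaping (Ng et al., 1999),
# the scheme the abstract builds on. The grid-world potential, goal
# cell, and discount factor are illustrative assumptions.

GAMMA = 0.99  # discount factor (assumed)

def phi(state):
    """Hypothetical potential: negative Manhattan distance to a goal cell."""
    goal = (4, 4)
    return -(abs(state[0] - goal[0]) + abs(state[1] - goal[1]))

def shaped_reward(reward, state, next_state):
    """Augment the environment reward with F(s, s') = gamma*phi(s') - phi(s).

    Shaping of this form leaves the set of optimal policies unchanged;
    the paper extends the analogous preservation result to stochastic
    policies (e.g., those learned by soft Q-learning).
    """
    return reward + GAMMA * phi(next_state) - phi(state)

# Example: a transition that moves toward the goal earns a positive bonus.
print(shaped_reward(0.0, (0, 0), (0, 1)))  # 0.0 + 0.99*(-7) - (-8) = 1.07
```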

Cited by 5 publications (10 citation statements). References 17 publications.
“…We show that using SAM allows agents to learn policies to complete the tasks faster, and obtain higher rewards than: i) using sparse rewards alone, and ii) a state-of-the-art reward redistribution technique. This paper extends techniques introduced in our previous work [16] for single-agent RL to the multi-agent setting.…”
Section: Introduction (mentioning)
confidence: 79%
“…A potential term will not have to be added to ensure an unbiased policy gradient when utilizing look-back advice. This insight follows from Proposition 3 in [16] since we consider decentralized policies.…”
Section: Shaping Advice in Multi-Agent Actor-Critic (mentioning)
confidence: 87%
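For context on the look-back advice referenced in the statement above: in the Wiewiora et al. (2003) formulation, the shaping term compares the current state-action potential to the inverse-discounted potential of the previous state-action pair. The sketch below is a hedged illustration under that assumption; `phi_sa` is a hypothetical potential, not one from the cited papers.

```python
# Hedged sketch of look-back advice (Wiewiora et al., 2003): the shaping
# term F_t = phi(s_t, a_t) - (1/gamma) * phi(s_{t-1}, a_{t-1}) depends on
# the *previous* state-action pair. phi_sa is a hypothetical potential.

GAMMA = 0.99

def phi_sa(state, action):
    """Hypothetical state-action potential, e.g., a heuristic advice score."""
    advice = {((0, 0), "right"): 1.0, ((0, 1), "right"): 2.0}
    return advice.get((state, action), 0.0)

def look_back_shaped_reward(reward, state, action, prev_state, prev_action):
    """Reward augmented with look-back advice; per the statement above, no
    extra potential term is needed to keep the policy gradient unbiased."""
    return reward + phi_sa(state, action) - phi_sa(prev_state, prev_action) / GAMMA
```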
“…The aforementioned works had focused on the use of potential-based methods in environments with discrete action spaces. A preliminary version of this paper [20] introduced potential-based techniques to learn stochastic policies in single-agent RL with continuous states and actions.…”
Section: Related Work (mentioning)
confidence: 99%
“…We show that using shaping advice allows agents to learn policies to complete the tasks faster, and obtain higher rewards than: i) using sparse rewards alone, and ii) a state-of-the-art reward redistribution technique from [19]. Compared to a preliminary version that appeared in [20], in this paper, we develop a comprehensive framework for providing shaping advice in both single- and multi-agent RL. We provide detailed theoretical analyses and experimental evaluations for each setting.…”
Section: Introduction (mentioning)
confidence: 99%