2021
DOI: 10.1609/aaai.v35i13.17427

Dynamic Automaton-Guided Reward Shaping for Monte Carlo Tree Search

Abstract: Reinforcement learning and planning have been revolutionized in recent years, due in part to the mass adoption of deep convolutional neural networks and the resurgence of powerful methods to refine decision-making policies. However, the problem of sparse reward signals and their representation remains pervasive in many domains. While various reward-shaping mechanisms and imitation learning approaches have been proposed to mitigate this problem, the use of human-aided artificial rewards introduces human error, su…

Cited by 6 publications (7 citation statements)
References 21 publications
“…Another technique is to shape the reward in proportion to the distance from the accepting node in the automaton (Camacho et al 2018); however, this often leads to suboptimal reward settings. Augmenting the reward function with Monte Carlo Tree Search helps mitigate this issue (Velasquez et al 2021). This approach requires the ability to plan ahead in the environment, which is not always feasible.…”
Section: Related Work (mentioning)
confidence: 99%
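
For context, the distance-based shaping that this statement refers to can be sketched as potential-based shaping over the automaton: the potential of a DFA state is the negated number of transitions to the nearest accepting state, so transitions that move closer to acceptance receive a positive shaping bonus. The following is a minimal Python sketch under our own assumptions; the function names and the dict-based DFA encoding are illustrative, not the API of Camacho et al (2018) or Velasquez et al (2021).

```python
from collections import deque

def distances_to_accepting(transitions, accepting):
    """Breadth-first search backwards from the accepting DFA states.

    transitions: dict mapping (state, symbol) -> next_state
    accepting:   set of accepting states (assumed non-empty)
    Returns a dict state -> fewest transitions needed to reach acceptance.
    """
    # Build reverse adjacency: predecessors of each state.
    preds = {}
    for (q, _sym), q_next in transitions.items():
        preds.setdefault(q_next, set()).add(q)
    dist = {q: 0 for q in accepting}
    frontier = deque(accepting)
    while frontier:
        q = frontier.popleft()
        for p in preds.get(q, ()):
            if p not in dist:
                dist[p] = dist[q] + 1
                frontier.append(p)
    return dist

def shaped_reward(r, q, q_next, dist, gamma=0.99):
    """Potential-based shaping: Phi(q) = -distance(q), r' = r + gamma*Phi(q') - Phi(q)."""
    # States that cannot reach acceptance get the worst potential.
    worst = max(dist.values()) + 1
    phi = lambda s: -dist.get(s, worst)
    return r + gamma * phi(q_next) - phi(q)
```

Because the shaping term is potential-based, it preserves the optimal policy of the underlying task; the suboptimality the statement mentions comes from the fixed distance metric ignoring how hard each transition is to trigger in the environment.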
“…Non-Markovian planning has been studied extensively (Thiébaux et al 2006; Brafman and De Giacomo 2019; Bacchus, Boutilier, and Grove 1997; Thiébaux, Kabanza, and Slaney 2002; Gretton 2006; Velasquez et al 2021). However, it is typically assumed that the goal is given in some automaton representation.…”
Section: Related Work (mentioning)
confidence: 99%
“…As a result, the reward shaping values of the a-transitions would be much higher than those of the b-transitions. In this paper, we extend the dynamic reward shaping approach presented in Velasquez et al (2021) to handle multiple agents and account for both deterministic and stochastic transitions in the underlying NMRDP. It is worth noting that multi-agent reward machines have been considered recently in the context of reinforcement learning (Neary et al 2021).…”
Section: Related Work (mentioning)
confidence: 99%
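
The "dynamic" aspect these statements contrast with fixed distance-based shaping can be illustrated, again under our own assumptions, as a potential that is re-estimated from returns observed during search rather than set in advance. The class below is a hypothetical sketch of that idea; the update rule and defaults are ours, and the cited papers' exact formulation may differ.

```python
class DynamicPotential:
    """Hedged sketch: the potential of an automaton state tracks a
    running average of the returns observed after visiting it during
    tree search, instead of a fixed distance-to-acceptance value."""

    def __init__(self, default=0.0):
        self.value_sum = {}  # automaton state -> sum of observed returns
        self.visits = {}     # automaton state -> visit count
        self.default = default

    def update(self, q, observed_return):
        # Called once per simulated trajectory that passed through q.
        self.value_sum[q] = self.value_sum.get(q, 0.0) + observed_return
        self.visits[q] = self.visits.get(q, 0) + 1

    def phi(self, q):
        n = self.visits.get(q, 0)
        return self.value_sum[q] / n if n else self.default

    def shaped_reward(self, r, q, q_next, gamma=0.99):
        # Standard potential-based shaping with the learned potential.
        return r + gamma * self.phi(q_next) - self.phi(q)
```

Under this kind of scheme, transitions that are easy to trigger but rarely lead to acceptance (the a-transitions in the example above) would see their estimated potential fall as search progresses, rather than retaining a fixed high shaping value.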
“…Rather, we propose a novel dynamic reward shaping function that helps the MATS procedure converge more quickly to higher expected values. Dynamic reward shaping was first introduced in Velasquez et al (2021) for single-agent MCTS. We demonstrate how cooperative and competitive behavior can arise within and across teams by sharing the same search tree in MATS as well as sharing the same DFA objective within the respective teams.…”
Section: Introduction (mentioning)
confidence: 99%