2022
DOI: 10.1609/icaps.v32i1.19844

Inferring Probabilistic Reward Machines from Non-Markovian Reward Signals for Reinforcement Learning

Abstract: The success of reinforcement learning in typical settings is predicated on Markovian assumptions on the reward signal by which an agent learns optimal policies. In recent years, the use of reward machines has relaxed this assumption by enabling a structured representation of non-Markovian rewards. In particular, such representations can be used to augment the state space of the underlying decision process, thereby facilitating non-Markovian reinforcement learning. However, these reward machines cannot capture …
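As a rough illustration of the mechanism described in the abstract (a reward machine whose automaton state augments the environment state so that the pair of MDP state and machine state becomes Markovian), here is a minimal Python sketch. It is not taken from the paper; the `RewardMachine` class, its labels, and the two-state example are hypothetical.

```python
# Minimal illustrative sketch (not the paper's implementation): a reward
# machine as a finite-state automaton whose state u augments the MDP state s,
# making an otherwise non-Markovian reward Markovian over (s, u) pairs.

class RewardMachine:
    def __init__(self, transitions, rewards, initial_state=0):
        # transitions: {(u, label): u'}; rewards: {(u, label): r}
        self.transitions = transitions
        self.rewards = rewards
        self.u = initial_state

    def step(self, label):
        """Advance the machine on an observed label and return the emitted reward."""
        reward = self.rewards.get((self.u, label), 0.0)
        self.u = self.transitions.get((self.u, label), self.u)
        return reward


# Hypothetical two-state machine: reward 1.0 only when "goal" is reached
# after "key" has been observed, a condition that is non-Markovian in the raw state.
rm = RewardMachine(
    transitions={(0, "key"): 1, (1, "goal"): 0},
    rewards={(1, "goal"): 1.0},
)

for label in ["goal", "key", "goal"]:  # only the last "goal" pays off
    print(label, rm.step(label))
```

Running the loop prints a reward of 1.0 only on the final "goal" event, because the machine has by then observed "key"; this history-dependent reward is exactly what the augmented state renders Markovian.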

Cited by 7 publications (5 citation statements)
References 11 publications
“…Since the introduction of Reward Machines (RMs) (Icarte et al 2018), there have been various new research directions such as learning the RM structure (Xu et al 2020), RM for partially observable environments (Toro Icarte et al 2019), multi-agent intention scheduling (Dann et al 2022), and probabilistic RMs (Dohmen et al 2022) to name a few. While these works primarily focused on RM algorithmic improvements and theoretical analysis, their applications did not go beyond toy domains.…”
Section: Reward Machine (mentioning)
confidence: 99%
“…In this paper, we alleviate the above mentioned problem of gait specification by leveraging Reward Machines (RMs) (Icarte et al 2022), which specify reward functions through deterministic finite automatons. RMs have been applied to various domains for guiding RL agents (Xu et al 2020; Neary et al 2020; Camacho et al 2021; Dohmen et al 2022). In this paper, RM serves as high-level specifications of gaits for low-level locomotion policy learning.…”
Section: Introduction (mentioning)
confidence: 99%
“…In contrast, our maximum-likelihood approach does not a priori require any structure of the specification or the spatial MDP environment. Meanwhile, [11,13,27,31,37] use Angluin [5]'s L* algorithm to learn a TA, relying on an oracle for equivalence and membership queries. We assume that the agent cannot access an oracle and must learn the TA fully autonomously, which aligns with the standard setup of model-free RL (note that L* was not originally developed for RL applications).…”
Section: Related Research (mentioning)
confidence: 99%
“…The most appropriate method will depend on the use-case as different approaches choose to relax different assumptions. For example, Corazza et al [9] and Dohmen et al [11] extend the SAT-based approach to noisy rewards and the L* approach to learning probabilistic automata, respectively.…”
Section: Related Research (mentioning)
confidence: 99%