2020 59th IEEE Conference on Decision and Control (CDC)
DOI: 10.1109/cdc42340.2020.9304190

Active Task-Inference-Guided Deep Inverse Reinforcement Learning

Cited by 25 publications (18 citation statements)
References 17 publications
“…Finally, we note that different approaches to learn RMs were proposed simultaneously, or shortly after, our original publication (e.g., Xu et al., 2020a,b; Furelos-Blanco et al., 2020; Rens et al., 2020; Gaon and Brafman, 2020; Memarian et al., 2020; Neider et al., 2021; Hasanbeig et al., 2021). They all learn reward machines in fully observable domains.…”
Section: Related Work (citation type: mentioning)
Confidence: 95%
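
For readers unfamiliar with the term, a reward machine (RM) is essentially a Mealy machine over high-level propositions that emits rewards on its transitions. The sketch below is an illustrative data structure under that reading; the class, names, and the coffee/office toy task are hypothetical and not from any cited implementation.

```python
# Illustrative reward machine: a Mealy machine over high-level events
# that emits a reward on each transition. Hypothetical names, not code
# from any of the works cited above.
class RewardMachine:
    def __init__(self, start, transitions):
        # transitions: (rm_state, event) -> (next_rm_state, reward)
        self.state = start
        self.transitions = transitions

    def step(self, event):
        """Advance on one high-level event; return the emitted reward."""
        self.state, reward = self.transitions[(self.state, event)]
        return reward

# Example: reward 1 only after observing "coffee" and then "office".
rm = RewardMachine(0, {(0, "coffee"): (1, 0.0), (0, "office"): (0, 0.0),
                       (1, "coffee"): (1, 0.0), (1, "office"): (2, 1.0)})
print([rm.step(e) for e in ["coffee", "office"]])  # [0.0, 1.0]
```
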
“…Other related work [20] on learning from sparse rewards proposes a method to learn a temporally extended episodic task composed of several subtasks, where the environment returns a sparse reward only at the end of an episode. Using the environment's sparse feedback and queries to a demonstrator, they learn the high-level task structure in the form of a deterministic finite state automaton, and then use the learned task structure in an inverse reinforcement learning (IRL) framework to infer a dense reward function for each subtask.…”
Section: B. Sparse Rewards (citation type: mentioning)
Confidence: 99%
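
To make the two-phase structure described in this statement concrete, here is a minimal sketch under stated assumptions: phase 1 is taken as already having produced a DFA over high-level events, and phase 2 segments a demonstration by DFA state so that each subtask can be given its own dense reward via IRL. The `DFA` class, `segment_by_subtask`, and the toy key/door task are illustrative names, not the authors' implementation.

```python
# Sketch of the two-phase pipeline: (1) a learned DFA encodes the
# high-level task; (2) demonstrations are segmented by DFA state so a
# per-subtask IRL step can fit a dense reward to each segment.
from dataclasses import dataclass, field

@dataclass
class DFA:
    """Deterministic finite automaton over high-level events."""
    start: int
    accepting: set
    delta: dict = field(default_factory=dict)  # (state, event) -> state

    def step(self, state, event):
        return self.delta[(state, event)]

def segment_by_subtask(dfa, demo):
    """Split one demonstration into per-DFA-state segments.

    demo: list of (low_level_transition, high_level_event) pairs.
    Returns {dfa_state: [transitions observed while in that state]}.
    """
    segments, state = {}, dfa.start
    for transition, event in demo:
        segments.setdefault(state, []).append(transition)
        state = dfa.step(state, event)
    return segments

# Example task: pick up a key (subtask 0), then open a door (subtask 1).
dfa = DFA(start=0, accepting={2},
          delta={(0, "none"): 0, (0, "key"): 1,
                 (1, "none"): 1, (1, "door"): 2})
demo = [(("s0", "a0", "s1"), "none"),
        (("s1", "a1", "s2"), "key"),
        (("s2", "a2", "s3"), "door")]
for subtask, transitions in segment_by_subtask(dfa, demo).items():
    # Placeholder for the per-subtask IRL step (e.g., MaxEnt IRL) that
    # would fit a dense reward to these transitions.
    print(f"subtask {subtask}: {len(transitions)} transitions for IRL")
```
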
“…Learning human driver reward functions. Many previous works on learning human drivers' reward functions are extensions of MaxEnt IRL [4] and are restricted to single-agent settings [24]–[26]. Recently, the concept of quantal best response equilibrium (QRE) has been exploited to extend MaxEnt IRL to multi-agent games.…”
Section: Related Work (citation type: mentioning)
Confidence: 99%
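
Since this statement builds on MaxEnt IRL, a minimal single-agent sketch may help: fit a linear reward by matching the policy's expected feature counts to the expert's, using soft value iteration for the policy and forward propagation for state visitations. The tabular setup and all names below are illustrative assumptions, not code from [4] or the citing paper.

```python
# Minimal MaxEnt IRL sketch (feature-matching idea): learn theta so that
# the soft-optimal policy for r = phi @ theta matches the expert's
# feature expectations. Toy tabular setting; all names illustrative.
import numpy as np

def maxent_irl(P, phi, demos, horizon=20, iters=200, lr=0.1):
    """P: (A, S, S) transition tensor; phi: (S, F) state features;
    demos: list of expert state-index sequences."""
    n_actions, n_states, _ = P.shape
    theta = np.zeros(phi.shape[1])
    # Empirical (expert) feature expectations.
    mu_expert = np.mean([phi[traj].sum(0) for traj in demos], axis=0)
    for _ in range(iters):
        r = phi @ theta
        # Soft value iteration -> stochastic policy pi(a|s).
        V = np.zeros(n_states)
        for _ in range(horizon):
            Q = r[None, :] + P @ V                 # (A, S)
            Qmax = Q.max(0)                        # stable log-sum-exp
            V = Qmax + np.log(np.exp(Q - Qmax[None, :]).sum(0))
        pi = np.exp(Q - V[None, :])                # columns sum to 1
        # Expected state visitations under pi, from the demo start states.
        d = np.zeros(n_states)
        for traj in demos:
            d[traj[0]] += 1.0 / len(demos)
        D = d.copy()
        for _ in range(horizon - 1):
            d = np.einsum('as,ast->t', pi * d[None, :], P)
            D += d
        mu_policy = D @ phi
        theta += lr * (mu_expert - mu_policy)      # gradient ascent
    return theta
```

The multi-agent QRE extension mentioned in the quote replaces the single-agent soft-optimal policy with a quantal-response equilibrium of the game, but the feature-matching gradient keeps the same expert-minus-policy form.
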