2019
DOI: 10.48550/arxiv.1910.04365
Preprint

Asking Easy Questions: A User-Friendly Approach to Active Reward Learning

Abstract: Robots can learn the right reward function by querying a human expert. Existing approaches attempt to choose questions where the robot is most uncertain about the human's response; however, they do not consider how easy it will be for the human to answer! In this paper we explore an information gain formulation for optimally selecting questions that naturally account for the human's ability to answer. Our approach identifies questions that optimize the trade-off between robot and human uncertainty, and determi…
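To make the mechanism in the abstract concrete, here is a minimal sketch (an illustration under assumed names such as phi_a, phi_b, and beta, not the authors' released code): the belief over reward weights w is represented by samples, the human's answer to a pairwise query follows a Boltzmann (softmax) model, and the query with the highest mutual information between answer and belief is asked. Questions the human cannot reliably answer give near-uniform answer probabilities under every sampled w, so their information gain, and hence their chance of being selected, is low.

import numpy as np

# Illustrative sketch only: information-gain query selection for active
# preference-based reward learning, assuming a sampled belief over reward
# weights and a Boltzmann (softmax) human response model.

def answer_probs(w_samples, phi_a, phi_b, beta=1.0):
    """P(human answers A) and P(human answers B) under each sampled weight vector."""
    r_a = w_samples @ phi_a                      # reward of option A per sample
    r_b = w_samples @ phi_b                      # reward of option B per sample
    p_a = 1.0 / (1.0 + np.exp(-beta * (r_a - r_b)))
    return np.stack([p_a, 1.0 - p_a], axis=1)    # shape (n_samples, 2)

def info_gain(w_samples, phi_a, phi_b, beta=1.0, eps=1e-12):
    """Mutual information between the human's answer and the reward weights."""
    p = answer_probs(w_samples, phi_a, phi_b, beta)
    p_marginal = p.mean(axis=0)                  # answer distribution averaged over the belief
    h_marginal = -np.sum(p_marginal * np.log(p_marginal + eps))
    h_conditional = -np.mean(np.sum(p * np.log(p + eps), axis=1))
    return h_marginal - h_conditional            # low when the answer is noisy for every w

def select_query(w_samples, candidate_queries, beta=1.0):
    """Pick the (phi_a, phi_b) feature pair with the largest expected information gain."""
    gains = [info_gain(w_samples, a, b, beta) for a, b in candidate_queries]
    return candidate_queries[int(np.argmax(gains))]

# Example with random placeholders:
rng = np.random.default_rng(0)
w_samples = rng.normal(size=(500, 4))
candidates = [(rng.normal(size=4), rng.normal(size=4)) for _ in range(20)]
best_a, best_b = select_query(w_samples, candidates)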

Cited by 9 publications (14 citation statements)
References 28 publications
“…Some colormap pairs may be equally preferable. We therefore include an "indifference" response [9] by introducing the minimum perceivable difference threshold 𝛿 ≥ 0 [35] such that:…”
Section: Belief Model (mentioning)
confidence: 99%
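The thresholded response model referenced in this passage can be written out as a small sketch; the likelihood below is one common "weak preference" form with three outcomes (A, B, about equal) and is an assumption, not necessarily the exact expression used by the citing paper.

import numpy as np

# Assumed three-outcome response model with an indifference threshold delta >= 0:
# the human may answer "A", "B", or "about equal" when the reward difference
# falls below the minimum perceivable difference.

def weak_preference_probs(r_a, r_b, delta=0.5):
    """Return (P(answer=A), P(answer=B), P(answer='about equal'))."""
    diff = r_a - r_b
    p_a = 1.0 / (1.0 + np.exp(delta - diff))
    p_b = 1.0 / (1.0 + np.exp(delta + diff))
    p_equal = 1.0 - p_a - p_b        # nonnegative for delta >= 0; zero when delta = 0
    return p_a, p_b, p_equal

# With delta = 0 the model reduces to the usual two-choice softmax; a larger
# delta makes the "about equal" answer more likely for similarly scored options.
print(weak_preference_probs(r_a=1.0, r_b=0.9, delta=0.5))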
“…In its training phase, Cieran adaptively and iteratively asks an analyst to choose between two different versions of their visualization each employing a different expert-designed colormap. This approach leverages prior work in color science which finds that presenting alternative choices is known to elicit the most reliable human responses for studying color preferences [54] while also being an easy task to respond to [9]. Cieran uses this choice data to update a model of aesthetic utility, which can be used to rank expert-designed sequential colormaps.…”
Section: Introduction (mentioning)
confidence: 99%
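The pipeline described in this passage (pairwise choices drive updates to a utility model, which then ranks candidates) can be illustrated with a generic Bradley-Terry-style logistic update; the feature vectors and learning rate below are placeholders, not Cieran's implementation.

import numpy as np

# Generic sketch: learn a linear utility from pairwise choices and rank candidates.

def choice_prob(w, x_chosen, x_other):
    """Bradley-Terry probability that the chosen item beats the alternative."""
    return 1.0 / (1.0 + np.exp(-(w @ (x_chosen - x_other))))

def update_utility(w, x_chosen, x_other, lr=0.1):
    """One gradient ascent step on the log-likelihood of the observed choice."""
    p = choice_prob(w, x_chosen, x_other)
    return w + lr * (1.0 - p) * (x_chosen - x_other)

def rank_candidates(w, features):
    """Order candidates (rows of a feature matrix) by learned utility, best first."""
    return np.argsort(-(features @ w))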
“…However, while these approaches have shown significant success in a number of domains [7,10,9,32], learning from purely offline data leads to a trajectory distribution mismatch which yields suboptimal performance both in theory and practice [12,13]. To address this problem, there have been a number of approaches that utilize online human feedback while the agent acts in the environment, such as providing suggested actions [12,35,36,17] or preferences [37,38,39,40,41,42]. However, many of these forms of human feedback may be unreliable if the robot visits states that significantly differ from those the human supervisor would themselves visit; in such situations, it is challenging for the supervisor to determine what correct behavior should look like without directly interacting with the environment [16,43].…”
Section: Related Work (mentioning)
confidence: 99%
“…More generally, the idea of learning from action advice has been widely explored in imitation learning algorithms [5,21,22,28]. There has also been significant recent interest in active preference queries for learning reward functions from pairwise preferences over demonstrations [7,10,13,19,32,39]. However, many forms of human advice can be unintuitive, since the learner may visit states that are significantly far from those the human supervisor would visit, making it difficult for humans to judge what correct behavior looks like without interacting with the environment themselves [36,42].…”
Section: Background and Related Work (mentioning)
confidence: 99%