2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2019
DOI: 10.1109/iros40897.2019.8968522

Active Learning of Reward Dynamics from Hierarchical Queries

Cited by 24 publications (19 citation statements)
References 26 publications
“…While having humans provide pairwise comparisons does not suffer from similar problems to collecting demonstrations, each comparison question is much less informative than a demonstration, because comparison queries can provide at most 1 bit of information. Prior works have attempted to tackle this problem by actively generating the comparison questions (Basu et al, 2019; Biyik and Sadigh, 2018; Katz et al, 2019; Sadigh et al, 2017; Wilde et al, 2019). Although they were able to achieve significant gains in terms of the required number of comparisons, we hypothesize that one can attain even better data efficiency by leveraging multiple sources of information, even when some sources might not completely align with the true reward functions, e.g., demonstrations as in the driving work by Basu et al (2017).…”
Section: Learning Reward Functions From Rankings (mentioning)
confidence: 90%
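The "at most 1 bit" bound in the statement above follows from the entropy of a binary answer. The sketch below is an illustration only, not code from any of the cited works; the function name `binary_entropy` and the example probabilities are assumptions made for this example.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy, in bits, of a two-way answer where option A is chosen with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a certain answer carries no information
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


# Hypothetical probabilities that the human prefers trajectory A over B.
for p in (0.5, 0.9, 0.99):
    print(f"p = {p:.2f}: {binary_entropy(p):.3f} bits")
```

The entropy peaks at exactly 1 bit when both answers are equally likely (p = 0.5) and shrinks as the answer becomes predictable, which is why the choice of comparison question matters.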
“…We have previously developed several tools to improve the computational efficiency of volume removal, or to extend volume removal to better accommodate human users. These tools include batch optimization (Biyik and Sadigh, 2018; Bıyık et al, 2019), iterated correction (Palan et al, 2019), and dynamically changing reward functions (Basu et al, 2019). Importantly, the listed tools are agnostic to the details of volume removal: they simply require (a) the query generation algorithm to operate in a greedy manner while (b) maintaining a belief over v. Our proposed information gain approach for generating easy queries satisfies both of these requirements.…”
Section: Useful Tools and Extensions (mentioning)
confidence: 99%
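The two requirements named in the statement above, greedy query generation and a maintained belief, can be sketched as follows. This is a minimal illustration under stated assumptions, not the cited authors' implementation: the belief is represented by weighted samples of hypothetical reward weights `w`, the human is assumed to answer according to a logistic (softmax) model, and every function name and feature vector here is invented for the example.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def prob_prefers_a(w, feat_a, feat_b):
    """Assumed logistic response model: P(human picks A | reward weights w)."""
    return 1.0 / (1.0 + math.exp(dot(w, feat_b) - dot(w, feat_a)))


def entropy_bits(p):
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def information_gain(query, samples, weights):
    """Mutual information between the answer and w, estimated from belief samples."""
    feat_a, feat_b = query
    probs = [prob_prefers_a(w, feat_a, feat_b) for w in samples]
    mean_p = sum(b * p for b, p in zip(weights, probs))
    expected_h = sum(b * entropy_bits(p) for b, p in zip(weights, probs))
    return entropy_bits(mean_p) - expected_h


def greedy_query(candidates, samples, weights):
    """Greedy step: pick the single query with the highest estimated information gain."""
    return max(candidates, key=lambda q: information_gain(q, samples, weights))


def update_belief(query, answered_a, samples, weights):
    """Bayes update of the sample weights after observing one answer."""
    feat_a, feat_b = query
    likelihoods = [
        prob_prefers_a(w, feat_a, feat_b) if answered_a
        else 1.0 - prob_prefers_a(w, feat_a, feat_b)
        for w in samples
    ]
    unnormalized = [b * l for b, l in zip(weights, likelihoods)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Hypothetical setup: 200 belief samples over 2-D reward weights, uniform prior weights.
rng = random.Random(0)
samples = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(200)]
weights = [1.0 / len(samples)] * len(samples)

# Each candidate query is a pair of trajectory feature vectors (made-up values).
candidates = [
    ([1.0, 0.0], [0.0, 1.0]),  # trajectories that differ in both features
    ([1.0, 1.0], [1.0, 1.0]),  # identical trajectories: the answer reveals nothing
]

chosen = greedy_query(candidates, samples, weights)
weights = update_belief(chosen, True, samples, weights)
```

In this toy setup the second candidate compares two identical trajectories, so its estimated information gain is zero and the greedy step selects the first candidate instead.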
“…We consider these agents as human-like agents assuming that humans have less task handling capabilities but their creativity and higher risk tolerance leads to noisy rational decisions. To model noise in agent decisions, we incorporate a noisy rational model, a widely used human decision model in cognitive science [19]-[21], into our proposed framework. In particular, we give an agent the option to take any action with certain probability defined as…”
Section: A. Effect Of Heterogeneity On Team Performance, 1) Heterogeneity In Capabilities (mentioning)
confidence: 99%
“…Other approaches, such as those that use probabilistic methods to learn a task [6]-[10], also rely on highly skilled demonstrators, accounting for imperfections with relatively small-scale noise in the probabilistic representation. To enable task learning that more closely captures human preferences and accounts for imperfect or incomplete demonstrations, active learning methods have been developed [11]-[15]. In these approaches, the human is treated as an oracle that the autonomy can query, improving learning quality.…”
Section: Related Work (mentioning)
confidence: 99%