2019
DOI: 10.48550/arxiv.1906.08928
Preprint

Learning Reward Functions by Integrating Human Demonstrations and Preferences

Cited by 12 publications (18 citation statements)
References: 0 publications
“…The concept of learning a hidden reward function from a user is widely used in various human-robot interaction frameworks, such as learning from demonstrations (LfD) [4], [17], learning from corrections [18], [19] and learning from preferences [1], [3], [12], [13], [17].…”
Section: A. Related Work
confidence: 99%
“…Compared to (6), the new formulation in (7) has a new log term that is intentionally added to reflect humans' preference/desire. The new formulation allows us to learn a nonlinear objective function based on human preferences because human preferences among trajectory pairs can shape the loss towards the desired regions and away from the undesired regions.…”
Section: Training a Maximum Entropy Reinforcement Learning Algorithm
confidence: 99%
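The citing paper's equations (6) and (7) are not reproduced on this page, so the following is only a sketch of the kind of modification the excerpt describes: a maximum-entropy demonstration objective augmented with a Bradley-Terry style log-likelihood over preferred/unpreferred trajectory pairs. The notation ($r_\theta$, $\xi^+$, $\xi^-$, $\mathcal{D}$, $\mathcal{P}$, $\lambda$) is assumed for illustration and is not taken from the cited work.

```latex
% Hedged sketch (not the paper's Eq. (6)/(7)): a MaxEnt demonstration
% likelihood plus a Bradley-Terry style log term over trajectory pairs,
% where \xi^+ is preferred over \xi^-.
\begin{align*}
\mathcal{L}(\theta)
  &= \underbrace{\sum_{\xi \in \mathcal{D}}
       \Big( r_\theta(\xi) - \log Z(\theta) \Big)}_{\text{MaxEnt demonstration term}}
   \;+\; \lambda \underbrace{\sum_{(\xi^+,\, \xi^-) \in \mathcal{P}}
       \log \frac{\exp\!\big(r_\theta(\xi^+)\big)}
                 {\exp\!\big(r_\theta(\xi^+)\big) + \exp\!\big(r_\theta(\xi^-)\big)}}_{\text{preference log term}}
\end{align*}
```

Under a formulation of this shape, pairs in which the less-preferred trajectory scores highly are penalized, which is one way a preference log term can pull the learned objective toward desired regions and away from undesired ones, as the excerpt notes.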
“…learning (see, e.g., [3], [4]) focuses on learning reward functions directly from human demonstrations. Human preference-based learning (see, e.g., [5], [6], [7]) focuses on maximizing the volume removed from the distribution of the weight vector by asking a human to pick between trajectory pairs until reaching convergence. While both approaches provide important advances, they still lack much-needed data efficiency.…”
Section: Introduction
confidence: 99%
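The excerpt above summarizes volume-removal-based preference learning (the works numbered [5], [6], [7] in that paper). As a rough illustration only, the sketch below scores a candidate trajectory-pair query by the expected volume it would remove from a sampled belief over linear reward weights; the function names, the logistic choice model, and the feature vectors are assumptions made for this sketch, not the cited implementations.

```python
import numpy as np

def expected_volume_removed(w_samples, phi_a, phi_b):
    """Expected fraction of belief 'volume' removed by asking a human to
    choose between trajectories with feature vectors phi_a and phi_b.

    w_samples : (N, d) samples from the current belief over linear reward
                weights w (reward of a trajectory = w . phi).
    phi_a, phi_b : (d,) feature vectors of the candidate query pair.
    """
    # Probability each weight sample assigns to "human prefers A" under a
    # Bradley-Terry / logistic choice model.
    p_a = 1.0 / (1.0 + np.exp(-(w_samples @ (phi_a - phi_b))))
    p_prefers_a = p_a.mean()          # predicted probability of answer "A"
    p_prefers_b = 1.0 - p_prefers_a
    # Belief mass kept after an answer is the average likelihood of that
    # answer; expected removed volume is 1 minus the expected kept mass.
    expected_kept = p_prefers_a * p_a.mean() + p_prefers_b * (1.0 - p_a).mean()
    return 1.0 - expected_kept

def pick_query(w_samples, candidate_pairs):
    """Return the (phi_a, phi_b) pair maximizing expected volume removed."""
    return max(candidate_pairs,
               key=lambda pair: expected_volume_removed(w_samples, *pair))

# Tiny usage example with made-up feature vectors.
rng = np.random.default_rng(0)
w_samples = rng.normal(size=(1000, 3))                 # belief over weights
candidates = [(rng.normal(size=3), rng.normal(size=3)) for _ in range(20)]
best_pair = pick_query(w_samples, candidates)
```

Asking the highest-scoring query, updating the weight belief with the human's answer, and repeating is the greedy loop the excerpt refers to as querying "until reaching convergence."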
“…In this case, good IRL methods need greater expressive power and a more efficient framework. Work such as [9, 15, 21, 30, 41, 50, 54] has been proposed to alleviate these problems by using more expressive models, such as neural networks, and by optimizing the input, e.g., ranking the demonstrations in advance.…”
confidence: 99%