2020
DOI: 10.1109/tro.2020.2971415
Quantifying Hypothesis Space Misspecification in Learning From Human–Robot Demonstrations and Physical Corrections

Cited by 36 publications (47 citation statements); references 33 publications.
“…Typically, β is fixed, recovering the Maximum Entropy IRL [30] observation model. Inspired by work in [5,12,13], we instead reinterpret it as a confidence in the robot's features' ability to explain human data. To detect missing features, we estimate β̂ via a Bayesian belief update b′(θ, β) ∝ P(u | θ, β) b(θ, β).…”
Section: Confidence Estimation (mentioning; confidence: 99%)
See 1 more Smart Citation
“…Typically, is fixed, recovering the Maximum Entropy IRL [30] observation model. Inspired by work in [5,12,13], we instead reinterpret it as a confidence in the robot's features' ability to explain human data. To detect missing features, we estimateˆvia a Bayesian belief update ′ ( , ) ∝ ( | , ) ( , ).…”
Section: Confidence Estimationmentioning
confidence: 99%
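The belief update in the excerpt above can be sketched on a discrete grid. This is a minimal illustration, not the cited paper's implementation: the candidate weights, confidence values, and input set are hypothetical, and the observation model is the standard Boltzmann (MaxEnt) likelihood that the excerpt references.

```python
import numpy as np

# Sketch of b'(theta, beta) ∝ P(u | theta, beta) b(theta, beta) on a grid.
thetas = np.linspace(-1.0, 1.0, 5)   # candidate reward weights (hypothetical)
betas = np.array([0.1, 1.0, 10.0])   # candidate confidence values (hypothetical)
candidates = np.linspace(-1.0, 1.0, 5)  # discrete set of possible human inputs

belief = np.ones((len(thetas), len(betas)))
belief /= belief.sum()               # uniform prior b(theta, beta)

def likelihood(u, theta, beta):
    # Boltzmann observation model:
    # P(u | theta, beta) = exp(beta * theta * u) / sum_u' exp(beta * theta * u')
    logits = beta * theta * candidates
    logits -= logits.max()           # numerical stability
    p = np.exp(logits)
    return p[np.searchsorted(candidates, u)] / p.sum()

def update(belief, u):
    # Pointwise likelihood times prior, then renormalize.
    post = np.array([[likelihood(u, th, b) for b in betas] for th in thetas])
    post *= belief
    return post / post.sum()

belief = update(belief, u=1.0)
# Marginalizing over theta gives an estimate of the confidence beta.
beta_hat = betas[np.argmax(belief.sum(axis=0))]
```

A low estimated β̂ then signals that no reward in the robot's hypothesis space explains the human's input well, i.e., a feature may be missing.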
“…Unfortunately, this puts too much burden on system designers: specifying a priori an exhaustive set of all the features that end-users might care about is impossible for real-world tasks. While prior work has enabled robots to at least detect that the features it has access to cannot explain the human's input [5], it is still unclear how the robot might then construct a feature that can explain it. A natural answer is in deep IRL methods [11,20,29], which learn rewards defined directly on the high-dimensional raw state (or observation) space, thereby constructing features automatically.…”
Section: Introduction (mentioning; confidence: 99%)
“…slow down the motion), which is not a predefined preference that a robot would take into account. The robot needs to recognize its inability to explain the human's intention by lowering its confidence, and then reason about how to behave to meet the person's requirements, which is potentially solved in [162,163].…”
Section: Incremental Learning From Correction (mentioning; confidence: 99%)
“…The human provides a sequence of physical corrections to guide the robot toward their preferred objective, i.e., placing the bag on the green region while also avoiding the obstacles on the left, and holding the bag upright without squeezing or stretching it [4]. These works assume that the human makes corrections based only on their objective, without considering the other corrections they have already made or are planning to provide.…”
Section: Introduction (mentioning; confidence: 99%)
“…Learning from Corrections (Online). Recent research recognizes that physical human corrections are often intentional, and therefore informative [1]-[4]. These works learn about the human's underlying objective in real-time by comparing the current correction to the robot's previous behavior.…”
Section: Introduction (mentioning; confidence: 99%)
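The online scheme described in this excerpt, learning from a correction by comparing it to the robot's previous behavior, can be sketched as a feature-difference update on the reward weights. This is an illustrative sketch under assumed names, not any cited paper's exact method: the feature map and trajectories are hypothetical.

```python
import numpy as np

def features(traj):
    # Hypothetical feature map over a trajectory of (height, obstacle-distance)
    # waypoints: mean height and mean distance to the nearest obstacle.
    return np.array([traj[:, 0].mean(), traj[:, 1].mean()])

def update_from_correction(theta, planned, corrected, alpha=0.5):
    # Move the reward weights in the direction of the feature difference
    # between the human-corrected trajectory and the robot's planned one:
    # theta' = theta + alpha * (phi(corrected) - phi(planned))
    return theta + alpha * (features(corrected) - features(planned))

theta = np.zeros(2)
planned = np.array([[1.0, 0.2], [1.0, 0.2]])    # robot's original trajectory
corrected = np.array([[0.5, 0.2], [0.5, 0.2]])  # human pushed the arm lower
theta = update_from_correction(theta, planned, corrected)
# theta now weights height negatively: [-0.25, 0.0]
```

Each physical correction nudges the weights once; the assumption criticized in the excerpt above is that each such update treats the correction in isolation, independent of corrections already given or still planned.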