Proceedings of the Tenth Annual ACM/IEEE International Conference on Human-Robot Interaction 2015
DOI: 10.1145/2696454.2696455
Efficient Model Learning from Joint-Action Demonstrations for Human-Robot Collaborative Tasks

Abstract: We present a framework for automatically learning human user models from joint-action demonstrations that enables a robot to compute a robust policy for a collaborative task with a human. First, the demonstrated action sequences are clustered into different human types using an unsupervised learning algorithm. A reward function is then learned for each type through the employment of an inverse reinforcement learning algorithm. The learned model is then incorporated into a mixed-observability Markov decision pr…
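The first stage of the pipeline the abstract describes — clustering the demonstrated action sequences into a small number of human types before learning a per-type reward — can be sketched as follows. This is a minimal illustration, not the paper's exact method: the feature vectors, the number of types, and the farthest-point-initialized k-means are all assumptions made for the example.

```python
import numpy as np

def cluster_demonstrations(features, k=3, iters=20):
    """Toy k-means over demonstration feature vectors: each cluster stands in
    for one 'human type'. Farthest-point initialization keeps the small
    example deterministic."""
    centers = [features[0]]
    for _ in range(k - 1):
        # Pick the point farthest from all chosen centers as the next center.
        dists = np.min([np.linalg.norm(features - c, axis=1) for c in centers],
                       axis=0)
        centers.append(features[np.argmax(dists)])
    centers = np.array(centers)
    for _ in range(iters):
        # Assign each demonstration to its nearest center.
        labels = np.argmin(
            np.linalg.norm(features[:, None] - centers[None], axis=2), axis=1)
        # Recompute centers; keep the old center if a cluster emptied.
        centers = np.array([
            features[labels == j].mean(axis=0) if np.any(labels == j)
            else centers[j]
            for j in range(k)])
    return labels

# Hypothetical 2-D feature summaries of six demonstrated action sequences.
feats = np.array([[0., 0.], [0.1, 0.], [5., 5.], [5.1, 4.9],
                  [9., 0.], [9.2, 0.1]])
types = cluster_demonstrations(feats, k=3)
# Demonstrations 0-1, 2-3, and 4-5 land in three distinct clusters; in the
# paper's framework a separate reward function would then be learned for each
# type via inverse reinforcement learning.
```

The per-type reward learning and the MOMDP policy computation that follow are substantially more involved and are not sketched here.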

Cited by 138 publications (125 citation statements)
References 24 publications
“…First, we consider a regular neural network f_NN, which is our baseline. Second, we include the clustering-based method, denoted f_NN^(i), by Nikolaidis et al. [3], in which we cluster the gameplay data into three partitions using k-means and learn a separate policy network on each. Third, we consider a BNN, f_BNN, which is able to holistically reason about the homo- and heterogeneity amongst the demonstrators.…”
Section: A. Effects of Utilizing Heterogeneity
confidence: 99%
“…[5] showed that when attempting LfD from pilots' demonstrations executing a single flight plan, averaging trajectories led to worse performance than using a single trajectory. Nikolaidis et al. [3] approached this issue by categorizing demonstrators according to their task-execution preference via clustering and learning a separate policy for each cluster. While this allows the entire dataset to be utilized, each policy is trained on only a fraction of the data.…”
Section: Introduction
confidence: 99%
“…Here the intention is also assumed to be subject to latent dynamics. In the work of Nikolaidis et al. (2015), the robot learned to cooperate on a painting task by holding and adjusting the pose of a cube according to the preferred sequence of human collaborators. Similar to our work, it employed inverse reinforcement learning over classified demonstrations with different styles.…”
Section: Related Work
confidence: 99%
“…Similar to our work, it employed inverse reinforcement learning over classified demonstrations with different styles. Our approach is distinct in using the learned ensemble as the mode observation model, which was user-specific in Nikolaidis et al. (2015). Moreover, our system learns directly from continuous demonstration data, while Nikolaidis et al. (2015) resorted to state discretization, and the adopted mixed-observability Markov decision process is limited to low-dimensional tasks.…”
Section: Related Work
confidence: 99%
“…Optimal policies for robots are then computed by forming a belief over human intention. In slightly different applications of MOMDPs in HRI [22], the class of human subjects is treated as the unobservable variable. While it is true that MOMDPs in general drastically decrease the complexity of the problem, they usually require fully known submodels for each of the unobservable variables.…”
Section: Introduction
confidence: 99%