2020 29th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN)
DOI: 10.1109/ro-man47096.2020.9223338

Integrating an Observer in Interactive Reinforcement Learning to Learn Legible Trajectories

Abstract: An important aspect of Human-Robot cooperation is that the robot is capable of clearly communicating its intentions to its human collaborator. This communication of intentions often requires the generation of legible motion trajectories. The concept of legible motion is usually not studied together with machine learning. Studying these fields together is an important step towards better Human-Robot cooperation. In this paper, we investigate interactive robot learning approaches with the aim of developing models…

Cited by 7 publications (7 citation statements)
References 36 publications
“…Furthermore, we noticed that our results are obtained with a simple observer model having uniformly distributed probabilities, and that considers the same policies that are available to the agent. Because our results are qualitatively similar to those reported in earlier works on legibility [11,26], i.e. legible trajectories are skewed to avoid other goal locations, it is suggested that a similarly uniformly initialized observer model was implicitly utilized in those papers as well.…”
Section: Discussion (supporting)
confidence: 92%
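The uniform-prior observer this quote refers to can be sketched with the exp(-cost) legibility formulation that the cited works build on: the observer holds a uniform prior over candidate goals and updates a posterior from the partial trajectory. The sketch below is an illustrative assumption, not code from the paper or its citers; the goal positions, waypoints, Euclidean cost proxy, and function names are all invented for the example.

```python
# Minimal sketch of a uniform-prior observer that infers which goal a 2-D
# trajectory prefix is heading towards, in the spirit of the exp(-cost)
# legibility formulation the cited works build on. Goal positions, waypoints,
# and the Euclidean cost proxy are illustrative assumptions.
import numpy as np

def cost(a, b):
    """Proxy for motion cost between two points: Euclidean distance."""
    return float(np.linalg.norm(np.asarray(b, dtype=float) - np.asarray(a, dtype=float)))

def goal_posterior(prefix, goals):
    """P(goal | trajectory prefix) with a uniform observer prior over `goals`.

    Score for each goal G:  exp(-(C(prefix) + C*(x_t, G))) / exp(-C*(x_0, G)),
    where the optimal cost-to-go C* is approximated by straight-line distance.
    """
    prefix = np.asarray(prefix, dtype=float)
    prefix_cost = sum(cost(prefix[i], prefix[i + 1]) for i in range(len(prefix) - 1))
    scores = []
    for g in goals:
        cost_to_go = cost(prefix[-1], g)
        best_from_start = cost(prefix[0], g)
        scores.append(np.exp(-(prefix_cost + cost_to_go)) / np.exp(-best_from_start))
    scores = np.asarray(scores)
    return scores / scores.sum()   # the uniform prior cancels in the normalisation

if __name__ == "__main__":
    goals = [(5.0, 5.0), (5.0, 0.0)]              # true goal is goals[0]
    direct = [(0, 0), (1.0, 1.0), (2.0, 2.0)]     # efficient but ambiguous prefix
    skewed = [(0, 0), (0.5, 1.5), (1.5, 3.0)]     # arcs away from goals[1]
    print("direct prefix ->", goal_posterior(direct, goals))
    print("skewed prefix ->", goal_posterior(skewed, goals))
```

Run as-is, the skewed prefix already places more posterior mass on the true goal than the straight-line prefix, which is the qualitative effect the quoted passage describes: legible trajectories are skewed away from competing goal locations.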
“…To the best of our knowledge, very little work exists in RL relating to interpretable behavior as just described. Both [26,27] propose methods relying on a transposition of the original formulation of legibility. The resulting methods are applicable only to goal-driven policies, thus excluding all other types of policies available in various RL frameworks.…”
Section: Introduction (mentioning)
confidence: 99%
“…Furthermore, we noticed that our results are obtained with a simple observer model that considers the same policies that are available to the agent, but with a uniform probability. Because our results are qualitatively similar to those reported in earlier works on legibility [11,27,28], i.e. legible trajectories are skewed to avoid other goal locations, it is suggested that a similarly uniformly initialized observer model was implicitly utilized in those papers as well. However, we regularize the agent by working on policies rather than reward distributions, and for this reason the proposed method has three major advantages: firstly, it easily generalizes over different shapes of reward regions, not only goal states, which are a particular type of reward region.…”
Section: Discussion (supporting)
confidence: 88%
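The remark about generalizing beyond goal states to arbitrarily shaped reward regions can likewise be illustrated. The sketch below is my own assumption, not the cited method: it only shows that a uniform-prior observer of the kind sketched above applies unchanged when the candidate targets are reward regions, by taking the distance to a region's nearest point as the optimistic cost-to-go. The region shapes, waypoints, and names are invented for the example.

```python
# Hedged sketch (an assumption, not the cited papers' implementation) of a
# uniform-prior observer over reward *regions* rather than single goal states:
# the optimistic cost-to-go to a region is the distance to its nearest point,
# so the same posterior computation works for regions of arbitrary shape.
import numpy as np

def region_distance(x, region_points):
    """Distance from point x to the nearest sampled point of a reward region."""
    x = np.asarray(x, dtype=float)
    pts = np.asarray(region_points, dtype=float)
    return float(np.min(np.linalg.norm(pts - x, axis=1)))

def region_posterior(prefix, regions):
    """P(region | trajectory prefix) with a uniform prior over candidate regions."""
    prefix = np.asarray(prefix, dtype=float)
    prefix_cost = float(np.sum(np.linalg.norm(np.diff(prefix, axis=0), axis=1)))
    scores = []
    for region in regions:
        cost_to_go = region_distance(prefix[-1], region)
        best_from_start = region_distance(prefix[0], region)
        scores.append(np.exp(-(prefix_cost + cost_to_go)) / np.exp(-best_from_start))
    scores = np.asarray(scores)
    return scores / scores.sum()

# Two reward regions of different shapes: a horizontal strip and a small square.
strip = [(x, 5.0) for x in np.linspace(2.0, 8.0, 13)]
square = [(x, y) for x in np.linspace(6.0, 7.0, 3) for y in np.linspace(0.0, 1.0, 3)]
prefix = [(0.0, 0.0), (1.0, 1.5), (2.0, 3.0)]   # arcs upward, away from the square
print(region_posterior(prefix, [strip, square]))
```

Because only the nearest-point distance changes, nothing else in the observer needs to know whether the target is a single state, a strip, or any other region shape, which is the sense in which such a formulation generalizes over reward-region shapes.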
“…Notice how this type of reward region simulates goal locations, and thus allows a qualitative comparison of the legible behavior obtained here with that of goal-driven policies from the literature [27,28]. The behaviors are quite similar, with trajectories that are arced to disambiguate the goals.…”
Section: Qualitative Evaluation (mentioning)
confidence: 71%