2023
DOI: 10.1007/s10514-023-10120-w
Learning rewards from exploratory demonstrations using probabilistic temporal ranking

Abstract: Informative path-planning is a well-established approach to visual servoing and active viewpoint selection in robotics, but typically assumes that a suitable cost function or goal state is known. This work considers the inverse problem, where the goal of the task is unknown and a reward function needs to be inferred from exploratory example demonstrations provided by a demonstrator, for use in a downstream informative path-planning policy. Unfortunately, many existing reward inference strategies are unsuited …

Cited by 6 publications (7 citation statements); references 36 publications.
“…Due to the acoustic shadows, poor contrast, speckle noise, and potential deformation in resulting images (Mishra et al, 2018), guiding a probe to the standard US planes is sophisticated, even for senior sonographers. This means the experts’ demonstrations of searching for standard US planes will be sub-optimal and even contradictory (Burke et al, 2023). Therefore, the popular maximum-entropy IRL method (Aghasadeghi and Bretl, 2011) cannot be directly applied in our applications.…”
Section: Related Work
confidence: 99%
“…In order to achieve good performance, it has strict requirements for the initial position and phantom position. Besides, Burke et al introduced a probabilistic temporal ranking model which assumes that the images shown in the later stage are more important than the earlier images (Burke et al, 2023), allowing for reward inference from sub-optimal scanning demonstrations. They use this model to coarsely navigate a US probe to a mimicked tumor inside of a gel phantom, followed by an exploratory Bayesian optimization policy to search for scanning positions that capture images with high rewards.…”
Section: Related Work
confidence: 99%
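The citation statement above summarizes the core modeling assumption of probabilistic temporal ranking: in an exploratory (possibly sub-optimal) demonstration, frames observed later are probabilistically more likely to carry higher reward than earlier ones, so a reward model can be fit from pairwise temporal comparisons with a Bernoulli (logistic) likelihood. The following is a minimal sketch of that idea on toy data; the feature dimensions, linear reward model, and learning-rate settings are illustrative assumptions, not the paper's actual (Gaussian-process-based) implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy demonstration: T frames, each with a D-dimensional feature vector.
# Assumption (not from the paper): the exploratory sweep drifts toward
# the goal, so the "true" reward direction grows over time.
T, D = 100, 5
true_w = rng.normal(size=D)
feats = rng.normal(size=(T, D)) + np.linspace(0.0, 3.0, T)[:, None] * true_w

def reward(w, x):
    return x @ w  # linear reward model (illustrative choice)

def pair_loss_grad(w, x_early, x_late):
    # Probabilistic temporal ranking likelihood for a pair (i < j):
    # p(frame j outranks frame i) = sigmoid(r_j - r_i).
    diff = reward(w, x_late) - reward(w, x_early)
    p = 1.0 / (1.0 + np.exp(-diff))
    # Negative log-likelihood and its gradient w.r.t. w.
    return -np.log(p + 1e-12), -(1.0 - p) * (x_late - x_early)

# Stochastic gradient descent over randomly sampled temporal pairs.
w = np.zeros(D)
lr = 0.05
for step in range(2000):
    i, j = sorted(rng.choice(T, size=2, replace=False))
    _, grad = pair_loss_grad(w, feats[i], feats[j])
    w -= lr * grad

# The learned reward should rank late frames above early ones.
r = reward(w, feats)
print(r[-5:].mean() > r[:5].mean())  # expect True
```

In the cited work this inferred reward then drives a downstream search (e.g. the Bayesian optimization policy used to find high-reward probe positions); the sketch stops at reward inference.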