2018
DOI: 10.1109/lra.2018.2801468

Failure is Not an Option: Policy Learning for Adaptive Recovery in Space Operations

Abstract: This paper considers the problem of how robots in long-term space operations can learn to choose appropriate sources of assistance to recover from failures. Current assistant selection methods for failure handling are based on manually specified static look-up tables or policies, which are not responsive to dynamic environments or uncertainty in human performance. We describe a novel and highly flexible learning-based assistant selection framework that uses contextual multi-arm bandit algorithms. The contextual …
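The abstract's central mechanism, assistant selection via contextual multi-arm bandits, can be illustrated with a short sketch. Below is a minimal LinUCB-style selector in Python, assuming a hand-built numeric feature vector per failure context; the class name, feature choices, and reward convention are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

class LinUCBAssistantSelector:
    """Contextual multi-arm bandit: one linear reward model per assistant.

    For each failure, the selector scores every assistant by an upper
    confidence bound on expected recovery reward and asks the highest
    scoring one for help.
    """

    def __init__(self, n_assistants, context_dim, alpha=1.0):
        self.alpha = alpha  # exploration weight
        # Per-arm ridge-regression state: A is d x d, b is a d-vector.
        self.A = [np.eye(context_dim) for _ in range(n_assistants)]
        self.b = [np.zeros(context_dim) for _ in range(n_assistants)]

    def select(self, context):
        """Return the index of the assistant to request help from."""
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                        # estimated reward weights
            bonus = self.alpha * np.sqrt(context @ A_inv @ context)
            scores.append(theta @ context + bonus)   # UCB score
        return int(np.argmax(scores))

    def update(self, arm, context, reward):
        """Fold in the observed outcome (e.g., 1.0 if recovery succeeded)."""
        self.A[arm] += np.outer(context, context)
        self.b[arm] += reward * context

# Example: four candidate assistants, a three-feature failure context
# (e.g., subtask type, time pressure, comms delay -- all hypothetical).
selector = LinUCBAssistantSelector(n_assistants=4, context_dim=3)
ctx = np.array([1.0, 0.3, 0.7])
arm = selector.select(ctx)
selector.update(arm, ctx, reward=1.0)  # the chosen assistant resolved it
```

Each assistant keeps its own estimate of expected recovery reward, and the exploration bonus shrinks as observations accumulate, which is what lets such a selector adapt to drifting human performance rather than relying on a static look-up table.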

Cited by 12 publications (7 citation statements)
References 24 publications

“…Two matched sets of simulations were conducted. In each set, fully disjoint teams of actors were compared against shared teams selected from a set of ten actors, producing regret curves similar to our previously published work [1]. The number of selection events needed before the multi-arm bandit outperformed the informed static policy was then measured for one hundred trials in each setup.…”
Section: Experimental Designs and Results
Mentioning confidence: 99%
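The setup quoted above can be approximated with a toy simulation: run a simple bandit against a fixed "informed static" choice over synthetic assistant success rates and record the first selection event at which the bandit's cumulative reward overtakes the static policy's. The success probabilities, the eps-greedy learner, and the naive crossover criterion below are illustrative assumptions, not the cited experiment's actual models.

```python
import numpy as np

rng = np.random.default_rng(0)

def crossover_event(p_success, static_arm, horizon=2000, eps=0.1):
    """First selection event at which an eps-greedy bandit's cumulative
    reward overtakes a fixed, informed static assistant choice.
    Returns horizon if the bandit never catches up within the run."""
    n_arms = len(p_success)
    counts = np.zeros(n_arms)
    means = np.zeros(n_arms)
    bandit_total = static_total = 0.0
    for t in range(horizon):
        # eps-greedy: explore a random assistant, else exploit the best mean.
        arm = rng.integers(n_arms) if rng.random() < eps else int(np.argmax(means))
        reward = float(rng.random() < p_success[arm])
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running mean
        bandit_total += reward
        static_total += float(rng.random() < p_success[static_arm])
        if bandit_total > static_total:
            return t + 1
    return horizon

# Hypothetical success rates for a shared pool of ten actors; the static
# policy is "informed" in that it picks the second-best assistant.
p = rng.uniform(0.3, 0.9, size=10)
static_arm = int(np.argsort(p)[-2])
events = [crossover_event(p, static_arm) for _ in range(100)]  # 100 trials
print(f"median events to outperform the static policy: {np.median(events):.0f}")
```

Repeating the measurement over one hundred trials, as in the quoted design, yields a distribution of crossover points rather than a single number.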
“…b) Apparatus and Environment: A key challenge for this experimental design was ensuring that subjects completed a sufficient number of subtasks, so that our assistant selection algorithm had enough data to learn from. Based on our prior simulations [1], we found that the number of selection events required to achieve parity with a state-of-the-art approach scales linearly in both the size of the assistant population and the number of types of subtask. Therefore, the human data collection involved a small number of subtasks and used small assistant populations, to validate performance of our adaptive approach within the horizon imposed by the number of selection events possible in a single testing session.…”
Section: B. Development of a Reference Human Performance Dataset
Mentioning confidence: 92%
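The linear-scaling observation quoted above implies a quick feasibility check on experiment design: with a fixed budget of selection events per testing session, the admissible assistant population and subtask variety are tightly bounded. The sketch below assumes a bilinear reading of the claim and a hypothetical per-combination constant c; neither is a value reported in the paper.

```python
def events_to_parity(n_assistants, n_subtask_types, c=15):
    """Estimated selection events for the bandit to reach parity with a
    static policy, assuming the requirement grows linearly in each factor
    (bilinear model). c is a hypothetical constant, not a measured value."""
    return c * n_assistants * n_subtask_types

# With roughly 200 selection events available in one session, only
# small populations and few subtask types fit within the horizon:
for n_a in (2, 4, 8):
    for n_s in (1, 2):
        print(f"{n_a} assistants, {n_s} subtask types -> "
              f"~{events_to_parity(n_a, n_s)} events")
```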