2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
DOI: 10.1109/iros51168.2021.9636020

Self-Supervised Online Reward Shaping in Sparse-Reward Environments

Cited by 23 publications (6 citation statements). References 9 publications.

“…Nevertheless, the reduced-order models and learning algorithms used in this study have potential improvements that can be made in future research, including (1) collecting more quantitative data on storm petrel pattering/sea-anchoring to validate modeling approaches, (2) exploring new learning techniques, such as reward shaping [37,38], to enhance the interpretability of the AI 'black box', and (3) using empirical data to measure the drag coefficient of objects that replicate a storm petrel's anatomy by fluid dynamic experiments to build a higher-fidelity force model [39][40][41].…”
Section: Discussion
confidence: 99%
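
The reward shaping mentioned in that statement is most commonly realized as potential-based shaping (Ng et al., 1999), which adds a dense bonus without changing the optimal policy. The sketch below is a generic illustration of that standard technique, not the self-supervised scheme from the cited paper; the potential function `phi` and the state dictionary are hypothetical placeholders.

```python
# Minimal sketch of potential-based reward shaping (Ng et al., 1999).
# phi() is a hypothetical state potential chosen by the designer; the shaped
# reward r'(s, a, s') = r(s, a, s') + gamma * phi(s') - phi(s) preserves the
# optimal policy of the original sparse-reward MDP.

GAMMA = 0.99  # discount factor


def phi(state):
    """Hypothetical potential: negative distance to the goal."""
    return -abs(state["distance_to_goal"])


def shaped_reward(state, next_state, sparse_reward):
    """Augment the sparse environment reward with the potential difference."""
    return sparse_reward + GAMMA * phi(next_state) - phi(state)


# A transition that moves closer to the goal receives a small positive bonus
# even though the sparse reward is still zero.
s, s_next = {"distance_to_goal": 5.0}, {"distance_to_goal": 4.0}
print(shaped_reward(s, s_next, sparse_reward=0.0))  # ~1.04
```
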
“…Likewise, each reward component should be weighted optimally in a multi-objective DRL agent to achieve the desired outcome and faster convergence. A few studies [27], [28] use supervised reward shaping techniques to assist the sparse-reward setup, which is a different approach from ours, as we focus on predicting optimal weights for each reward component according to the current contextual information.…”
Section: Related Work
confidence: 99%
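
The weighting described in that statement amounts to scalarising several reward components into one signal. The sketch below illustrates a plain weighted sum; the component names and the fixed weights are illustrative placeholders, not the context-dependent weight prediction proposed by the citing work.

```python
import numpy as np


def reward_components(obs, action):
    """Hypothetical reward components of a multi-objective DRL agent."""
    return np.array([
        -np.linalg.norm(obs["goal_error"]),  # progress toward the goal
        -np.sum(np.square(action)),          # control-effort penalty
        -float(obs["collision"]),            # safety penalty
    ])


def scalarised_reward(obs, action, weights):
    """Combine the components into one scalar via a weighted sum. The weights
    could be fixed by the designer or, as the citing work proposes, predicted
    from the current contextual information."""
    return float(np.dot(weights, reward_components(obs, action)))


# Example with fixed, hand-chosen weights.
obs = {"goal_error": np.array([1.0, 2.0]), "collision": 0}
action = np.array([0.1, -0.2])
print(scalarised_reward(obs, action, weights=np.array([1.0, 0.1, 10.0])))
```
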
“…However, such an approach can easily exploit poorly designed rewards, get stuck in local optima, and induce behavior that the designer did not intend. In contrast, goal-based sparse rewards are appealing because they do not suffer from the reward exploration problem (24). In addition, this small, simple set of rules has similarities with biological behaviors and is therefore applicable to animals with a very limited level of information processing (51).…”
Section: Reward Function
confidence: 99%
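
A goal-based sparse reward of the kind described there is typically just an indicator of goal attainment, with no intermediate signal to exploit. The sketch below assumes a hypothetical target-tracking task with a success radius; the names and threshold are illustrative, not taken from the cited work.

```python
import numpy as np

GOAL_RADIUS = 0.5  # hypothetical success threshold


def sparse_goal_reward(agent_pos, target_pos):
    """Goal-based sparse reward: 1 when the target is reached, 0 otherwise.
    There is no shaped intermediate signal to exploit, which avoids reward
    hacking but makes exploration harder."""
    dist = np.linalg.norm(np.asarray(agent_pos) - np.asarray(target_pos))
    return 1.0 if dist < GOAL_RADIUS else 0.0


print(sparse_goal_reward([0.0, 0.0], [0.3, 0.2]))  # 1.0 (within radius)
print(sparse_goal_reward([0.0, 0.0], [3.0, 4.0]))  # 0.0
```
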
“…To enhance the target-tracking capabilities of marine robots and our understanding of the ecosystem, a guidance system based on soft actor-critic (SAC) (23) deep RL algorithms has been developed. Whereas most of the attention in deep RL has focused on game theory [for example, to solve Atari games (24) or to master the game of Go (25)], the same principles can be used to solve path planning and trajectory optimization problems (Table 1). Previous studies have shown that aerial gliders can navigate atmospheric thermals autonomously (26), stratospheric Loon superpressure balloons have learned optimal control to maintain their position at multiple locations (27), and an RL agent has been trained to efficiently navigate in simulated vortical flow fields (28).…”
Section: Introduction
confidence: 99%
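
Soft actor-critic is available in off-the-shelf libraries, so a guidance or path-planning policy like the one described above can be prototyped quickly. The sketch below is only an illustration using the stable-baselines3 SAC implementation on a standard Gymnasium control task; the cited marine-robot guidance system uses its own environment and training setup, which is not reproduced here.

```python
# Illustrative SAC training run with stable-baselines3 (assumed installed),
# on a stand-in continuous-control task rather than the marine-robot
# environment from the cited work.
import gymnasium as gym
from stable_baselines3 import SAC

env = gym.make("Pendulum-v1")
model = SAC("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=10_000)  # short run, for illustration only

# Roll out the learned policy for one episode.
obs, _ = env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(action)
    done = terminated or truncated
```
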