2021
DOI: 10.1109/tcyb.2019.2949596

Task-Oriented Deep Reinforcement Learning for Robotic Skill Acquisition and Control

Cited by 37 publications (12 citation statements)
References 23 publications
“…Related Work in Entropy Regularization: Some previous works in the RL literature [14]-[20], [30], [31] either add entropy as a regularization term $-\log\mu(a_t \mid s_t)$ [14], [15] to the instantaneous cost function $c(s_t, a_t, s_{t+1})$, or maximize the entropy $-\sum_a \mu(a \mid s)\log\mu(a \mid s)$ [16]-[18] of the stochastic policy alone under constraints on the cost $J$. This results in benefits such as better exploration, robustness to the noise $w_t$ in the instantaneous cost $c_t$, and faster convergence.…”
Section: Preliminaries (mentioning)
confidence: 99%
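The statement above contrasts two ways entropy enters the objective: a per-step regularizer added to the instantaneous cost, and the full policy entropy maximized under a constraint on the cost. The following is a minimal, illustrative sketch of the two quantities only; the policy vector, cost value, and weight alpha are assumptions, not values from the cited works.

```python
# Illustrative only: mu_s, c_t and alpha are assumed, not taken from [14]-[20].
import numpy as np

def regularized_cost(c_t, mu_s, a_t, alpha=0.1):
    """Instantaneous cost c(s_t, a_t, s_{t+1}) plus the per-step
    entropy-style penalty -log mu(a_t|s_t), weighted by alpha."""
    return c_t + alpha * (-np.log(mu_s[a_t]))

def policy_entropy(mu_s):
    """Entropy of the stochastic policy at one state:
    -sum_a mu(a|s) * log mu(a|s)."""
    return -np.sum(mu_s * np.log(mu_s + 1e-12))

# Example: a 3-action policy mu(.|s_t) at some state s_t.
mu_s = np.array([0.7, 0.2, 0.1])
print(regularized_cost(c_t=1.0, mu_s=mu_s, a_t=0))  # cost shaped toward exploration
print(policy_entropy(mu_s))                          # quantity maximized in [16]-[18]
```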
“…Task-oriented programming by demonstration uses different representation methods depending on the type of task. In recent years, task-oriented PBD methods based on reinforcement learning (RL) [20] have received growing attention. These methods offer a new paradigm for skill acquisition by maximizing the overall reward.…”
Section: B. Task-Oriented Programming by Demonstration (mentioning)
confidence: 99%
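For context, the "overall reward" that such RL-based methods maximize is typically the discounted return of an episode. The sketch below is a generic illustration; the reward sequence and discount factor are assumed, not taken from [20].

```python
# Generic illustration; the reward list and gamma are assumed values.
def discounted_return(rewards, gamma=0.99):
    """Overall (discounted) reward sum_t gamma^t * r_t that an RL-based
    task-oriented PBD policy is trained to maximize."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total

# Sparse per-step progress followed by a final success signal.
print(discounted_return([0.0, 0.0, 0.2, 1.0]))
```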
“…The control method offers good generality, strong adaptability, and easy extension. Xiang and Su [10] propose an effective model-free off-policy actor-critic algorithm that integrates the task reward with a task-oriented guiding reward, and apply it to robotic skill acquisition and continuous control. The agent can explore the environment more deliberately, improving sampling efficiency.…”
Section: Introduction (mentioning)
confidence: 99%
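The statement describes integrating a task reward with a task-oriented guiding reward for an off-policy actor-critic agent. The sketch below shows one plausible way to combine a sparse task reward with a dense guiding term; the distance-based guiding reward and the weight beta are illustrative assumptions, not the specific construction used by Xiang and Su [10].

```python
# Illustrative assumptions: the distance-based guiding term and weight `beta`
# are not the specific construction of Xiang and Su [10].
import numpy as np

def task_reward(goal_reached: bool) -> float:
    """Sparse task reward, paid only when the task itself succeeds."""
    return 1.0 if goal_reached else 0.0

def guiding_reward(ee_pos: np.ndarray, goal_pos: np.ndarray) -> float:
    """Dense task-oriented guiding reward: negative end-effector-to-goal
    distance, steering exploration toward task-relevant states."""
    return -float(np.linalg.norm(ee_pos - goal_pos))

def combined_reward(goal_reached, ee_pos, goal_pos, beta=0.5):
    """Integrated signal an off-policy actor-critic agent could learn from."""
    return task_reward(goal_reached) + beta * guiding_reward(ee_pos, goal_pos)

# One step: end-effector at (0.2, 0.0, 0.3), goal at (0.0, 0.0, 0.3).
print(combined_reward(False, np.array([0.2, 0.0, 0.3]), np.array([0.0, 0.0, 0.3])))
```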