2019 IEEE 2nd International Conference on Computer and Communication Engineering Technology (CCET)
DOI: 10.1109/ccet48361.2019.8989177
Modeling a Continuous Locomotion Behavior of an Intelligent Agent Using Deep Reinforcement Technique

Cited by 13 publications (6 citation statements)
References 1 publication
“…This paper mainly studies the performance of LSTM [33,34,35,36,37] and its improved network [38,39] in text value classification. The input of the LSTM network was a word vector; the collected movie comment data were pre-processed to obtain the word vectors in the dataset.…”
Section: Methods
confidence: 99%
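The excerpt above describes feeding word vectors from pre-processed movie comments into an LSTM for text value classification. The following is a minimal PyTorch sketch of such a classifier; the class name, embedding dimensions, and layer sizes are illustrative assumptions, not taken from the cited paper.

```python
# Minimal sketch (assumed architecture and names) of an LSTM text classifier
# that consumes word vectors, as described in the excerpt above.
import torch
import torch.nn as nn

class LSTMTextClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128, num_classes=2):
        super().__init__()
        # Word-vector lookup for the pre-processed comment tokens.
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer word indices of a comment.
        vectors = self.embedding(token_ids)
        _, (hidden, _) = self.lstm(vectors)   # final hidden state summarizes the text
        return self.classifier(hidden[-1])    # class logits (e.g. sentiment value)
```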
“…Of the families of RL algorithms, we used the policy gradient method, which optimizes the expected cumulative reward by finding a good parametrized neural-network policy. The chosen algorithm is the twin-delayed deep deterministic policy gradient (TD3; Grondman et al., 2012; Konda and Tsitsiklis, 1999), an RL method suitable for models characterized by continuous action spaces (Dankwa and Zheng, 2019). TD3 is an actor-critic architecture that consists of two parts: an actor and a critic.…”
Section: Agent Modelling and Learning Algorithm
confidence: 99%
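The excerpt above describes TD3 as an actor-critic method for continuous action spaces. Below is a minimal PyTorch sketch of that structure: a deterministic actor that outputs bounded continuous actions and a critic that scores state-action pairs (TD3 keeps two critics). The network sizes and names are illustrative assumptions, not taken from the cited papers.

```python
# Minimal sketch of a TD3-style actor and critic; sizes are placeholders.
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Deterministic policy: maps a state to a continuous action in [-max_action, max_action]."""
    def __init__(self, obs_dim, act_dim, max_action, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),
        )
        self.max_action = max_action

    def forward(self, state):
        return self.max_action * self.net(state)

class Critic(nn.Module):
    """Q-function: scores a (state, action) pair; TD3 trains two of these."""
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))
```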
“…RL algorithms characterized as off‐policy generally utilize a separate behaviour policy that is independent of the policy being improved upon. The key advantage of this separation is that the behaviour policy can operate by sampling all actions, while the estimation policy can be deterministic [61]. TD3 was built on the DDPG algorithm to increase stability and performance with consideration of function approximation error [60].…”
Section: Dynamic Power Allocation With DRL
confidence: 99%
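The excerpt above distinguishes an exploratory behaviour policy from the deterministic policy being improved. A small sketch of that separation is given below, assuming the PyTorch actor from the earlier sketch; the function names and the Gaussian exploration noise are illustrative assumptions.

```python
# Sketch of the off-policy separation described above: noisy actions for data
# collection, deterministic actions for the policy being evaluated/improved.
import torch

def behaviour_action(actor, state, noise_std, max_action):
    """Exploratory action used only to collect experience (behaviour policy)."""
    with torch.no_grad():
        action = actor(state)
    noise = noise_std * torch.randn_like(action)
    return (action + noise).clamp(-max_action, max_action)

def estimation_action(actor, state):
    """Deterministic action from the policy that is being improved."""
    with torch.no_grad():
        return actor(state)
```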
“…In order to reduce overestimation bias problems, the authors in [60] extended DDPG to the twin-delayed deep deterministic policy gradient algorithm (TD3), which estimates the target Q value by using the minimum of two target Q values, called clipped double Q learning. The DPG and DDPG algorithms, building on the successful work on DQN, paved the way for TD3 [60, 61]. TD3 adopts two critics to obtain a less optimistic estimate of an action value by taking the minimum of the two estimates.…”
Section: Related Work
confidence: 99%
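The excerpt above describes clipped double Q learning: the target Q value is the minimum of two target-critic estimates. Below is a hedged PyTorch sketch of that target computation, including the target-policy smoothing noise commonly paired with it; the function signature and hyper-parameter values are illustrative assumptions, not the cited papers' exact formulation.

```python
# Sketch of the clipped double-Q target: min over two target critics at a
# noise-smoothed target action. Shapes and defaults are assumed.
import torch

def td3_target(reward, next_state, done, actor_target, critic1_target, critic2_target,
               gamma=0.99, policy_noise=0.2, noise_clip=0.5, max_action=1.0):
    with torch.no_grad():
        # Target policy smoothing: perturb the target action with clipped noise.
        next_action = actor_target(next_state)
        noise = (policy_noise * torch.randn_like(next_action)).clamp(-noise_clip, noise_clip)
        next_action = (next_action + noise).clamp(-max_action, max_action)

        # Clipped double Q: take the minimum of the two target critics.
        q1 = critic1_target(next_state, next_action)
        q2 = critic2_target(next_state, next_action)
        target_q = reward + gamma * (1.0 - done) * torch.min(q1, q2)
    return target_q
```

Taking the minimum of the two estimates keeps the bootstrapped target pessimistic, which is what counteracts the overestimation bias mentioned in the excerpt.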