2021
DOI: 10.17485/ijst/v14i30.1030

Control and Simulation of a 6-DOF Biped Robot based on Twin Delayed Deep Deterministic Policy Gradient Algorithm

Abstract: Objectives: To study an algorithm for controlling a bipedal robot so that it walks with a gait close to that of a human. The Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm is known to be highly efficient; it makes a few changes to the widely used Deep Deterministic Policy Gradient (DDPG) algorithm for continuous action-space problems in Reinforcement Learning. Methods: Unlike the sparse reward function models commonly used, in this study, a reward model com…
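The "few changes" TD3 makes to DDPG are, in the original TD3 formulation by Fujimoto et al. (2018), clipped double-Q learning, delayed policy updates and target policy smoothing. The sketch below shows how those three pieces enter the critic target for one transition. It is a minimal illustration, not the authors' implementation: the function names, the per-joint list representation of actions, the action bounds of ±1 and the defaults (`gamma=0.99`, `policy_noise=0.2`, `noise_clip=0.5`, `policy_delay=2`, the values commonly quoted for TD3) are all assumptions, not values taken from this paper.

```python
import random
from typing import Callable, List, Optional, Sequence

State = Sequence[float]
Action = List[float]  # assumption: one command per joint (e.g. 6 for a 6-DOF biped)


def clip(x: float, lo: float, hi: float) -> float:
    """Clamp x into the interval [lo, hi]."""
    return max(lo, min(hi, x))


def td3_target(
    reward: float,
    done: float,
    next_state: State,
    actor_target: Callable[[State], Action],
    critic1_target: Callable[[State, Action], float],
    critic2_target: Callable[[State, Action], float],
    gamma: float = 0.99,
    policy_noise: float = 0.2,
    noise_clip: float = 0.5,
    action_low: float = -1.0,
    action_high: float = 1.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Critic target y for one transition (hypothetical helper, illustrative defaults)."""
    rng = rng if rng is not None else random.Random()
    # Target policy smoothing: add clipped Gaussian noise to each target joint command,
    # then keep the result inside the valid action range.
    next_action = [
        clip(a + clip(rng.gauss(0.0, policy_noise), -noise_clip, noise_clip),
             action_low, action_high)
        for a in actor_target(next_state)
    ]
    # Clipped double-Q learning: bootstrap from the smaller of the two target critics.
    q_next = min(critic1_target(next_state, next_action),
                 critic2_target(next_state, next_action))
    # Bellman target; (1 - done) stops bootstrapping at terminal states (e.g. a fall).
    return reward + gamma * (1.0 - done) * q_next


def actor_update_due(step: int, policy_delay: int = 2) -> bool:
    """Delayed policy updates: actor and target networks update every policy_delay critic steps."""
    return step % policy_delay == 0
```

For example, with toy critics `lambda s, a: sum(a)` and `lambda s, a: sum(a) + 1.0`, a constant actor and `policy_noise=0.0`, the target reduces to `reward + gamma * sum(action)`, because the first critic is always the smaller one. In a real agent the critics and actor would be neural networks trained on batches; scalar callables keep the sketch self-contained.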

Cited by 6 publications (6 citation statements). References: 24 publications.

“…To address this issue, M et al combined TD3 with Hindsight Experience Replay (HER) [165]. Khoi et al utilized the TD3 algorithm along with a novel reward model to simulate the gait of a 6-DOF biped robot in a Gazebo/ROS environment [166]. Yang and Xu aimed to design a robot that can aid in warehouse object grasping using various DRL algorithms, including TD3 [167].…”
Section: Twin Delayed Deep Deterministic Policy Gradient (mentioning; confidence: 99%)

“…P.B. Khoi et al presented Twin Delayed DDPG (TD3) [1] to control the bipedal walking robot. TD3 can handle the overestimation bias on DDPG, but TD3 is still using the deterministic policy.…”
Section: Introduction (mentioning; confidence: 99%)

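The overestimation bias mentioned in this statement comes from letting the same noisy critic both choose and evaluate the action. The Monte Carlo sketch below makes this concrete under simplifying assumptions of our own: every action's true value is 0, each critic adds independent Gaussian noise, and the actor's maximisation is modelled as a greedy pick over a small discrete set of actions (DDPG actually uses a continuous actor, so this is only an analogy). The function name and parameters are hypothetical.

```python
import random
import statistics
from typing import Tuple


def overestimation_demo(n_actions: int = 10, noise_std: float = 1.0,
                        trials: int = 20000, seed: int = 0) -> Tuple[float, float]:
    """Return (mean single-critic estimate, mean clipped double-Q estimate) at the greedy action."""
    rng = random.Random(seed)
    single, clipped = [], []
    for _ in range(trials):
        # True value of every action is 0; each critic sees its own independent noise.
        q1 = [rng.gauss(0.0, noise_std) for _ in range(n_actions)]
        q2 = [rng.gauss(0.0, noise_std) for _ in range(n_actions)]
        greedy = max(range(n_actions), key=lambda a: q1[a])
        # DDPG-style: the critic that picked the action also scores it, so noise is maximised.
        single.append(q1[greedy])
        # TD3-style clipped double-Q: take the minimum over the two critics at that action.
        clipped.append(min(q1[greedy], q2[greedy]))
    return statistics.mean(single), statistics.mean(clipped)


single_est, clipped_est = overestimation_demo()
```

Because the true value is 0, any positive mean is pure bias: the single-critic estimate lands well above 0 (near the expected maximum of ten standard normals), while the clipped estimate falls to roughly 0 or slightly below. TD3 deliberately accepts this mild underestimation, since overestimated values tend to be reinforced through bootstrapping.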
“…In this research, an extension of the TD3 [20] algorithm was proposed to include more information about the connection between the joints of the robot in the training process. In fact, there are many articles [20][21][22][23][24][25] using reinforcement learning algorithms such as TD3, DDPG and SAC to find the desired angle values of the joints of the robot. However, their algorithms only used the information about the velocity and angular value of the joints for training, they did not take advantage of the graph topology and the binding relationship of the humanoid robot, as in our method.…”
Section: Introduction (mentioning; confidence: 99%)

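The statement contrasts per-joint inputs (angle and velocity only) with a method that also exploits the robot's graph topology. One common way to use such topology is to let each joint's features be mixed with those of its physically connected neighbours. The sketch below shows a single mean-aggregation step on a joint graph. The joint names, the hip-knee-ankle layout per leg, the pelvis link between the hips and the plain averaging rule are all illustrative assumptions. The citing paper's actual graph architecture is not described here.

```python
from typing import Dict, List, Tuple

# Hypothetical 6-DOF biped: hip, knee and ankle on each leg.
JOINTS = ["left_hip", "left_knee", "left_ankle", "right_hip", "right_knee", "right_ankle"]
# Physical links: along each leg, plus the two hips joined through the pelvis.
EDGES = [
    ("left_hip", "left_knee"), ("left_knee", "left_ankle"),
    ("right_hip", "right_knee"), ("right_knee", "right_ankle"),
    ("left_hip", "right_hip"),
]


def build_neighbors(joints: List[str], edges: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Undirected adjacency list for the joint graph."""
    nbrs: Dict[str, List[str]] = {j: [] for j in joints}
    for a, b in edges:
        nbrs[a].append(b)
        nbrs[b].append(a)
    return nbrs


def aggregate(features: Dict[str, List[float]],
              neighbors: Dict[str, List[str]]) -> Dict[str, List[float]]:
    """One message-passing step: average each joint's features with its neighbours'."""
    out: Dict[str, List[float]] = {}
    for joint, feat in features.items():
        group = [feat] + [features[n] for n in neighbors[joint]]
        # Column-wise mean, e.g. over [angle, angular velocity].
        out[joint] = [sum(col) / len(group) for col in zip(*group)]
    return out
```

After one step, each joint's feature vector carries information about the joints it is mechanically coupled to. For instance, the ankle's features now reflect the knee's state. A learned variant would replace the plain average with trainable weights.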
“…In each state, the height of the body robot has different values. In the paper [25], only an average value of the body height during motion is used as a basis height for the robot to learn, two average values of the body height corresponding to two grounding states of the legs in motion are used in this paper. At the single phase of walking (Figure 7a,b), the average height of the robot's body reaches a higher value than that at the double phase of walking (Figure 7c,d).…”
(mentioning; confidence: 99%)
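The last statement describes replacing a single average body height with two reference heights, one per ground-contact phase, with the single-support height higher than the double-support one. The sketch below shows how a height term in a reward could switch its reference by phase. The absolute-deviation penalty, the `weight` parameter, returning no penalty when neither foot is on the ground and the function names are assumptions for illustration. The heights must be supplied by the caller because the actual values are not given here.

```python
from typing import Optional


def target_height(left_contact: bool, right_contact: bool,
                  h_single: float, h_double: float) -> Optional[float]:
    """Pick the reference body height from the current ground-contact phase."""
    if left_contact and right_contact:
        return h_double   # double-support phase: lower average body height
    if left_contact or right_contact:
        return h_single   # single-support phase: higher average body height
    return None           # no foot on the ground: no height target in this sketch


def height_reward(body_height: float, left_contact: bool, right_contact: bool,
                  *, h_single: float, h_double: float, weight: float = 1.0) -> float:
    """Penalty proportional to the deviation from the phase-specific reference height."""
    target = target_height(left_contact, right_contact, h_single, h_double)
    if target is None:
        return 0.0
    return -weight * abs(body_height - target)
```

With a single averaged reference, the robot would be penalised during every phase for the natural rise and fall of its body. Switching the reference by contact state removes that systematic penalty and rewards only deviations from each phase's own typical height.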