2018
DOI: 10.1007/978-981-13-2375-1_44

Research on Motion Planning of Seven Degree of Freedom Manipulator Based on DDPG

citations
Cited by 7 publications
(5 citation statements)
references
References 3 publications
“…where S is the set of states of the agent and the environment, A is the set of actions executed by the agent, P is the model of the system (in other words, the transition probability of a state), R is the reward function, and γ is a discount factor [10]. The DRL objective function has two forms: the first is a value function that defines the expectation of the accumulated reward.…”
Section: Deep Reinforcement Learning; type: mentioning
confidence: 99%
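As an illustration of the value function described in this statement, the expectation of the accumulated (discounted) reward is commonly written as below. This is the standard textbook form, not necessarily the notation of the citing paper; the policy symbol π and the time index t are assumptions introduced here.

```latex
% State-value function under policy \pi for the MDP (S, A, P, R, \gamma):
% the expected discounted sum of rewards when starting from state s.
V^{\pi}(s) = \mathbb{E}_{\pi}\!\left[\, \sum_{t=0}^{\infty} \gamma^{t}\, R(s_t, a_t) \;\middle|\; s_0 = s \right],
\qquad s_{t+1} \sim P(\cdot \mid s_t, a_t), \quad a_t \sim \pi(\cdot \mid s_t)
```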
“…In those models, machine learning can help forecast solar energy output [14]. Otherwise, the authors combined LSTM with CNN, wavelet packet decomposition (WPD), wavelet transform (WT), and other methods, and combined the particle swarm algorithm (PSO) with the adaptive neuro-fuzzy inference system (ANFIS) to improve the performance, stability, and reliability of model extraction data features [15][16][17][18]. The authors applied the optimal frequency domain decomposition method to deep learning and used correlation to obtain the optimal frequency cutoff points of the decomposition components [19].…”
Section: Related Work; type: mentioning
confidence: 99%
“…After storing enough experience in the replay buffer, the optimal strategy is learned by random sampling in small batches. The update of the critic online network is updated by (21), and the action network is updated by (22). After each training step, the target critic network and the target action network are slowly updated by (23) and (24).…”
Section: Thickness and Tension Control Framework; type: mentioning
confidence: 99%
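The training loop described in this statement (store experience, sample small random batches, slowly update the target networks) can be sketched as follows. This is a minimal illustration, not the citing paper's implementation: its equations (21) to (24) are not reproduced here, and the names `ReplayBuffer`, `soft_update`, and `tau` are hypothetical. The target update shown is the usual DDPG soft update, assumed to be what "slowly updated" refers to; parameters are plain lists of floats to keep the sketch dependency-free.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        # Oldest transitions are discarded automatically once capacity is hit.
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random minibatch, drawn without replacement.
        batch = random.sample(list(self.buffer), batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)


def soft_update(target_params, online_params, tau):
    """Move each target parameter a fraction tau toward its online value.

    Assumed form: target <- tau * online + (1 - tau) * target.
    """
    return [
        tau * w_online + (1.0 - tau) * w_target
        for w_online, w_target in zip(online_params, target_params)
    ]
```

In such a sketch, training would begin only once `len(buffer)` exceeds the batch size, and `soft_update` would be applied to both the target critic and the target action network after each training step. A small `tau` keeps the targets changing slowly.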
“…In recent years, Deep Reinforcement Learning (DRL) has attracted wide attention in solving high-dimensional control with high complexity [19]. DDPG is one of the model-free DRL methods for continuous action spaces and has been widely applied in many fields, such as robotic control [20], manipulator control [21] and wireless sensors [22].…”
Section: Introduction; type: mentioning
confidence: 99%