2020
DOI: 10.1007/s10723-020-09512-4

Modeling-Learning-Based Actor-Critic Algorithm with Gaussian Process Approximator

Cited by 12 publications (4 citation statements) · References 31 publications

“…However, the actor-critic method can help the learning process with fewer samples and computational resources, and it combines the advantages of both the Monte Carlo policy gradient and value-based methods [30]. Various other advanced algorithms have been introduced to overcome the shortcomings of the algorithms explained above, such as advantage actor-critic (A2C), asynchronous advantage actor-critic (A3C), double DQN (DDQN), trust region policy optimization (TRPO), and proximal policy optimization (PPO). Still, we have not found the usage of any such algorithms in any of the surveyed papers.…”
Section: Pros and Cons of Policy-Based Methods
Confidence: 99%
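
As context for the statement above: an actor-critic agent keeps a parameterized policy (the actor) alongside a value estimate (the critic), and the critic's one-step TD error stands in for the high-variance Monte Carlo return in the policy-gradient update. Below is a minimal tabular sketch of that idea, not the cited papers' method; the environment interface (`env.reset()`, `env.step(a)` returning a state/reward/done triple) and all hyperparameters are illustrative assumptions.

```python
import numpy as np

def one_step_actor_critic(env, n_states, n_actions,
                          alpha_actor=0.05, alpha_critic=0.1,
                          gamma=0.99, episodes=500):
    """Tabular one-step actor-critic (illustrative sketch).

    theta: softmax policy preferences (the actor)
    v:     state-value estimates      (the critic)
    """
    theta = np.zeros((n_states, n_actions))
    v = np.zeros(n_states)

    for _ in range(episodes):
        s = env.reset()                   # assumed: returns an int state id
        done = False
        while not done:
            # Softmax policy over action preferences (shifted for stability).
            prefs = theta[s] - theta[s].max()
            pi = np.exp(prefs) / np.exp(prefs).sum()
            a = np.random.choice(n_actions, p=pi)

            s2, r, done = env.step(a)     # assumed: (state, reward, done)

            # Critic: one-step TD error, replacing the high-variance
            # Monte Carlo return.
            target = r + (0.0 if done else gamma * v[s2])
            delta = target - v[s]
            v[s] += alpha_critic * delta

            # Actor: policy-gradient step scaled by the TD error;
            # grad of log softmax is (one-hot(a) - pi).
            grad_log_pi = -pi
            grad_log_pi[a] += 1.0
            theta[s] += alpha_actor * delta * grad_log_pi

            s = s2
    return theta, v
```

Because the TD error is computed from a bootstrapped value estimate rather than a full episode return, each update uses a single transition, which is the sample-efficiency advantage the statement refers to.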
“…Policy-based methods such as Monte Carlo can learn stochastic policies rather than a deterministic policy, which has proved beneficial in some situations, but they have higher variance in their sample estimates, which slows down the overall training process. However, the actor-critic method can help the learning process with fewer samples and computational resources, and it combines the advantages of both the Monte Carlo policy gradient and value-based methods [30].…”
Section: Introduction
Confidence: 99%
“…Ma et al. [13] proposed a decision-making framework titled "Plan-Decision-Action" for autonomous vehicles at complex urban intersections. Zhong et al. [14] proposed a model-learning-based actor-critic algorithm with a Gaussian process approximator to solve problems with continuous state and action spaces. Xiong et al. [15] used a hidden Markov model to predict other vehicles' intentions and built a decision-making model for vehicles at intersections.…”
Section: Introduction
Confidence: 99%
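
For readers unfamiliar with the approximator named in the statement above: a Gaussian process regressor predicts both a mean and an uncertainty at unseen inputs, which makes it usable as a value or model approximator over continuous state spaces. The following is a minimal sketch of plain GP regression with an RBF kernel, only the basic machinery the cited algorithm builds on, not Zhong et al.'s method; the kernel hyperparameters and noise level are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, variance=1.0):
    """Squared-exponential (RBF) kernel matrix between row-vector sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / length_scale**2)

def gp_predict(X_train, y_train, X_test, noise=1e-2):
    """GP posterior mean and per-point variance at X_test (zero prior mean)."""
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_s = rbf_kernel(X_test, X_train)
    K_ss = rbf_kernel(X_test, X_test)

    # Solve K alpha = y via Cholesky for numerical stability.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s @ alpha

    # Posterior variance: diag(K_ss) - diag(K_s K^{-1} K_s^T).
    v = np.linalg.solve(L, K_s.T)
    var = np.diag(K_ss) - (v ** 2).sum(axis=0)
    return mean, var

# Toy usage: fit a 1-D function from 8 samples, then query 50 unseen inputs.
X = np.linspace(0, 5, 8).reshape(-1, 1)
y = np.sin(X).ravel()
Xq = np.linspace(0, 5, 50).reshape(-1, 1)
mu, var = gp_predict(X, y, Xq)
```

The predictive variance is what makes a GP attractive for model learning in RL: the agent can tell where its learned model is still uncertain.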
“…Reinforcement learning (RL) emphasizes that the agent learns the best strategy by interacting with the environment, so as to obtain the maximum cumulative reward. RL algorithms include value-based algorithms [11], [12] and policy-based algorithms [13], [14]. The classic value-function algorithm is the Q-Learning algorithm [15].…”
Section: Introduction
Confidence: 99%
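
Since the statement above names Q-Learning as the classic value-function algorithm, here is a minimal tabular Q-Learning sketch for reference; the epsilon-greedy exploration, step size, and environment interface are illustrative assumptions, matching the interface assumed in the actor-critic sketch earlier.

```python
import numpy as np

def q_learning(env, n_states, n_actions,
               alpha=0.1, gamma=0.99, epsilon=0.1, episodes=1000):
    """Tabular Q-Learning (illustrative sketch).

    Learns Q(s, a) from interaction with the environment,
    maximizing the expected discounted cumulative reward.
    """
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = env.reset()                  # assumed: returns an int state id
        done = False
        while not done:
            # Epsilon-greedy action selection.
            if np.random.rand() < epsilon:
                a = np.random.randint(n_actions)
            else:
                a = int(Q[s].argmax())

            s2, r, done = env.step(a)    # assumed: (state, reward, done)

            # Off-policy TD update toward the greedy bootstrap target.
            target = r + (0.0 if done else gamma * Q[s2].max())
            Q[s, a] += alpha * (target - Q[s, a])
            s = s2
    return Q
```

The `max` in the target is what makes Q-Learning off-policy: it evaluates the greedy policy regardless of the exploratory actions actually taken.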