2022
DOI: 10.1007/978-3-031-13841-6_41

An Improved Off-Policy Actor-Critic Algorithm with Historical Behaviors Reusing for Robotic Control

Citations: cited by 1 publication (1 citation statement)
References: 7 publications
“…The goal of RL is to learn strategies to maximize expectations in the Markov decision-making process (Zhang et al 2022) [32]. Markov process consists of a quintuple 𝐴 𝜋 𝑡 = {𝑆, 𝐴, 𝑃, 𝑅, 𝛾}, where 𝑆, 𝐴 represent the state space and action space, 𝑃 represents the transition probability between different states, and 𝑅 represents the reward set.…”
Section: A Introduction To the Cql Algorithm
Citation type: mentioning
confidence: 99%