2022
DOI: 10.1109/tetci.2022.3140375

Optimal Actor-Critic Policy With Optimized Training Datasets

Cited by 7 publications (4 citation statements)
References 18 publications

“…Finally, we penalise actions that lead to the structures that have energies that we have seen already, which is to promote general exploration of the potential energy surface. Although the last penalty is more general than the one before, it is practically useful to control these penalties independently, e.g., by removing one completely, or by imposing different penalty levels for zero reward cases where the action does not alter the structure and non-unique energy structures where the action does not promote exploration [25].…”
Section: Reinforcement Learning in Basin-Hopping CSP
Citation type: mentioning (confidence: 99%)
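
To make the quoted penalty scheme concrete, here is a minimal Python sketch of a reward-shaping rule with two independently controlled penalty levels. All names and values (shaped_reward, the penalty constants, the energy tolerance) are illustrative assumptions, not the implementation from the cited work.

```python
# Sketch of the two independent penalties described in the quote above.
# Constants and names are illustrative assumptions, not the authors' code.

PENALTY_NO_CHANGE = -1.0    # action left the structure unchanged (zero-reward case)
PENALTY_SEEN_ENERGY = -0.5  # action reached an energy already visited
ENERGY_TOL = 1e-6           # tolerance for treating two energies as equal

def shaped_reward(base_reward, old_energy, new_energy, seen_energies):
    """Apply the two exploration penalties on top of the base reward."""
    # Case 1: the move did not alter the structure at all.
    if abs(new_energy - old_energy) < ENERGY_TOL:
        return base_reward + PENALTY_NO_CHANGE
    # Case 2: the move reproduced an energy seen before, so it does not
    # promote exploration of the potential energy surface.
    if any(abs(new_energy - e) < ENERGY_TOL for e in seen_energies):
        return base_reward + PENALTY_SEEN_ENERGY
    # Otherwise record the new energy and leave the reward unshaped.
    seen_energies.add(new_energy)
    return base_reward
```

Keeping the two penalties as separate constants mirrors the point made in the quote: either one can be removed or weighted differently without touching the other.
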
“…Through trial-and-error interactions with the environment, Reinforcement Learning (RL) offers a promising approach to solving decision-making and optimization problems. Over the past few years, RL has accomplished impressive feats in handling difficult tasks, in such domains as autonomous driving [119,16], locomotion control [99,129], robotics [71,94], continuous control [5,6,7], and multi-agent systems and control [39,15]. A majority of these successful approaches are purely data-driven and leverage trial-and-error to freely explore the search space.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
“…Off-policy algorithms necessitate a substantial number of samples in the ER buffer to facilitate meaningful policy learning. Researchers have explored various exploration strategies to enhance sampling efficiency, such as action space perturbation employed in DDPG and TD3, as well as policy parameter perturbation in [10], [11]. While these strategies offer clear advantages, their effectiveness tends to be less consistent across diverse environmental settings, particularly in high-dimensional environments.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
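
As an illustration of the action-space perturbation the quote attributes to DDPG and TD3, here is a short Python sketch that adds clipped Gaussian noise to a deterministic actor's output. The names (policy, act_low, act_high, sigma) are assumptions for the example, not an API from the cited papers.

```python
# Exploration via additive Gaussian noise on the action, DDPG/TD3 style.
# All names below are illustrative assumptions.
import numpy as np

def noisy_action(policy, state, act_low, act_high, sigma=0.1):
    """Return the actor's action perturbed by Gaussian exploration noise."""
    action = policy(state)  # deterministic actor output
    noise = np.random.normal(0.0, sigma, size=np.shape(action))
    # Clip so the perturbed action stays inside the valid action range.
    return np.clip(action + noise, act_low, act_high)
```

Perturbing in action space is cheap and environment-agnostic, which is its appeal; the quoted passage's caveat is that a fixed noise scale tends to work less consistently in high-dimensional environments.
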