2012
DOI: 10.1109/tpwrs.2011.2166091
Comparing Policy Gradient and Value Function Based Reinforcement Learning Methods in Simulated Electrical Power Trade

Abstract: This version is available at https://strathprints.strath.ac.uk/33071/ Strathprints is designed to allow users to access the research output of the University of Strathclyde. Unless otherwise explicitly stated on the manuscript, Copyright © and Moral Rights for the papers on this site are retained by the individual authors and/or other copyright owners. Please check the manuscript for details of any other licences that may have been applied. You may not engage in further distribution of the material for any pro…

Cited by 19 publications (6 citation statements)
References 34 publications

“…In addition, small changes in the value function can cause large changes in the policy, which affects convergence. Policy-based methods, such as the policy gradient method, 41 learn the policy directly by parameterizing it. They are suitable for high-dimensional continuous action spaces and stochastic policies, but the variance of the gradient estimate is relatively high and they can easily converge to a non-optimal policy.…”
Section: A Power System Model With the Dynamic Event-Triggered Scheme
Mentioning; confidence: 99%
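The quoted passage describes policy-based methods that learn a parameterized policy directly, at the cost of high-variance gradient estimates. The following sketch illustrates that idea with a minimal REINFORCE-style update on a toy problem; the toy environment, tabular softmax parameterization, learning rate, and discount factor are illustrative assumptions, not details taken from the cited papers.

```python
# Minimal REINFORCE-style policy gradient sketch (illustrative, not from the cited paper):
# the policy is parameterized directly and updated by gradient ascent on sampled returns.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 4, 2
theta = np.zeros((n_states, n_actions))   # policy parameters (softmax logits per state)

def policy(state):
    """Softmax policy pi(a | state; theta)."""
    logits = theta[state]
    p = np.exp(logits - logits.max())
    return p / p.sum()

def sample_episode(length=10):
    """Toy episode: random state transitions, reward 1 when the action matches the state's parity."""
    traj = []
    state = rng.integers(n_states)
    for _ in range(length):
        probs = policy(state)
        action = rng.choice(n_actions, p=probs)
        reward = 1.0 if action == state % 2 else 0.0
        traj.append((state, action, reward))
        state = rng.integers(n_states)
    return traj

alpha, gamma = 0.1, 0.99
for episode in range(500):
    traj = sample_episode()
    G = 0.0
    # Work backwards so G is the discounted return from each step onward.
    for state, action, reward in reversed(traj):
        G = reward + gamma * G
        probs = policy(state)
        grad_log = -probs              # d(log pi)/d(logits) for a softmax policy ...
        grad_log[action] += 1.0        # ... is one_hot(action) - probs
        theta[state] += alpha * G * grad_log   # single-sample Monte Carlo estimate -> high variance
```

Because the return G is a single Monte Carlo sample, the gradient estimate has exactly the high variance the quoted passage points out; in practice a baseline or an N-step return (as in the next citation statement) is used to reduce it.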
“…Since Q(s, a) is also unknown, it may be estimated or, better, represented by an N-step return, i.e., the total expected discounted reward over N stages (say, taking 5 future states). This is possible because the absolute value of Q(s, a) is not needed, only how much better it is than the current policy [79]. Using (14), the ANN can be trained to adjust its parameters in the direction of better policy performance by gradient ascent.…”
Section: = ( +1 ) +
Mentioning; confidence: 99%
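The quoted passage estimates the action value with an N-step return and uses only its improvement over the current policy to drive gradient ascent. The snippet below is a hedged sketch of that computation under stated assumptions: the function name n_step_return, the choice N = 5, the example reward and value arrays, and the use of a state-value estimate as the baseline are all illustrative and do not reproduce the citing paper's equation (14).

```python
# Sketch of an N-step return with a bootstrapped tail and an advantage baseline.
import numpy as np

gamma, N = 0.99, 5

def n_step_return(rewards, values, t):
    """Discounted sum of up to N rewards from step t, bootstrapped with values[t+N] if available."""
    G = 0.0
    steps = min(N, len(rewards) - t)
    for k in range(steps):
        G += (gamma ** k) * rewards[t + k]
    if t + steps < len(values):          # bootstrap with the current value estimate
        G += (gamma ** steps) * values[t + steps]
    return G

# Example trajectory (reward and value numbers are made up for illustration).
rewards = [0.0, 1.0, 0.0, 0.5, 1.0, 0.0, 0.0, 1.0]
values  = [0.4, 0.5, 0.3, 0.6, 0.7, 0.2, 0.3, 0.5]

for t in range(len(rewards)):
    G = n_step_return(rewards, values, t)
    advantage = G - values[t]            # only "how much better than the current policy" is kept
    # theta += alpha * advantage * grad_log_pi(s_t, a_t)   <- gradient ascent step on the policy
    print(f"t={t}: N-step return={G:.3f}, advantage={advantage:+.3f}")
```

Subtracting the baseline values[t] is what makes the absolute scale of Q(s, a) irrelevant: only the sign and size of the advantage determine the direction and strength of the policy update.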
“…Policy gradient reinforcement learning implemented with an artificial neural network for simulated electric power trading has been shown to maximize performance [23]. In addition, a study that reduced CPU power consumption in a mobile environment by learning with an artificial neural network was introduced [24], showing the potential of OS optimization through machine learning.…”
Section: Related Work
Mentioning; confidence: 99%