2012 American Control Conference (ACC)
DOI: 10.1109/acc.2012.6315022

Model-Free reinforcement learning with continuous action in practice

Cited by 198 publications (132 citation statements)
References 15 publications
“…Many machine learning approaches base their optimization on variations of policy gradients (see for instance Degris et al (2012); Watkins and Dayan (1992)). We use the simplest policy gradient formulation, corresponding to the seminal work by Sutton et al (1999).…”
Section: Policy Gradient Methods On Loss Function For Max-Cut Problem
confidence: 99%
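For reference, the "simplest policy gradient formulation" that the excerpt attributes to Sutton et al. (1999) is usually stated as the policy gradient theorem below. This is the standard textbook form, not a quotation from the citing paper:

\nabla_\theta J(\theta) = \mathbb{E}_{s \sim d^{\pi_\theta},\, a \sim \pi_\theta}\!\left[\nabla_\theta \log \pi_\theta(a \mid s)\, Q^{\pi_\theta}(s,a)\right]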
“…In an off-policy setting, actor-critic estimates the value function of π_θ(a|s) by averaging over the state distribution of the behavior policy β(a|s) [11]. Instead of considering the stochastic policy π_θ(a|s), the deterministic policy gradient (DPG) theorem [32] proves that the policy gradient framework can be extended to find a deterministic off-policy policy μ_θ(s), which is given as follows:…”
Section: Supervisor Of Clinician Decision
confidence: 99%
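The deterministic policy gradient referred to here is cut off in the citation statement; in the standard statement of the DPG theorem by Silver et al. (2014), cited as [32] in the excerpt, it takes the following form (the citing paper's exact notation may differ):

\nabla_\theta J(\mu_\theta) = \mathbb{E}_{s \sim \rho^{\beta}}\!\left[\nabla_\theta \mu_\theta(s)\, \nabla_a Q^{\mu}(s,a)\big|_{a=\mu_\theta(s)}\right]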
“…We begin by examining the choice of employing deep reinforcement learning against simpler reinforcement learning algorithms for solving the problem formulated in §IV-B. We compare the performance of Iris when using DDPG against the stochastic policy gradient algorithm of [82], which employs linear function approximators for the actor and the critic (Lin-PG). As illustrated in Fig.…”
Section: B. Deep Learning Benefits, Feasibility and Scalability
confidence: 99%
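The excerpt's "Lin-PG" baseline is described only as a stochastic policy gradient algorithm with linear function approximators for the actor and the critic. Below is a minimal sketch of such an agent with a Gaussian policy, offered for illustration only; the toy environment, feature construction, and all hyperparameters are assumptions and are not taken from the cited papers.

import numpy as np

# Minimal sketch of a linear actor-critic with a Gaussian policy
# (in the spirit of what the excerpt calls "Lin-PG"): linear function
# approximators for the actor (policy mean and log-std) and the critic
# (state value). The toy target-tracking task below is an illustrative
# assumption, not taken from the cited papers.

rng = np.random.default_rng(0)
n_feat = 8

def features(s):
    """Radial-basis features over a scalar state in [-1, 1]."""
    centers = np.linspace(-1.0, 1.0, n_feat)
    return np.exp(-0.5 * ((s - centers) / 0.25) ** 2)

w_v = np.zeros(n_feat)        # critic weights: V(s) ~ w_v . phi(s)
w_mu = np.zeros(n_feat)       # actor weights for the policy mean
w_sigma = np.zeros(n_feat)    # actor weights for the policy log-std

alpha_v, alpha_pi, gamma = 0.1, 0.01, 0.99

for episode in range(200):
    s = rng.uniform(-1.0, 1.0)             # state: current position
    for t in range(50):
        phi = features(s)
        mu = w_mu @ phi                     # policy mean
        sigma = np.exp(w_sigma @ phi)       # policy std (kept positive)
        a = rng.normal(mu, sigma)           # sample a continuous action

        s_next = np.clip(s + 0.1 * a, -1.0, 1.0)
        r = -abs(s_next)                    # reward: stay near the origin

        # TD(0) error from the linear critic
        delta = r + gamma * (w_v @ features(s_next)) - (w_v @ phi)

        # Critic update: semi-gradient TD(0)
        w_v += alpha_v * delta * phi

        # Actor update: stochastic policy gradient, grad-log-pi scaled by delta
        w_mu += alpha_pi * delta * ((a - mu) / sigma**2) * phi
        w_sigma += alpha_pi * delta * (((a - mu) ** 2 / sigma**2) - 1.0) * phi

        s = s_next

The contrast drawn in the excerpt is that such a linear agent keeps the update cheap and simple, whereas DDPG replaces both linear approximators with deep networks and a deterministic policy.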