2011
DOI: 10.1007/s10994-011-5235-x

Reinforcement learning in feedback control

Abstract: Technical process control is a highly interesting area of application with high practical impact. Since classical controller design is, in general, a demanding job, this area constitutes a highly attractive domain for the application of learning approaches, in particular reinforcement learning (RL) methods. RL provides concepts for learning controllers that, by cleverly exploiting information from interactions with the process, can acquire high-quality control behaviour from scratch. This article focuses on…

Cited by 190 publications
(106 citation statements)
References 31 publications
“…The Augmented Random Search (ARS) algorithm augments the basic random search algorithm, which in turn builds on proven heuristics used in deep reinforcement learning [28][29][30][31]. The underlying policy search problem can be formulated and analyzed as a continuous search problem.…”
Section: Augmented Random Search Algorithm for Quadcopter (citation type: mentioning; confidence: 99%)
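The quoted passage describes ARS only in words. The sketch below assumes the variant usually labelled V1-t in the ARS literature: basic random search whose update is scaled by the standard deviation of the collected rewards and restricted to the top-b directions. The state-normalization augmentation is omitted because the toy problem has no states, and `toy_reward` and `TARGET` are hypothetical stand-ins for a quadcopter episode return, not anything from the cited work.

```python
import random
import statistics


def ars_v1t(reward, theta, step_size=0.01, noise=0.05, n_dirs=8, top_b=4,
            iters=300, seed=0):
    """Minimal ARS sketch (assumed V1-t style) over a parameter list theta.

    Each iteration samples n_dirs Gaussian directions, evaluates theta +/- noise*delta,
    keeps the top_b directions ranked by max(r_plus, r_minus), and divides the
    update by top_b times the standard deviation of the 2*top_b kept rewards.
    """
    rng = random.Random(seed)
    theta = list(theta)
    dim = len(theta)
    for _ in range(iters):
        rollouts = []
        for _ in range(n_dirs):
            delta = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            r_plus = reward([t + noise * d for t, d in zip(theta, delta)])
            r_minus = reward([t - noise * d for t, d in zip(theta, delta)])
            rollouts.append((r_plus, r_minus, delta))
        # Augmentation: keep only the best-performing directions.
        rollouts.sort(key=lambda r: max(r[0], r[1]), reverse=True)
        best = rollouts[:top_b]
        # Augmentation: normalize the step by the spread of the kept rewards.
        sigma = statistics.pstdev([r for rp, rm, _ in best for r in (rp, rm)]) or 1.0
        for i in range(dim):
            grad_i = sum((rp - rm) * d[i] for rp, rm, d in best)
            theta[i] += step_size / (top_b * sigma) * grad_i
    return theta


TARGET = [1.0, -0.5]  # hypothetical optimum of the toy problem


def toy_reward(theta):
    # Hypothetical return: maximal (0) at TARGET, decreasing quadratically away from it.
    return -sum((t - g) ** 2 for t, g in zip(theta, TARGET))


theta = ars_v1t(toy_reward, [0.0, 0.0])
print(theta, toy_reward(theta))
```

Dividing by the reward standard deviation keeps the step size roughly constant as the scale of the returns changes, which is the main reason this augmentation is used in place of a hand-tuned learning rate.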
“…Previous work used the derivative of the value function estimate, which is not guaranteed to be a compatible function approximation and can lead to problems when the value function is estimated with non-smooth functions such as rectifiers (Prokhorov and Wunsch, 1997; Hafner and Riedmiller, 2011; Heess et al., 2015; Lillicrap et al., 2015).…”
Section: Deviator-Actor-Critic (DAC) Model (citation type: mentioning; confidence: 99%)
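To make the smoothness point concrete, the sketch below compares the action-gradient dQ/da of a tiny one-hidden-layer critic built from rectifier units with the same critic built from tanh units. The weights `V`, `B`, `W` and the single fixed state are hypothetical and chosen only so that one rectifier switches on at a = 0; the example illustrates non-smoothness and does not implement DAC.

```python
import math

# Hypothetical critic for one fixed state: Q(a) = sum_i W[i] * f(V[i] * a + B[i]),
# with a scalar action a. The weights are illustrative only.
V = [1.0, -1.0, 0.5]   # input weights
B = [0.0, 0.2, -0.1]   # hidden biases
W = [-1.0, -0.8, 0.6]  # output weights


def dq_da_relu(a):
    """dQ/da with rectifier units: each active unit contributes W[i] * V[i]."""
    return sum(w * v for w, v, b in zip(W, V, B) if v * a + b > 0)


def dq_da_tanh(a):
    """dQ/da with tanh units: continuous, since tanh'(x) = 1 - tanh(x)**2."""
    return sum(w * v * (1.0 - math.tanh(v * a + b) ** 2) for w, v, b in zip(W, V, B))


eps = 1e-6
# The first hidden unit switches on at a = 0, so the ReLU action-gradient jumps there,
# while the tanh action-gradient barely changes over the same interval.
relu_jump = dq_da_relu(eps) - dq_da_relu(-eps)
tanh_jump = dq_da_tanh(eps) - dq_da_tanh(-eps)
print(relu_jump, tanh_jump)
```

With rectifiers the action-gradient is piecewise constant, so an actor following dQ/da sees sudden jumps at unit boundaries and no curvature information between them.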
“…MLPs output a smooth approximation of the training target data. With this in mind, throughout the learning tasks we apply a smooth reward function, proposed in [10], that yields state-action value functions which are easier for this type of function approximator to represent. Given a target point s_target, a region of width δ is defined around the target in which the cost decreases smoothly from 95% of a base cost down to 0.…”
Section: Learning Tasks (citation type: mentioning; confidence: 99%)
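The passage states the shape of the cost but not its formula. One form with exactly this shape, assumed here (and, as far as we understand, the squared-tanh form used in the feedback-control RL literature), is c(s) = C · tanh²(w · ‖s − s_target‖), with the slope w chosen so that the cost equals 0.95 · C at distance δ. The function name `smooth_cost` and its parameters are hypothetical.

```python
import math


def smooth_cost(s, s_target, delta, base_cost=1.0):
    """Assumed squared-tanh cost: 0 at s_target, 0.95 * base_cost at distance delta,
    and approaching base_cost farther away."""
    # Choose the slope w so that tanh(w * delta) ** 2 == 0.95.
    w = math.atanh(math.sqrt(0.95)) / delta
    dist = math.dist(s, s_target)  # Euclidean distance (Python 3.8+)
    return base_cost * math.tanh(w * dist) ** 2


target = (0.0, 0.0)
for d in (0.0, 0.05, 0.1, 0.5):
    print(d, smooth_cost((d, 0.0), target, delta=0.1))
```

Unlike a step cost (0 inside the region, C outside), this cost is differentiable everywhere, which is the property the quoted passage relies on when pairing it with MLP value-function approximators.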