2023
DOI: 10.1146/annurev-control-042920-020021

Toward a Theoretical Foundation of Policy Optimization for Learning Control Policies

Abstract: Gradient-based methods have been widely used for system design and optimization in diverse application domains. Recently, there has been a renewed interest in studying theoretical properties of these methods in the context of control and reinforcement learning. This article surveys some of the recent developments on policy optimization, a gradient-based iterative approach for feedback control synthesis that has been popularized by successes of reinforcement learning. We take an interdisciplinary perspective in…
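
The abstract frames policy optimization as a gradient-based iterative approach to feedback control synthesis. As a rough, self-contained illustration (not code from the article), the sketch below runs zeroth-order gradient descent on the cost J(K) of a static state-feedback gain for a small discrete-time LQR instance, the benchmark this literature most often analyzes; the system matrices, step size, horizon, and sample counts are all assumed for illustration.

```python
# Minimal sketch (assumed, not from the article): model-free policy
# optimization on LQR via a two-point zeroth-order gradient estimate.
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.1], [0.0, 1.0]])   # illustrative discrete-time system
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.eye(1)

def cost(K, horizon=100, n_rollouts=10, seed=0):
    """Average finite-horizon LQR cost under the linear policy u = -K x.

    A fixed seed gives common random numbers across evaluations, which
    keeps the finite-difference gradient estimate low-variance.
    """
    local = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_rollouts):
        x = local.standard_normal(2)     # random initial state
        for _ in range(horizon):
            u = -K @ x
            total += x @ Q @ x + u @ R @ u
            x = A @ x + B @ u
    return total / n_rollouts

def grad_estimate(K, radius=0.05, n_samples=20):
    """Two-point zeroth-order estimate of dJ/dK (uses only cost queries)."""
    g = np.zeros_like(K)
    for _ in range(n_samples):
        U = rng.standard_normal(K.shape)
        U /= np.linalg.norm(U)           # random direction on the sphere
        g += (cost(K + radius * U) - cost(K - radius * U)) * U
    return (K.size / (2.0 * radius * n_samples)) * g

K = np.array([[0.5, 0.5]])               # initial stabilizing gain (assumed)
for _ in range(30):
    K -= 1e-3 * grad_estimate(K)         # gradient step directly on J(K)
print("learned gain:", K, " cost:", cost(K))
```

Note the design choice: the gradient is estimated from cost rollouts alone, without access to A and B, which mirrors the model-free setting in which policy optimization is usually analyzed.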

Cited by 25 publications (6 citation statements)
References 128 publications

“…Due to the above reasons, reinforcement learning based on value optimization cannot be adapted to all scenarios, which motivates methods based on policy optimization (PO) [111]. Policy optimization methods are usually divided by the determinism of the policy: a deterministic policy selects a single fixed action at each step and then transitions to the next state, whereas a stochastic policy defines a probability distribution over actions, from which the agent samples an action according to the current state before transitioning to the next state.…”
Section: Optimization of Policy-Based Reinforcement Learning
mentioning
confidence: 99%
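
To make the deterministic/stochastic distinction in the statement above concrete, here is a minimal tabular sketch (my illustration, not code from the cited work): the deterministic policy returns one fixed action per state, while the stochastic policy samples from a per-state softmax distribution. All sizes and values are arbitrary assumptions.

```python
# Sketch (assumed): deterministic vs. stochastic policies in tabular form.
import numpy as np

rng = np.random.default_rng(0)
N_STATES, N_ACTIONS = 4, 3

# Deterministic policy: a lookup table mapping each state to one action.
det_policy = np.array([2, 0, 1, 2])

def act_deterministic(state: int) -> int:
    return int(det_policy[state])          # always the same action

# Stochastic policy: each row is a probability distribution over actions.
logits = rng.standard_normal((N_STATES, N_ACTIONS))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # softmax

def act_stochastic(state: int) -> int:
    return int(rng.choice(N_ACTIONS, p=probs[state]))  # sample one action

state = 1
print(act_deterministic(state))                   # fixed: 0 on every call
print([act_stochastic(state) for _ in range(5)])  # varies across calls
```
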
“…Thrust 1 enables Thrust 2, which aims to build a diverse set of dynamic models directly from sensor data (Chen et al. 2022; Fasel et al. 2022; Gao and Kutz 2022). As these dynamic models mature, data-driven control protocols are developed in Thrust 3 (Hu et al. 2023). The understanding of models and controls then re-integrates with sensing, so that better sensing strategies can be constructed with knowledge from Thrusts 2 and 3.…”
mentioning
confidence: 99%
“…This requires new perspectives and computational schemes to address the emerging challenges that complex dynamical systems pose for optimization. Thus, the institute has leveraged the structure of dynamic data to design guaranteed optimization and sensing strategies for safe, efficient, and robust real-time decision making (Hu et al. 2023; Manohar et al. 2018).…”
mentioning
confidence: 99%