2023
DOI: 10.1109/tac.2022.3145632

Linear Quadratic Control Using Model-Free Reinforcement Learning

Abstract: In this paper, we consider the Linear Quadratic (LQ) control problem with process and measurement noise. We analyze the LQ problem in terms of the average cost and the structure of the value function. We assume that the dynamics of the linear system are unknown and that only noisy measurements of the state variable are available. Using these noisy measurements, we propose two model-free iterative algorithms to solve the LQ problem. The proposed algorithms are variants of the policy iteration routine where t…
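
The abstract sketches a policy-iteration scheme that learns an LQ controller from noisy measurements alone. As a rough illustration of that idea (not the authors' algorithm, which the truncated abstract does not fully specify), the following Python sketch runs a generic least-squares policy iteration for average-cost LQ control; the system matrices, noise levels, and all function names are hypothetical.

# Illustrative sketch of model-free policy iteration for average-cost LQ
# control from noisy state measurements. All numbers and names here are
# hypothetical; (A, B) are used only to generate data, never by the learner.
import numpy as np

rng = np.random.default_rng(0)
n, m = 2, 1                                   # state / input dimensions
A = np.array([[0.9, 0.1], [0.0, 0.8]])        # unknown to the learner
B = np.array([[0.0], [0.1]])
Q, R = np.eye(n), 0.1 * np.eye(m)             # stage cost y'Qy + u'Ru
sig_w, sig_v = 0.01, 0.01                     # process / measurement noise

def rollout(K, T=3000):
    """Simulate u = -K y plus exploration noise; return noisy data only."""
    x = np.zeros(n)
    ys, us, cs = [], [], []
    for _ in range(T):
        y = x + sig_v * rng.standard_normal(n)        # noisy measurement
        u = -K @ y + 0.1 * rng.standard_normal(m)     # exploration
        ys.append(y); us.append(u); cs.append(y @ Q @ y + u @ R @ u)
        x = A @ x + B @ u + sig_w * rng.standard_normal(n)
    return np.array(ys), np.array(us), np.array(cs)

def features(z):
    """Upper triangle of z z' (off-diagonals doubled) parametrizes z'Gz."""
    M = np.outer(z, z)
    i, j = np.triu_indices(len(z))
    return np.where(i == j, 1.0, 2.0) * M[i, j]

def evaluate(K, ys, us, cs):
    """Least-squares fit of Q(y,u) = [y;u]' G [y;u] and the average cost
    lam from the sampled average-cost Bellman equation."""
    d = (n + m) * (n + m + 1) // 2
    Phi = np.zeros((len(cs) - 1, d + 1))
    for k in range(len(cs) - 1):
        z_now = np.concatenate([ys[k], us[k]])
        z_nxt = np.concatenate([ys[k + 1], -K @ ys[k + 1]])  # policy action
        Phi[k, :d] = features(z_now) - features(z_nxt)
        Phi[k, d] = 1.0                                      # multiplies lam
    theta, *_ = np.linalg.lstsq(Phi, cs[:-1], rcond=None)
    G = np.zeros((n + m, n + m))
    G[np.triu_indices(n + m)] = theta[:d]
    return G + np.triu(G, 1).T                               # symmetrize

K = np.zeros((m, n))                              # initial stabilizing gain
for it in range(5):                               # policy iteration loop
    G = evaluate(K, *rollout(K))
    K = np.linalg.solve(G[n:, n:], G[:n, n:].T)   # greedy improvement
    print(f"iteration {it}: K = {K.ravel()}")

The exploration noise in rollout keeps the least-squares problem well conditioned; a faithful implementation of the paper's algorithms would also need to handle the bias that measurement noise introduces into the regression, which this sketch deliberately ignores.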

Cited by 22 publications (16 citation statements)
References 35 publications
“…To better explain Algorithm 2, we divide it into three steps. First, we prove that Ĝi can be uniquely determined at each iteration. Lemma 5 shows that the solutions obtained from the Bellman Equation (26), based on the average off-policy Q-learning method, are equivalent to those obtained from the model-based Bellman Equation (18). Equation (46) of Algorithm 2 is derived by transforming Equation (26), and we show that the procedure in Algorithm 2 guarantees the existence of the solution Ĝi.…”
Section: Lemma 4 ([30])
Mentioning confidence: 86%
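
The equations cited in this statement, (18), (26), and (46), are not reproduced on this page. For orientation only, a generic average-cost Q-learning Bellman equation of the kind the statement refers to can be written as

\[
  Q(x_k, u_k) \;=\; \mathbb{E}\left[ c(x_k, u_k) - \lambda + Q\big(x_{k+1}, \pi(x_{k+1})\big) \right],
\]

where \lambda is the average cost per stage and \pi is the policy being evaluated; a model-based counterpart evaluates the same expectation using the known system dynamics rather than sampled transitions. How this maps onto the cited paper's Equations (18) and (26) is an assumption here, not something this page confirms.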
“…By utilizing an average cost function, Reference [17] tackled the output regulation problem for linear systems with unknown dynamics. Data-driven average Q-learning algorithms can also handle linear quadratic control problems, especially in the presence of unmeasurable stochastic disturbances [18,19].…”
Section: Introduction
Mentioning confidence: 99%
“…An iterative Linear Quadratic Regulator (iLQR) optimal control technique based on dynamic modeling of the quadrotors was developed to achieve leader-follower formation (Jasim and Gu 2019). Among the control methodologies discussed earlier, Reinforcement Learning (RL)-based control approaches offer an end-to-end solution for motion control (Yaghmaie et al 2023) and obstacle avoidance (Sadhukhan and Selmic 2021) during formation.…”
Section: Introduction
Mentioning confidence: 99%
“…However, adding a terminal constraint set increases the computational demand at each step. Thus, we proposed an LQG-based approach (Yaghmaie et al., 2022) to reduce the disturbance and increase system stability.…”
Section: Introduction
Mentioning confidence: 99%