Reinforcement learning (RL) algorithms based on high-dimensional function approximation have achieved tremendous empirical success in large-scale problems with an enormous number of states. However, most analysis of such algorithms gives rise to error bounds that involve either the number of states or the number of features. This paper considers the situation where the function approximation is made either using the kernel method or the two-layer neural network model, in the context of a fitted Q-iteration algorithm with explicit regularization. We establish an Õ(H 3 |A| 1 4 n − 1 4 ) bound for the optimal policy with Hn samples, where H is the length of each episode and |A| is the size of action space. Our analysis hinges on analyzing the L 2 error of the approximated Q-function using n data points. Even though this result still requires a finite-sized action space, the error bound is independent of the dimensionality of the state space.that we can obtain an Õ((1 − γ) −2 (n − 1 2(1+α) + γ K )) bound for the optimal policy with Kn samples where 0 < γ < 1 is the discount factor and K is the number of iterations. This result builds on the assumption
We propose a machine learning enhanced algorithm for solving the optimal landing problem. Using Pontryagin's minimum principle, we derive a two-point boundary value problem for the landing problem. The proposed algorithm uses deep learning to predict the optimal landing time and a space-marching technique to provide good initial guesses for the boundary value problem solver. The performance of the proposed method is studied using the quadrotor example, a reasonably high dimensional and strongly nonlinear system. Drastic improvement in reliability and efficiency is observed.
Collisions are common in many dynamical systems with real applications. They can be formulated as hybrid dynamical systems with discontinuities automatically triggered when states transverse certain manifolds. We present an algorithm for the optimal control problem of such hybrid dynamical systems, based on solving the equations derived from the hybrid minimum principle (HMP). The algorithm is an iterative scheme following the spirit of the method of successive approximations, and it is robust to undesired collisions observed in the initial guesses. We carefully analyze and address several numerical challenges introduced by the discontinuities. The algorithm is tested on disc collision problems whose optimal solutions exhibit one or multiple collisions. Linear convergence in terms of the iteration steps and asymptotic first-order accuracy in terms of time discretization are observed when the algorithm is implemented with the forward-Euler scheme. The numerical results demonstrate that the proposed algorithm has better accuracy and convergence than direct methods based on gradient descent. The algorithm is also simpler, more accurate, and more stable than a deep reinforcement learning method.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.