To synthesize fixed-final-time, control-constrained optimal controllers for discrete-time nonlinear control-affine systems, this paper develops a single neural network (NN)-based controller called the Finite-horizon Single Network Adaptive Critic. The inputs to the NN are the current system states and the time-to-go, and its outputs are the costates used to compute the optimal feedback control. Control constraints are handled through a nonquadratic cost function. Proofs are given for the convergence of 1) the reinforcement learning-based training method to the optimal solution, 2) the training error, and 3) the network weights. The resulting controller is shown to solve the associated time-varying Hamilton-Jacobi-Bellman equation and to provide the fixed-final-time optimal solution. The performance of the new synthesis technique is demonstrated through several examples, including an attitude control problem in which a rigid spacecraft performs a finite-time attitude maneuver subject to control bounds. The new formulation has strong potential for implementation, since it consists of only one NN with a single set of weights and provides comprehensive feedback solutions online, even though it is trained offline.
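The abstract states that the network maps the state and the time-to-go to costates, from which the constrained control is computed. As a minimal sketch, assume a nonquadratic cost of the form W(u) = 2 Σ_i R_ii ū ∫_0^{u_i} artanh(v/ū) dv with diagonal R (a common choice for bounded controls; the paper's exact cost is not reproduced here). The stationarity condition ∂W/∂u + g(x_k)^T λ_{k+1} = 0 then gives u_k = -ū tanh(R^{-1} g(x_k)^T λ_{k+1} / (2ū)), so |u_i| ≤ ū holds by construction. The toy pendulum plant, the feature map `basis`, the linear-in-weights `critic`, and the random untrained weights are all hypothetical placeholders; they only illustrate how one set of weights, fed the state and the time-to-go, yields the feedback at every step of a fixed horizon.

```python
import numpy as np

U_MAX = 1.0         # hypothetical control bound |u_i| <= U_MAX
R = np.diag([0.5])  # hypothetical diagonal control weight
DT = 0.05           # hypothetical sampling time of the toy plant


def f(x):
    # drift of a hypothetical Euler-discretized pendulum
    return x + DT * np.array([x[1], -np.sin(x[0])])


def g(x):
    # input matrix of the same toy plant: x_{k+1} = f(x_k) + g(x_k) u_k
    return np.array([[0.0], [DT]])


def constrained_control(lam_next, g_x, R, u_max):
    # Solves dW/du + g_x^T lam_next = 0 for the assumed nonquadratic cost
    # W(u) = 2 * sum_i R_ii * u_max * int_0^{u_i} artanh(v / u_max) dv.
    # tanh saturates at +/-1, so every component obeys |u_i| <= u_max.
    return -u_max * np.tanh(np.linalg.solve(R, g_x.T @ lam_next) / (2.0 * u_max))


def basis(x, time_to_go, horizon):
    # hypothetical features of the state and the normalized time-to-go
    tau = time_to_go / horizon
    return np.concatenate([x, tau * x, x**3])


def critic(weights, x, time_to_go, horizon):
    # single network: (x_k, N - k) -> estimate of the costate lambda_{k+1}
    return weights.T @ basis(x, time_to_go, horizon)


def rollout(weights, x0, horizon):
    # closed-loop simulation; the same weights are used at every time step
    x, controls = np.asarray(x0, dtype=float), []
    for k in range(horizon):
        lam_next = critic(weights, x, horizon - k, horizon)
        u = constrained_control(lam_next, g(x), R, U_MAX)
        controls.append(u)
        x = f(x) + g(x) @ u
    return x, np.array(controls)


rng = np.random.default_rng(0)
weights = rng.normal(size=(6, 2))  # untrained placeholder weights
x_final, u_hist = rollout(weights, [1.0, 0.0], horizon=40)
```

Because the bound is enforced by the tanh in the control law rather than by clipping, the cost stays smooth and the same costate-to-control map applies whether or not a component saturates.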
Value iteration-based approximate/adaptive dynamic programming (ADP) is investigated as an approximate solution to infinite-horizon optimal control problems with deterministic dynamics and continuous state and action spaces. The learning iterations are decomposed into an outer loop and an inner loop. A relatively simple proof that the outer-loop iterations converge to the optimal solution is provided, using a novel idea with some new features. The proof draws an analogy between the value function during the iterations and the value function of a fixed-final-time optimal control problem. The inner loop is used for the policy update at each ADP iteration, avoiding the need to numerically solve a set of nonlinear equations or a nonlinear optimization problem. Sufficient conditions are obtained for the uniqueness of the solution to the policy update equation and for the convergence of the inner-loop iterations to that solution. The results are then formulated as a learning algorithm for training a neurocontroller, or for creating a look-up table, to be used for optimal control of nonlinear systems with different initial conditions. Finally, some features of the investigated method are analyzed numerically.
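The outer/inner-loop structure can be illustrated on the simplest case, where every step can be checked by hand. This is a minimal sketch, not the paper's algorithm. It assumes a hypothetical scalar linear plant x_{k+1} = a x_k + b u_k with utility q x^2 + r u^2 and quadratic value functions V_i(x) = p_i x^2, starting from V_0 = 0. The inner loop solves the implicit policy-update equation u = -(1/2) r^{-1} b ∇V_i(a x + b u) by fixed-point iteration, which is a contraction here whenever b^2 p_i / r < 1. The outer loop then performs the value update. In this LQ case p_i is exactly the optimal cost-to-go of an i-step fixed-final-time problem with zero terminal cost, and its limit can be compared with the scalar discrete algebraic Riccati equation. The parameters `a`, `b`, `q`, `r` and the functions `policy_update`, `value_iteration`, and `dare_scalar` are hypothetical names chosen for illustration.

```python
import math

a, b, q, r = 1.1, 0.5, 1.0, 1.0  # hypothetical plant and cost parameters


def policy_update(p, x, tol=1e-12, max_iter=1000):
    # Inner loop: fixed-point iteration on u = -(1/2) r^-1 b dV/dx(a x + b u),
    # with V(x) = p x^2, so dV/dx(y) = 2 p y.
    u = 0.0
    for _ in range(max_iter):
        u_next = -(b * p / r) * (a * x + b * u)
        if abs(u_next - u) < tol:
            return u_next
        u = u_next
    raise RuntimeError("inner loop did not converge")


def value_iteration(n_outer=500, x_sample=1.0):
    # Outer loop: V_{i+1}(x) = q x^2 + r u^2 + V_i(a x + b u), starting at V_0 = 0.
    # V stays quadratic, so one sample state identifies p_{i+1} exactly.
    p = 0.0
    for _ in range(n_outer):
        u = policy_update(p, x_sample)
        x_next = a * x_sample + b * u
        p = (q * x_sample**2 + r * u**2 + p * x_next**2) / x_sample**2
    return p


def dare_scalar():
    # positive root of b^2 p^2 + (r - q b^2 - a^2 r) p - q r = 0
    c1 = r - q * b**2 - a**2 * r
    return (-c1 + math.sqrt(c1**2 + 4 * b**2 * q * r)) / (2 * b**2)


print(value_iteration(), dare_scalar())  # both ≈ 3.1215
```

Starting from V_0 = 0, the sequence p_i increases monotonically toward the Riccati solution, so the inner-loop contraction condition b^2 p_i / r < 1 holds at every outer iteration whenever it holds at the limit, as it does for these parameters.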