Atypical Markov decision processes (MDPs) are decision-making problems in which the goal is to maximize the immediate return over a single state transition. Many complex dynamic problems can be regarded as atypical MDPs, e.g., football trajectory control, approximation of compound Poincaré maps, and parameter identification. However, existing deep reinforcement learning (RL) algorithms are designed to maximize long-term returns, which wastes computing resources when they are applied to atypical MDPs. These algorithms are also limited by the estimation error of the value function, which leads to a poor policy. To address these limitations, this paper proposes an immediate-return algorithm for atypical MDPs with continuous action spaces by designing an unbiased, low-variance target Q-value and a simplified network framework. Two examples of atypical MDPs under uncertainty are then presented to illustrate the performance of the proposed algorithm: passing the football to a moving player and chipping the football over the human wall. Compared with existing deep RL algorithms, such as deep deterministic policy gradient and proximal policy optimization, the proposed algorithm shows significant advantages in learning efficiency, effective rate of control, and computing resource usage.
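The contrast between a long-term-return target and an immediate-return target can be sketched as follows. This is only an illustration of the general idea, not the paper's algorithm: the abstract does not give the actual target construction, and the function names, the discount value, and the sample numbers are all hypothetical.

```python
import numpy as np

def bootstrapped_target(reward, next_q, gamma=0.99):
    """Standard one-step TD target: depends on the critic's own
    estimate of the next state's value (next_q)."""
    return reward + gamma * next_q

def immediate_target(reward):
    """Assumed target for a single-transition problem: the observed
    reward itself, with no bootstrapped value estimate."""
    return reward

# Hypothetical batch of observed rewards and critic estimates.
rewards = np.array([1.0, 0.5, -0.2])
next_q_estimates = np.array([10.0, -3.0, 7.0])

td_targets = bootstrapped_target(rewards, next_q_estimates)
one_step_targets = immediate_target(rewards)
```

In this sketch the immediate target does not depend on `next_q_estimates`, so errors in the value estimate cannot propagate into it, which is one plausible reading of why a bootstrapped critic is unnecessary when only one transition matters.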
Uncertainty and unknown nonlinearity are often unavoidable in suspension systems and have typically been handled using fuzzy logic systems (FLSs) or neural networks (NNs). However, these methods are restricted by the structural complexity of the controller and a high computing cost. Moreover, the estimation error of such approximators depends on the adopted adaptive laws and learning gains. To address these problems, this paper proposes an approximation-free control scheme based on a bioinspired reference model for a class of uncertain suspension systems with unknown nonlinearity. The proposed method combines the superior vibration suppression of the bioinspired reference model with the structural advantage of the prescribed performance function (PPF) in approximation-free control. As a result, vibration suppression and transient performance are improved and the computational burden is reduced, as analyzed theoretically in this paper. Finally, simulation results validate the approach, and comparisons show the advantages of the proposed control method in terms of vibration suppression, convergence speed, and computational burden.
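For readers unfamiliar with prescribed performance functions, a minimal sketch of one commonly used form is given here. It is an assumption that the paper uses this exponentially decaying envelope; the abstract does not state the PPF, and the parameter names and default values (`rho0`, `rho_inf`, `kappa`) are hypothetical.

```python
import math

def ppf(t, rho0=1.0, rho_inf=0.05, kappa=2.0):
    """Exponentially decaying envelope (assumed form):
    rho(t) = (rho0 - rho_inf) * exp(-kappa * t) + rho_inf."""
    return (rho0 - rho_inf) * math.exp(-kappa * t) + rho_inf

def within_envelope(error, t, **params):
    """True if the tracking error lies strictly inside -rho(t) < e < rho(t)."""
    return abs(error) < ppf(t, **params)
```

With this form, `rho0` bounds the initial error, `rho_inf` bounds the steady-state error, and `kappa` sets the minimum convergence rate, so transient behavior is shaped by the envelope itself rather than by a learned approximator.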