Signal temporal logic (STL) provides a user-friendly interface for defining complex tasks for robotic systems. Recent efforts aim at designing control laws, or at using reinforcement learning methods, to find policies that guarantee satisfaction of these tasks. While the former suffer from a trade-off between the expressivity of the task specification and computational complexity, the latter encounter difficulties in exploration as the tasks become more complex and challenging to satisfy. This paper proposes to combine the benefits of the two approaches, using an efficient prescribed performance control (PPC) law to guide exploration within the reinforcement learning algorithm. The potential of the method is demonstrated in a simulated environment through two sample navigation tasks.
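To make the STL task-specification idea concrete, the following is a minimal sketch of STL quantitative (robustness) semantics, which underpins such specifications: a formula's robustness is positive exactly when the formula is satisfied, and its magnitude measures how robustly. The trajectory, goal, and time window below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def rho_predicate(signal, h):
    """Robustness of the atomic predicate h(x) > 0 at each time step."""
    return np.array([h(x) for x in signal])

def rho_eventually(rho, a, b):
    """Eventually within step window [a, b]: max of rho over the window."""
    n = len(rho)
    return np.array([rho[i + a : min(i + b + 1, n)].max() for i in range(n - a)])

def rho_always(rho, a, b):
    """Always within step window [a, b]: min of rho over the window."""
    n = len(rho)
    return np.array([rho[i + a : min(i + b + 1, n)].min() for i in range(n - a)])

# Hypothetical 2D trajectory moving in a straight line toward a goal at (1, 1).
traj = np.linspace([0.0, 0.0], [1.0, 1.0], 20)
goal, r = np.array([1.0, 1.0]), 0.1

# "Eventually within 19 steps, be within distance r of the goal."
rho = rho_eventually(
    rho_predicate(traj, lambda x: r - np.linalg.norm(x - goal)), 0, 19
)
satisfied = rho[0] > 0  # positive robustness at time 0: task is satisfied
```

A learning algorithm can use this scalar robustness directly as a (sparse) reward signal, which is precisely where exploration becomes difficult for long-horizon formulas and where a guiding control law can help.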
Path integral policy improvement (PI²) is a data-driven method for solving stochastic optimal control problems. Both feedforward and feedback controls are calculated from a sample of noisy open-loop trajectories of the system and their costs, which can be obtained in a highly parallelizable manner. The control strategy offers theoretical performance guarantees on the expected cost achieved by the resulting closed-loop system. This paper extends PI² from the single-agent case to a multi-agent setting, where such theoretical guarantees have not been attained previously. We provide both a decentralized and a leader-follower scheme for distributing the feedback calculations under different communication constraints. The theoretical results are verified numerically through simulations.
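The core PI²-style update described above — sample noisy open-loop rollouts, score them by cost, and average the noise with exponentiated-cost weights — can be sketched as follows. The dynamics (a 1D double integrator reaching position 1), cost terms, and all hyperparameters are illustrative assumptions for a single agent, not the paper's multi-agent formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def rollout(u_seq, noise, dt=0.05):
    """Simulate a noisy open-loop trajectory and return its total cost."""
    x = np.zeros(2)                       # state: [position, velocity]
    cost = 0.0
    for u, eps in zip(u_seq, noise):
        a = u + eps                       # noisy open-loop control
        x = x + dt * np.array([x[1], a])  # double-integrator dynamics
        cost += dt * ((x[0] - 1.0) ** 2 + 1e-3 * a ** 2)
    cost += 10.0 * (x[0] - 1.0) ** 2      # terminal cost: reach position 1
    return cost

T, K, lam, sigma = 40, 64, 1.0, 2.0      # horizon, rollouts, temperature, noise
u = np.zeros(T)                           # feedforward control sequence

baseline = rollout(u, np.zeros(T))        # cost before any improvement

for _ in range(50):                       # iterative policy improvement
    noise = sigma * rng.standard_normal((K, T))
    costs = np.array([rollout(u, noise[k]) for k in range(K)])
    # Exponentiated-cost weights (softmax over rollouts; shift for stability).
    w = np.exp(-(costs - costs.min()) / lam)
    w /= w.sum()
    u = u + w @ noise                     # cost-weighted noise average update

improved = rollout(u, np.zeros(T))        # cost of the improved open loop
```

The K rollouts in each iteration are independent, which is what makes the sampling step highly parallelizable; the multi-agent schemes in the paper concern how the corresponding feedback calculations are distributed across agents.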