Machine learning and reinforcement learning (RL) are being applied to plan and control the behavior of autonomous systems interacting with the physical world; examples include self-driving vehicles, distributed sensor networks, and agile robots. However, if machine learning is to be applied in these new settings, the resulting algorithms must come with the reliability, robustness, and safety guarantees that are hallmarks of the control theory literature, as failures could be catastrophic. Thus, as RL algorithms are increasingly and more aggressively deployed in safety-critical settings, it is imperative that control theorists be part of the conversation. The goal of this tutorial paper is to provide a jumping-off point for control theorists wishing to work on RL-related problems by covering recent advances in bridging learning and control theory, and by placing these results within the appropriate historical context of the system identification and adaptive control literatures. The remainder of the paper is organized as follows:
• Section II provides an extensive literature review of work spanning classical and modern results in system identification, adaptive control, and RL.
• Section III introduces the fundamental problem and performance metrics considered in RL, and relates them to examples familiar to the controls community.
• Section IV provides a survey of contemporary results for problems with finite state and action spaces.
• Section V shows how system estimates and error bounds can be incorporated into model-based self-tuning regulators with finite-time performance guarantees.
• Section VI presents guarantees for model-free methods, and shows that a complexity gap exists between model-based and model-free methods.
II. LITERATURE REVIEW

The results we present in this paper draw heavily from three broad areas of control and learning theory: system identification, adaptive control, and approximate dynamic programming (ADP) or, as it has come to be known, reinforcement learning. Each of these areas has a long and rich history, and a general literature review is outside the scope of this tutorial. Below we instead emphasize pointers to good textbooks and survey papers, before giving a more careful account of recent work.

1) System Identification: The estimation of system behavior from input/output experiments has a well-developed theory dating back to the 1960s, particularly in the case of linear time-invariant (LTI) systems. Standard reference texts on the topic include [6], [8], [9], [10]. The success of discrete-time series analysis by Box and Jenkins [11] provided an early impetus for the extension of these methods to the controlled system setting. Important connections to information theory were established by Akaike [12]. The rise of robust control in the 1980s further inspired system identification procedures in which model errors were minimized under the assumption of adversarial noise processes [13]. Another important step was the development of subspace methods [14], which became a powerful tool for the identification of multi-input multi-output systems.

2) Adaptive...