“…On the other hand, due to recent successes of reinforcement learning (RL) in the control of physical systems (Yang et al, 2019; OpenAI et al, 2019; Hwangbo et al, 2019; Williams et al, 2017; Levine et al, 2016), there has been a flurry of research in online RL algorithms for continuous control. In contrast to the classical setting of adaptive nonlinear control, online RL algorithms operate in discrete time, and often come with finite-time regret bounds (Wang et al, 2019; Cao and Krishnamurthy, 2020; Cai et al, 2020; Agarwal et al, 2020). These bounds provide a quantitative rate at which the control performance of the online algorithm approaches the performance of an oracle equipped with hindsight knowledge of the uncertainty.…”