“…Finally, the environment changes its state based on the agent's action, and the cycle begins anew. Much effort has been devoted to studying specific instances of this abstract model, e.g., multi-armed bandits (MABs) (Auer et al., 2002; Garivier & Cappé, 2011; Kaufmann et al., 2012; Agrawal & Goyal, 2012), linear bandits (Dani et al., 2008; Abbasi-Yadkori et al., 2011; Agrawal & Goyal, 2013; Abeille et al., 2017), and reinforcement learning (RL) settings (Azar et al., 2017; Jin et al., 2018; Dann et al., 2019; Zanette & Brunskill, 2019; Efroni et al., 2019; Simchowitz & Jamieson, 2019; Tarbouriech et al., 2020; Cohen et al., 2020; Zhang et al., 2020). However, several gaps between theory and practice still hinder the application of these models to real-world problems.…”