“…With our natural definition of regret, their algorithm suffers linear regret. Other learning algorithms for POMDPs either consider linear dynamics (Lale et al., 2020b; Tsiamis & Pappas, 2020) or do not consider regret (Shani et al., 2005; Ross et al., 2007; Poupart & Vlassis, 2008; Cai et al., 2009; Liu et al., 2011; Doshi-Velez et al., 2013; Katt et al., 2018; Azizzadenesheli et al., 2018) and are not directly comparable to our setting.…”