“…Exploration has been widely studied in the tabular setting (Azar et al., 2017; Zanette and Brunskill, 2019; Efroni et al., 2019; Jin et al., 2018; Dann et al., 2019; Zhang et al., 2020; Russo, 2019), but obtaining formal guarantees for exploration with function approximation remains challenging even in the linear case, due to recent lower bounds (Du et al., 2019; Weisz et al., 2020; Zanette, 2020; Wang et al., 2020a). When the action-value function is only approximately linear, several ideas from tabular exploration and linear bandits (Lattimore and Szepesvári, 2020) have been combined to obtain provably efficient algorithms in low-rank MDPs (Yang and Wang, 2020; Zanette et al., 2020a; Jin et al., 2020) and their extensions (Wang et al., 2019, 2020b).…”