“…Most of the existing work focuses on the tabular setting; see, e.g., Strehl et al (2006); Jaksch et al (2010); Osband et al (2014); Osband and Van Roy (2016); Azar et al (2017); Dann et al (2017); Agrawal and Jia (2017); Jin et al (2018); Russo (2019); Rosenberg and Mansour (2019a,b); Jin and Luo (2019); Zanette and Brunskill (2019); Simchowitz and Jamieson (2019); Dong et al (2019b) and the references therein. In the function approximation setting, sample-efficient algorithms have been proposed using linear function approximators (Abbasi-Yadkori et al, 2019a,b; Yang and Wang, 2019a; Du et al, 2019b; Cai et al, 2019; Wang et al, 2019), as well as nonlinear ones (Wen and Van Roy, 2017; Jiang et al, 2017; Dann et al, 2018; Du et al, 2019b; Dong et al, 2019a; Du et al, 2019a). Among these results, our work is most related to ; ; Cai et al (2019), which consider linear MDP models and propose optimistic and randomized variants of least-squares value iteration (LSVI) (Bradtke and Barto, 1996; Osband et al, 2014) as well as optimistic variants of proximal policy optimization (Schulman et al, 2017).…”