“…To achieve this goal, function approximation, which uses a class of predefined functions to approximate either the value function or the transition dynamics, has been widely studied in recent years. Specifically, a series of recent works (Jiang et al., 2017; Jin et al., 2019; Modi et al., 2020; Zanette et al., 2020; Ayoub et al., 2020; Zhou et al., 2020) has studied RL with linear function approximation with provable guarantees. These works show that, with linear function approximation, one can obtain either a sublinear regret bound against the optimal value function (Jin et al., 2019; Zanette et al., 2020; Ayoub et al., 2020; Zhou et al., 2020) or a polynomial sample complexity bound (Kakade et al., 2003), a Probably Approximately Correct (PAC) bound for short, for finding a near-optimal policy (Jiang et al., 2017; Modi et al., 2020).…”