“…Planning algorithms in RL. The problem of planning in POMDPs (namely, finding the optimal policy when the model is known) has been extensively studied, with various heuristics proposed (e.g., [Mon82, CLZ97, HF00a, Hau00, RG02, TK03, PB04, SV05, PGT06, RPPCD08, SV10, SS12, SYHL13, GHL19, Han98, MKKC99, KMN99, LYX11, AYA18]) and a few provably efficient algorithms [BDRS96, KECM21a, GMR22]. Most closely related to our work is [GMR22], which gives a quasipolynomial-time planning algorithm for observable POMDPs.…”