“…In the preceding deterministic MDP formulation, we aim to solve a goal-reaching RL problem (Kaelbling, 1993b; Sutton et al, 2011; Andrychowicz et al, 2017; Andreas et al, 2017; Pong et al, 2018; Ghosh et al, 2019; Eysenbach et al, 2020a, 2020b; Kadian et al, 2020; Fujita et al, 2020; Chebotar et al, 2021; Khazatsky et al, 2021) or a planning problem (Bertsekas & Tsitsiklis, 1996; Boutilier et al, 1999; Sutton et al, 1999; Boutilier et al, 2000; Rintanen & Hoffmann, 2001; LaValle, 2006; Russell & Norvig, 2009; Nasiriany et al, 2019). We say a Q-function is successful if its associated greedy policy (Sutton & Barto, 2018)…”
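
As a minimal sketch of the greedy policy mentioned in the excerpt, assuming a tabular Q-function over finitely many states and actions (the excerpt does not specify a representation), the greedy action in each state is one that maximizes Q(s, a). The function name `greedy_policy` and the Q-values below are hypothetical and serve only as illustration; the success criterion itself is elided in the excerpt and is not modeled here.

```python
from typing import List, Sequence


def greedy_policy(q_table: Sequence[Sequence[float]]) -> List[int]:
    """Return, for each state (row), the index of an action maximizing Q(s, a).

    Ties are broken in favor of the lowest action index, because max()
    returns the first maximal element it encounters.
    """
    return [max(range(len(row)), key=row.__getitem__) for row in q_table]


# Hypothetical Q-values for 3 states and 2 actions (illustration only).
q = [
    [0.1, 0.9],  # state 0: action 1 has the larger value
    [0.5, 0.2],  # state 1: action 0 has the larger value
    [0.3, 0.3],  # state 2: tie, resolved to action 0
]
policy = greedy_policy(q)
```

Here `policy[s]` is the action the greedy policy takes in state `s`. The tie-breaking rule is a design choice of this sketch, not something stated in the excerpt.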