“…Indeed, algorithms like MuZero and its predecessors [Silver et al., 2017, Oh et al., 2017, Schrittwieser et al., 2020] never approximate reward functions and transition models with respect to the raw image observations generated by the environment, but instead incrementally learn some latent representation of state upon which a corresponding model is approximated for planning. This philosophy is born out of several years of work that elucidates the importance of state abstraction as a key tool for discarding the irrelevant information encoded in environment states and addressing the challenge of generalization for sample-efficient reinforcement learning in large-scale environments [Whitt, 1978, Bertsekas and Castañon, 1989, Dean and Givan, 1997, Ferns et al., 2004, Jong and Stone, 2005, Li et al., 2006, Van Roy, 2006, Ferns et al., 2012, Jiang et al., 2015, Abel et al., 2016, 2018, Dong et al., 2019, Du et al., 2019, Arumugam and Van Roy, 2020, Misra et al., 2020, Agarwal et al., 2020, Abel et al., 2020, Abel, 2020, Dong et al., 2021]. In this section, we briefly introduce a small extension of VSRL that builds on these insights to accommodate lossy MDP compressions defined on a simpler, abstract state space (also referred to as the aleatoric or situational state by Lu et al. [2021], Dong et al. [2021]).…”
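To make the latent-model decomposition described above concrete, the sketch below shows one minimal way such an architecture could be organized: a learned encoder maps raw observations to an abstract (latent) state, and the reward and transition models are defined on that abstract space rather than on the raw observations. This is not the MuZero or VSRL implementation; the use of PyTorch, the module names (`AbstractModel`, `encoder`, `reward`, `transition`), and all layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AbstractModel(nn.Module):
    """Illustrative sketch: model components defined on an abstract state space."""

    def __init__(self, obs_dim: int, num_actions: int, latent_dim: int = 32):
        super().__init__()
        # phi: raw observation -> abstract (latent) state
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim)
        )
        # reward model defined on (abstract state, action), not raw observations
        self.reward = nn.Linear(latent_dim + num_actions, 1)
        # transition model maps (abstract state, action) -> next abstract state
        self.transition = nn.Linear(latent_dim + num_actions, latent_dim)
        self.num_actions = num_actions

    def forward(self, obs: torch.Tensor, action: torch.Tensor):
        z = self.encoder(obs)  # abstract state
        a = nn.functional.one_hot(action, self.num_actions).float()
        za = torch.cat([z, a], dim=-1)
        # predicted reward and predicted next abstract state, both for planning
        return self.reward(za), self.transition(za)

# Usage (hypothetical dimensions):
#   model = AbstractModel(obs_dim=4, num_actions=2)
#   r_hat, z_next = model(torch.randn(1, 4), torch.tensor([0]))
```

The point of the sketch is only the separation of concerns: the encoder absorbs the burden of compressing the environment state, so planning quantities (reward and transition predictions) never touch the raw observation space.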