“…Our computational approach was inspired by the neurocognitive models of Botvinick, (2007) and Kiebel et al (2008), in which higher stages of cortical processing learned or controlled temporal structure at longer timescales. More generally, multiscale machine-learning architectures have been proposed for reducing the complexity of the learning problem at each scale and for representing multi-scale environments (Chung et al, 2016;Jaderberg et al, 2019;Mozer, 1992;Mujika et al, 2017;Quax et al, 2019;Schmidhuber, 1992). In neuroscience, multiple timescale representations have been proposed for learning value functions (Sutton, 1995), tracking reward (Bernacchia et al, 2011), and perceiving and controlling action (Botvinick, 2007;Paine and Tani, 2005).…”