“…There are many differences between our study and previous planning studies in humans. Most of these studies relied on tasks without uncertainty (classical planning) or on tasks in which uncertainty is limited to stochastic transitions between states (Markov decision processes, MDPs). They have focused on how people cope with the combinatorial explosion that occurs as the planning horizon increases (Keramati et al., 2016; Huys et al., 2012; Callaway et al., 2021), the depth to which people plan (Snider et al., 2015; van Opheusden et al., 2021), or the extent to which people use model-based or model-free strategies when learning from reinforcement (Daw et al., 2005, 2011; Keramati et al., 2016). The present study differs in that we focus on how people disambiguate a single hidden state from a sequence of information-seeking and reward-seeking actions.…”