Distinct model-free and model-based learning processes are thought to drive both typical and dysfunctional behaviors. Data from two-stage decision tasks have seemingly shown that human behavior is driven by both processes operating in parallel. In this study, however, we show that more detailed task instructions lead participants to make primarily model-based choices that show little, if any, model-free influence. We also demonstrate that the standard methods of analyzing the two-stage task may falsely classify purely model-based agents that misunderstood the task as hybrid model-based/model-free actors. Furthermore, we found evidence that many participants do misunderstand the task in important ways. Overall, we argue that humans formulate a wide variety of learning models. Consequently, the simple dichotomy of model-free versus model-based learning is inadequate to explain behavior in the two-stage task and connections between reward learning, habit formation, and compulsivity.

Introduction

Once upon a time, we set out to investigate how stimulus presentation influences habitual learning in humans [1]. Habits are thought to be learned via model-free learning [2], a strategy that operates by strengthening or weakening associations between stimuli and actions, depending on whether the action is followed by a reward or not [3]. Conversely, another strategy known as model-based learning generates goal-directed behavior [2], and may even protect against habit formation [4]. Model-based learning selects actions by computing action values at decision time based on a model of the environment. Two-stage learning tasks (Figure 1A) have frequently been used to dissociate model-free and model-based influences on choice behavior. Therefore, to begin to address our questions about habits, we designed an experiment employing a two-stage learning task.
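The contrast between the two strategies described above can be illustrated with a minimal sketch. It is not the model fitted in this study: the function names, the learning rate, and the 0.7/0.3 transition probabilities are assumptions chosen only for illustration.

```python
# Illustrative sketch only. All names and parameter values are assumptions,
# not the authors' model or analysis code.

def model_free_update(q, action, reward, learning_rate=0.5):
    """Strengthen or weaken the chosen action's value based on the reward."""
    q = list(q)  # copy so the caller's list is not modified
    q[action] += learning_rate * (reward - q[action])
    return q


def model_based_values(transition_probs, state_values):
    """Compute action values at decision time from a model of the environment.

    transition_probs[a][s] is the assumed probability that first-stage
    action a leads to second-stage state s.
    """
    return [
        sum(p * v for p, v in zip(probs, state_values))
        for probs in transition_probs
    ]


# Assumed transition model: action 0 usually leads to state 0,
# action 1 usually leads to state 1.
TRANSITIONS = [[0.7, 0.3], [0.3, 0.7]]

# Hypothetical trial: the agent chose action 0, made a rare transition
# to state 1, and was rewarded there.
q_mf = model_free_update([0.0, 0.0], action=0, reward=1.0)
q_mb = model_based_values(TRANSITIONS, state_values=[0.0, 1.0])
```

In this hypothetical trial, the model-free update raises the value of the action that was just taken, favoring a repeat of action 0. The model-based computation instead assigns the higher value to action 1, because that action is the more likely route to the rewarded state. This divergence after rare transitions is what two-stage tasks are designed to exploit.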
However, when we examined our results in detail, some of the findings did not make sense at all [1]. For example, we observed negative effects of reward that could not be explained by model-free or model-based learning processes.

After a series of analyses on choice data from human participants and simulated agents, we came to the conclusion that the human participants in our sample must have been confused about how the task worked. Inspired by a version of the task adapted for children [18], we modified the task instructions to tell participants a story that included causes and effects within a physical system, rather than give them a simple set of abstract symbols and numerical probabilities (Figure 1B-D). Such story-like instructions seemed to work well in previous studies (e.g., [18, 19]), and therefore, we predicted that our improved instructions would alleviate participants' confusion. What we did not predict was that the new instructions would eliminate nearly all evidence of model-free learning. These results left us even more confused than before.

Our results puzzled us bec...