Behavioral and neural evidence reveal a prospective goal-directed decision process that relies on mental simulation of the environment, and a retrospective habitual process that caches returns previously garnered from available choices. Artificial systems combine the two by simulating the environment up to some depth and then exploiting habitual values as proxies for consequences that may arise in the further future. Using a three-step task, we provide evidence that human subjects use such a normative plan-until-habit strategy, implying a spectrum of approaches that interpolates between habitual and goal-directed responding. We found that increasing time pressure led to shallower goal-directed planning, suggesting that a speed-accuracy tradeoff controls the depth of planning, with deeper search leading to more accurate evaluation at the cost of slower decision-making. We conclude that subjects integrate habit-based cached values directly into goal-directed evaluations in a normative manner.

planning | habit | reinforcement learning | speed/accuracy tradeoff | tree-based evaluation

Behavioral and neural evidence suggest that the brain uses distinct goal-directed and habitual systems for decision-making (1-5). A goal-directed system exploits an individual's model, i.e., their knowledge of environmental dynamics, to simulate the consequences that will likely follow a choice (6) (Fig. 1A). Such evaluations, which assess a decision tree expanding into the future to estimate the total reward, adapt flexibly to changes in environmental dynamics or the values of outcomes. Evaluating deep trees, however, is computationally expensive (in terms of time, working memory, metabolic energy, etc.) and potentially error-prone. By contrast, the habitual system simply caches the rewards received on previous trials conditional on the choice (Fig. 1C), without a representational characterization of the environment (hence being called "model-free") (6, 7). This process hinders adaptation to changes in the environment, but has the advantage of computational simplicity. Previous studies show distinct behavioral and neurobiological signatures of both systems (8-18). Furthermore, consistent with the theoretical strengths and weaknesses of each system (2, 19), different experimental conditions influence the relative contributions of the two systems in controlling behavior according to their respective competencies (20-23).

Here, we suggest that individuals, rather than simply showing greater reliance on the more competent system in each condition, combine the relative strengths of the two systems in a normative manner by integrating habit-based cached values directly into goal-directed evaluations. Specifically, we propose that given available resources (time, working memory, etc.), individuals decide the depth k up to which they can afford full forward simulation and use cached habitual values thereafter. That is, individuals compute the value of a choice by adding the first k rewards, predicted by the explicit simulation, to the habitual value cached at the state reached after those k steps.
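A minimal sketch of this plan-until-habit evaluation may help fix ideas. It assumes a small, illustrative interface: a `model` object exposing one-step simulation (`model.step`) and the available actions in a state (`model.actions`), plus a table `Q_mf` of cached habitual (model-free) action values; these names and the deterministic transitions are assumptions for illustration, not the authors' implementation.

```python
def plan_until_habit_value(state, action, k, model, Q_mf, gamma=1.0):
    """Value of `action` in `state`: explicit tree search for the first k steps,
    then the cached habitual value as a proxy for all deeper consequences."""
    next_state, reward = model.step(state, action)  # one-step forward simulation
    if k == 1:
        # Planning depth exhausted: bootstrap from cached habitual action values.
        future = max(Q_mf[next_state][a] for a in model.actions(next_state))
    else:
        # Still within the planning horizon: recursively evaluate the best continuation.
        future = max(
            plan_until_habit_value(next_state, a, k - 1, model, Q_mf, gamma)
            for a in model.actions(next_state)
        )
    return reward + gamma * future
```

Under this sketch, setting k to the full task horizon recovers purely goal-directed tree search, whereas k = 0 would amount to purely habitual responding; intermediate values of k implement the interpolating spectrum described above, with larger k trading slower, more resource-intensive evaluation for greater accuracy.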