Humans choose actions based on both habit and planning. Habitual control is computationally frugal but adapts slowly to novel circumstances, whereas planning is computationally expensive but can adapt swiftly. Current research emphasizes the competition between habits and plans for behavioral control, yet many complex tasks instead favor their integration. We consider a hierarchical architecture that exploits the computational efficiency of habitual control to select goals while preserving the flexibility of planning to achieve those goals. We formalize this mechanism in a reinforcement learning setting, illustrate its costs and benefits, and experimentally demonstrate its spontaneous application in a sequential decision-making task.
The distinction between habitual and planned action is fundamental to behavioral research (1-4). Habits enable computationally efficient decision making, but at the cost of behavioral flexibility. They form as stimulus-response pairings are "stamped in" following reward, as in Thorndike's law of effect (3). Planning, in contrast, enables more flexible and productive decision making. It is accomplished by first searching over a causal model linking candidate actions to their expected outcomes and then selecting actions based on their anticipated rewards. Planning imposes a severe computational cost, however, as the size and complexity of a model grow.
Past research emphasizes the competition between habitual and planned control of behavior (5, 6). Habitual control is favored when an individual has extensive experience with a task and when the optimal behavior policy is relatively consistent across time; meanwhile, planning is favored for novel tasks and when the optimal policy is variable, provided that an agent represents an adequate model of their task (7).
Methods of integrating habitual and planned control have received less attention (8-10), yet real-world tasks often favor elements of each.
Consider, for instance, a seasoned journalist who reports on new events each day. At a high level of abstraction, her reporting is structured around a repetitive series of goal-directed actions: follow leads, interview sources, evade meddling editors, etc. Because these actions are reliably valuable for any news event, their selection is an excellent candidate for habitual control. The concrete steps necessary to carry out any individual action will be highly variable, however: optimal behavior when interviewing a pop star may be suboptimal when interviewing the Pope. Thus, the implementation of the abstract actions is an excellent candidate for planning. This example illustrates the utility of nesting elements of both habits and plans in a hierarchy of behavioral control (11-13).
Indeed, it is widely recognized that humans mentally organize their behavior around hierarchically organized goals and subgoals (3, 14, 15). In principle, hierarchical organization can be implemented exclusively by habitual control (16), or exclusively by planning (13, 17). However, these homogeneous mechanisms foreclose...