Decision-making in a complex world, characterized both by predictable regularities and by frequent departures from the norm, requires dynamic switching between rapid, habit-like, automatic processes and slower, more flexible evaluative processes. These strategies, formalized as ‘model-free’ and ‘model-based’ reinforcement learning algorithms, respectively, can lead to divergent behavioral outcomes, requiring a mechanism to arbitrate between them in a context-appropriate manner. Recent data suggest that individuals with obsessive-compulsive disorder (OCD) rely excessively on inflexible habit-like decision-making during reward-driven learning. We propose that inflexible reliance on habit in OCD may reflect a functional weakness in the mechanism for context-appropriate dynamic arbitration between model-free and model-based decision-making. Support for this hypothesis derives from emerging functional imaging findings. A deficit in arbitration in OCD may help to reconcile evidence for excessive reliance on habit in rewarded learning tasks with an older literature suggesting inappropriate recruitment of circuitry associated with model-based decision-making in unreinforced procedural learning. The hypothesized deficit and corresponding circuitry may be a particularly fruitful target for interventions, including cognitive remediation.