Animals and humans replay neural patterns encoding trajectories through their environment, both whilst they solve decision-making tasks and during rest. Both on-task and off-task replay are believed to contribute to flexible decision making, though how their relative contributions differ remains unclear. We investigated this question by using magnetoencephalography to study human subjects while they performed a decision-making task that was designed to reveal the decision algorithms employed. We characterized subjects in terms of how flexibly each adjusted their choices to changes in temporal, spatial and reward structure. The more flexible a subject, the more they replayed trajectories during task performance, and this replay was coupled with re-planning of the encoded trajectories. The less flexible a subject, the more they replayed previously and subsequently preferred trajectories during rest periods between task epochs. The data suggest that online and offline replay both participate in planning but support distinct decision strategies.
Introduction

Online and offline replay are both suggested to contribute to decision making 1-15, but their precise contributions remain unclear. Replay of experienced and expected state transitions during a task, either immediately before choice or following outcome feedback, is particularly well suited to mediate on-the-fly planning, where choices are evaluated based on the states to which they lead (known as model-based planning). Off-task replay might serve a complementary role of consolidating a model of a state space, specifying how each state can be reached from other states and the values of those states. According to this perspective, both types of replay help subjects make choices that are flexibly adapted to current circumstances.
However, a different possibility is that off-task replay also directly participates in planning, by calculating and storing a (so-called model-free) decision policy that specifies in advance what to do in each state 16-19. Such a pre-formulated policy is inherently less flexible than a policy constructed on the fly, but it decreases the need for subsequent online planning when time is limited. Thus, rather than online and offline replay both supporting the same form of planning, this latter perspective suggests a trade-off between them: online replay promotes on-the-fly model-based flexibility, whereas offline replay establishes a stable model-free policy.
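The trade-off can be made concrete with a toy sketch, not drawn from the present study: in a minimal deterministic state space (all state, action, and function names below are hypothetical), a model-based chooser re-evaluates actions at decision time by simulating the states they lead to, whereas a model-free policy cached in advance keeps its stored choice even after the reward structure changes.

```python
# Toy illustration (hypothetical names, not the paper's task):
# model-based on-the-fly evaluation vs a cached model-free policy.

# World model: which state each action leads to, and terminal rewards.
TRANSITIONS = {
    "start": {"left": "A", "right": "B"},
}
REWARD = {"A": 1.0, "B": 0.0}

def model_based_choice(state):
    """Evaluate each action at choice time by simulating where it leads."""
    actions = TRANSITIONS[state]
    return max(actions, key=lambda a: REWARD[actions[a]])

def precompute_policy():
    """Model-free style: store a fixed best action per state in advance."""
    return {s: model_based_choice(s) for s in TRANSITIONS}

# A policy cached before a reward change keeps the now-stale choice...
policy = precompute_policy()
REWARD["B"] = 2.0  # the environment's reward structure changes

assert policy["start"] == "left"               # inflexible cached policy
# ...whereas on-the-fly model-based planning adapts immediately.
assert model_based_choice("start") == "right"
```

The sketch mirrors the flexibility argument in the text: the cached policy is cheap at decision time but blind to recent changes, while the on-the-fly evaluation pays a planning cost for adaptability.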
Despite the wide-ranging behavioural implications of the distinction between model-based and model-free planning 20-23, and much theorising on the role of replay in one or the other form of planning, to date there is little data to suggest whether online and offline replay have complementary or contrasting impacts in this regard. Therefore, we tested the relationship between both online and offline replay and key aspects of decision flexibility that dissociate...