“…Six of these are Bayesian models that vary along three dimensions: first, whether or not they can generalize learned values from one option to the other, second, whether or not they use uncertainty-guided exploration strategy, and third, whether or not they can compose the learned values from the first two sub-task to reason on the final sub-task. The Bayesian models include a Bayesian mean-tracker (BMT) 36 which is a model that does not learn about the underlying functional structure but instead updates its beliefs about rewards for each option independently, as well as a model that learns functions by generalizing across options within a sub-task based on the idea of Gaussian Process regression (GPR) 18,37,38 . For each of these two models, we considered one variant that cannot compose and instead learns separate reward functions for each sub-task, another that does not perform uncertainty-guided exploration, and lastly, one that initializes its predictions in the final sub-task to the composition of the learned means from the first two sub-tasks.…”