5 Co-corresponding authors
As humans and animals experience the world, they learn to associate states and actions with the expected values of the reward that is likely to follow [1][2][3]. Neural correlates of expected value are found in many brain regions, including the orbitofrontal cortex (OFC) [4][5][6][7][8][9]. While OFC value representations have been identified across many tasks and species [10][11][12][13][14][15], their computational role remains controversial [16][17][18]. One influential hypothesis holds that they drive value-based choosing: The OFC represents the expected values of available options, and choices are made by comparing these values to one another [4][7][9]. A contrasting hypothesis holds that they drive learning: The OFC represents the expected values of immediately impending outcomes, which are compared to rewards actually received, so as to learn and adapt expectations to match the world [5][6][19][20]. In common laboratory tasks, the items to be decided between are also the items to be learned about, making the two hypothesized roles difficult to distinguish. Here, we use a recently developed multi-step task for rats [21] that separates choosing from learning. In a first step, rats choose one of two ports ("choice ports") whose expected values are computed using planning, and are not learned. In the second step, rats are led to one of two other ports ("outcome ports") which are not chosen between, but whose expected values are learned based on reward history. We found relatively weak OFC encoding of choice port values, needed for choosing but not learning, but far stronger encoding of outcome port values, needed for learning but not choosing. Moreover, temporally specific silencing of OFC during outcome port entry was sufficient to disrupt behavior, and the nature of this disruption was consistent with impairment of a value learning process, but was not consistent with impairment of a choice process.
We therefore suggest that value representations in the OFC directly drive learning, but do not directly drive choice.
We trained rats on a two-step decision task, adapted from the human literature [22], in which a choice made by the subject in a first step is linked probabilistically, rather than deterministically, to an outcome that occurs in a second step (Fig. 1a). In each trial of our rat version of this task [21], the rat first initiated the trial by poking its nose into a neutral center port, and then chose one of two choice ports (Fig. 1a i,ii). One choice caused a left outcome port to become available with high probability ("common" transition), and a right outcome port to become available with low probability ("uncommon" transition), while the opposite choice reversed these probabilities (Fig. 1a iii). Following the initial choice, an auditory tone informed the rat which of 1 . CC-BY-ND 4.0 International license It is made available under a (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity.The cop...