Reinforcement learning theories postulate that actions are chosen to maximize a long-term sum of positive outcomes based on value functions, which are subjective estimates of future rewards. In simple reinforcement learning algorithms, value functions are updated only by trial and error, whereas in model-based reinforcement learning algorithms they are updated according to the decision-maker's knowledge or model of the environment. To investigate how animals update value functions, we trained rats in two different free-choice tasks. The reward probability of the unchosen target remained unchanged in one task, whereas in the other it increased over time since that target was last chosen. The results show that goal choice probability increased as a function of the number of consecutive alternative choices in the latter, but not the former, task, indicating that the animals were aware of the time-dependent increase in arming probability and used this information in choosing goals. In addition, the choice behavior in the latter task was better accounted for by a model-based reinforcement learning algorithm. Our results show that rats adopt a decision-making process that cannot be accounted for by simple reinforcement learning models even in a relatively simple binary choice task, suggesting that rats can readily improve their decision-making strategy through knowledge of their environment.

Animals must continually update their behavioral strategies according to changes in the environment in order to optimize their choices. Reinforcement learning (RL) models (Sutton and Barto 1998) provide a powerful theoretical framework for understanding choice behavior in humans and animals in a dynamic environment. In RL theories, future actions are chosen so as to maximize a long-term sum of positive outcomes, and this can be accomplished with a set of value functions that represent the amount of reward expected from particular states or actions. The value functions are continually updated based on the reward prediction error, which is the difference between the expected and actual rewards. In this way, even without prior knowledge of an uncertain and dynamically changing environment, an animal can discover by trial and error the structure of the environment that can be exploited for optimal choice. Not surprisingly, human and monkey choice behaviors in various tasks are well described by RL algorithms (e.g., O'Doherty et al. 2003; Barraclough et al. 2004; Lee et al. 2004; Samejima et al. 2005; Daw et al. 2006; Pessiglione et al. 2006).

The updating of value functions can be achieved in two fundamentally different ways. In simple or direct RL algorithms, value functions are updated only by trial and error: only the value function associated with the chosen action is updated, and those associated with unchosen actions remain unchanged. On the other hand, in indirect or model-based RL algorithms, the value functions might also change according to the decision-maker's knowledge or model of the environment.
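To make this distinction concrete, the sketch below contrasts the two update schemes for a binary choice task like the one described above, in which the unchosen target's arming probability grows with the number of consecutive trials since it was last chosen. The learning rate, the baseline arming probability, and the specific form of the internal model (arming probability accumulating as 1 − (1 − p)^n) are illustrative assumptions for this sketch, not the algorithms or parameter values fitted in this study.

```python
import random

ALPHA = 0.5    # learning rate (illustrative value, not fitted to data)
P_ARM = 0.3    # per-trial arming probability of each target (assumed)


def simple_rl_update(values, chosen, reward, alpha=ALPHA):
    """Direct RL: only the chosen action's value is updated from the reward
    prediction error; the value of the unchosen action is left unchanged."""
    rpe = reward - values[chosen]      # reward prediction error
    values[chosen] += alpha * rpe
    return values


def model_based_update(values, chosen, reward, trials_unchosen, alpha=ALPHA):
    """Model-based RL (sketch): the chosen action is updated as above, while
    the unchosen action's value is derived from a model of the task, namely
    that an unarmed target is re-armed with probability P_ARM on every trial
    and stays armed until it is chosen."""
    rpe = reward - values[chosen]
    values[chosen] += alpha * rpe
    unchosen = 1 - chosen
    # Assumed model: probability that the unchosen target is armed after it
    # has gone `trials_unchosen` trials without being selected.
    values[unchosen] = 1.0 - (1.0 - P_ARM) ** trials_unchosen
    return values


if __name__ == "__main__":
    # Toy demonstration: after several consecutive choices of target 0, the
    # model-based learner assigns a rising value to target 1 even though it
    # has received no feedback about target 1.
    random.seed(0)
    direct, model_based = [0.5, 0.5], [0.5, 0.5]
    for trial in range(1, 6):
        reward = 1.0 if random.random() < P_ARM else 0.0   # outcome at target 0
        simple_rl_update(direct, 0, reward)
        model_based_update(model_based, 0, reward, trials_unchosen=trial)
        print(trial, [round(v, 2) for v in direct],
              [round(v, 2) for v in model_based])
```

The key difference is that the model-based learner's value for the unchosen target increases with the number of consecutive alternative choices, which is precisely the behavioral signature examined in the task where arming probability grows over time; the direct learner's value for that target never moves until it is actually sampled.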