In many choice scenarios, including prey, employment, and mate search, options are not encountered simultaneously and so cannot be directly compared. Deciding which options are worth engaging, and which to forgo, requires developing accurate beliefs about the overall distribution of prospects. However, the role of learning in this process, and how biases arising from learning may affect choice, is poorly understood. In three experiments, we adapted a classic prey selection task from foraging theory to examine how individuals kept track of an environment's reward rate and adjusted their choices in response to its fluctuations. In accord with qualitative predictions from optimal foraging models, participants adjusted their selectivity to the richness of the environment, becoming less selective in poorer environments and more willing to accept less profitable options. These preference shifts were observed not only in response to global (between-block) manipulations of the offer distributions but also in response to local, trial-by-trial offer variation within a block, suggesting an incremental learning rule. Offering further evidence about the learning process, these preference changes were more pronounced when the environment improved than when it deteriorated. All of these observations were best explained by a trial-by-trial learning model in which participants estimate the overall reward rate, but with upward versus downward changes governed by separate learning rates. A failure to adjust expectations sufficiently when an environment worsens leads to suboptimal choices: options that are valuable given the environmental conditions are rejected in the false expectation that better options will materialize. These findings reveal a previously unappreciated parallel, in the serial choice setting, to observations of asymmetric updating and the resulting biased (often overoptimistic) estimates in other domains.

Previously, we have shown (Constantino and Daw, 2015) that when individuals undertake a different type of foraging problem ("patch leaving"), their choices are well explained by an error-driven incremental learning rule for estimating the environment's reward rate (Schultz et al., 1997; Sutton and Barto, 1998), which then serves as a comparator for determining which prospects are acceptable. However, systematic deviations from optimal thresholds were detected, and these are even more pronounced in participants under stress (Lenow et al., 2017) or with depleted levels of dopamine. This suggests that there are circumstances under which individuals may misestimate the environment's rate of return and, in doing so, make choices out of step with what the marginal value theorem (MVT) would predict. However, in this line of studies on patch foraging, these biases have simply been treated as fixed choice tendencies and, in particular, have not been shown to arise from the underlying learning process.
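To make the learning account concrete, the sketch below illustrates an asymmetric delta-rule estimate of the environment's reward rate coupled to an MVT-style acceptance threshold. It is a minimal illustration under stated assumptions, not the fitted model from the experiments: the variable names (rho, alpha_pos, alpha_neg), the per-trial update timing, the deterministic accept/reject rule, and the simulated offer distributions are all choices made here for exposition.

```python
import numpy as np

def simulate_prey_selection(offers, handling_times, search_time=1.0,
                            alpha_pos=0.3, alpha_neg=0.1, rho_init=0.0):
    """Asymmetric reward-rate learning in a serial prey-selection task.

    Each trial, an option with reward offers[t] and handling cost
    handling_times[t] is encountered. Following the MVT threshold rule,
    the option is accepted iff its profitability (reward / handling time)
    exceeds the current estimate rho of the environment's reward rate.
    rho is then updated by a delta rule whose learning rate depends on
    the sign of the prediction error: alpha_pos for positive errors
    (improvements), alpha_neg for negative errors (deteriorations).
    Parameter values are illustrative, not fitted to data.
    """
    rho = rho_init
    accepted, rho_trace = [], []
    for r, h in zip(offers, handling_times):
        take = (r / h) > rho            # MVT: engage only profitable options
        # Reward per unit time experienced on this trial (0 if rejected)
        experienced_rate = r / (search_time + h) if take else 0.0
        delta = experienced_rate - rho  # reward-rate prediction error
        rho += (alpha_pos if delta > 0 else alpha_neg) * delta
        accepted.append(take)
        rho_trace.append(rho)
    return np.array(accepted), np.array(rho_trace)

# Example: a rich environment that abruptly becomes poor.
rng = np.random.default_rng(0)
offers = np.concatenate([rng.uniform(5, 10, 200),   # rich block
                         rng.uniform(1, 4, 200)])   # poor block
handling = np.full(offers.shape, 2.0)
accepted, rho = simulate_prey_selection(offers, handling)
print(f"acceptance rate, rich block: {accepted[:200].mean():.2f}")
print(f"acceptance rate, poor block: {accepted[200:].mean():.2f}")
```

With alpha_pos > alpha_neg, the estimate tracks improvements quickly but lags when the environment deteriorates, reproducing the signature bias described above: early in the poor block, the inflated rho leads the agent to reject options that are in fact worth engaging, in the false expectation that better ones will materialize.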