In an ever-changing environment, animals must learn to flexibly select actions based on sensory input and the anticipated positive and negative consequences. This type of adaptive behavior is studied using perceptual decision-making (PDM) tasks that feature block-wise changes in stimulus-response-outcome contingencies. Despite extensive research on PDM, there is no widely accepted mechanistic model of the decision process that captures trial-by-trial adaptation to contingency changes. To address this gap, we first specified three models of adaptive PDM based on signal detection theory. Next, we identified several scenarios in which these models make diverging predictions. For experimental testing, we subjected rats and pigeons to a two-choice auditory discrimination task comprising a sequence of experimental conditions that differed in their stimulus-response-outcome contingencies. The contingency manipulations were implemented through the concomitant manipulation of reward probabilities, stimulus presentation probabilities, and stimulus discriminability across two stimulus-response categories. We find that both rats and pigeons exhibit condition-specific response biases that increase total reward across the entire range of experimental conditions. However, none of the three models was able to fit the choice data across all experimental conditions. Through detailed behavioral analysis, we demonstrate that learning is driven by the integration of rewards, but not reward omissions. Moreover, model-based analyses reveal that reward integration is influenced by two additional factors, namely perceptual uncertainty and the alignment of steady-state response ratios to relative (rather than absolute) reward differences between the two choices.
A model incorporating these factors accounts well for behavioral data across experimental conditions for both species. It connects signal detection theory, arguably the most influential framework of perception, with a learning mechanism that operates at the level of single trials and, in the steady state, produces behavior consistent with the generalized matching law from animal learning theory.