Extinction learning, the process of ceasing an acquired behavior in response to altered reinforcement contingencies, is essential for survival in a changing environment. To date, research has largely neglected the learning dynamics and variability of behavior during extinction learning, focusing instead on a few response types studied via population averages. Here, we take a different approach by analyzing the trial-by-trial dynamics of operant extinction learning in both pigeons and a computational model. The task involved discriminative operant conditioning in context A, extinction in context B, and a return to context A to test the context-dependent return of the conditioned response (ABA renewal). By studying single learning curves across animals over repeated sessions of this paradigm, we uncovered a rich variability of behavior during extinction learning: (1) pigeons prefer the unrewarded alternative choice in one-third of the sessions, predominantly during the very first extinction session an animal encounters; (2) in later sessions, abrupt transitions of behavior emerge at the onset of context B; and (3) the renewal effect decays as sessions progress. While these results could be interpreted in terms of rule-learning mechanisms, we show that they can be parsimoniously accounted for by a computational model based only on associative learning between stimuli and actions. Our work thus demonstrates the critical importance of studying the trial-by-trial dynamics of learning in individual sessions, and the unexpected power of "simple" associative learning processes.
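To illustrate how purely associative learning can produce context-dependent extinction and ABA renewal, the following is a minimal delta-rule sketch. It is an illustrative assumption, not the authors' actual model: the expected reward of the conditioned response is taken as the sum of a context-independent stimulus weight and a context-specific weight, both trained by the same prediction error, with arbitrary parameter values (`ALPHA`, 50 trials per phase).

```python
ALPHA = 0.1  # learning rate (illustrative value)

def train(w_stim, w_ctx, ctx, reward, n_trials, alpha=ALPHA):
    """Run n_trials of delta-rule updates in the given context."""
    for _ in range(n_trials):
        value = w_stim + w_ctx[ctx]   # summed associative strength
        delta = reward - value        # prediction error
        w_stim += alpha * delta
        w_ctx[ctx] += alpha * delta
    return w_stim, w_ctx

w_stim, w_ctx = 0.0, {"A": 0.0, "B": 0.0}

# Acquisition in context A: the response is rewarded.
w_stim, w_ctx = train(w_stim, w_ctx, "A", reward=1.0, n_trials=50)

# Extinction in context B: the response is unrewarded. The shared
# stimulus weight shrinks and w_ctx["B"] becomes inhibitory (negative).
w_stim, w_ctx = train(w_stim, w_ctx, "B", reward=0.0, n_trials=50)

# ABA renewal test: back in context A the summed associative strength
# is still high, because w_ctx["A"] was untouched by extinction in B.
renewal_strength = w_stim + w_ctx["A"]
extinct_strength = w_stim + w_ctx["B"]
```

In this toy setup, responding is extinguished in context B yet reappears on return to context A, without any explicit rule being learned: the renewal falls out of the context-specific weights alone.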
Significance Statement

Operant conditioning is essential for the discovery of purposeful actions, but once a stimulus-response association is acquired, the ability to extinguish it in response to altered reward contingencies is equally important. These processes also play a fundamental role in the development and treatment of pathological behaviors such as drug addiction, overeating, and gambling. Here we show that extinction learning is not limited to the cessation of a previously reinforced response, but also drives the emergence of complex and variable choices that change from session to session. At first sight, these behavioral changes appear to reflect abstract rule learning, but we show with a computational model that they can emerge from "simple" associative learning.