Cognitive flexibility is the ability to adjust to changes in the environment and is essential for adaptive behavior. It can be investigated using laboratory tests such as probabilistic reversal learning (PRL). Individuals with Cocaine Use Disorder (CUD) and individuals with Gambling Disorder (GD) both show overall impairments in PRL flexibility. However, it is poorly understood whether this impairment depends on the same brain mechanisms in the two addictions. Reinforcement learning (RL) is the process by which rewarding or punishing feedback from the environment is used to adjust behavior so as to maximize reward and minimize punishment. RL models can provide a deeper mechanistic account of the latent processes underlying cognitive flexibility. Here, we report results from a re-analysis of PRL data from control participants (n=18) and individuals with either GD (n=18) or CUD (n=20) using a hierarchical Bayesian RL approach. We observed significantly reduced stimulus stickiness (i.e., stimulus-bound perseveration) in GD, which may reflect increased exploratory behavior that is insensitive to outcomes. RL parameters were unaffected in CUD. We then related the behavioral findings to their underlying neural substrates through an analysis of task-based fMRI data. Individuals with GD differed from controls in the tracking of reward and punishment expected values (EV), with greater activity during reward EV tracking in the cingulate gyrus and amygdala. In CUD, we observed reduced responses to positive punishment prediction errors (PPE) and increased activity following negative PPEs in the superior frontal gyrus compared to controls. Thus, an RL framework serves to differentiate behavior in a probabilistic learning paradigm in two compulsive disorders, GD and CUD.
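To make the notion of stimulus stickiness concrete, the following is a minimal sketch of one common way such a parameter can enter an RL model of a two-choice PRL task. It is illustrative only and is not the hierarchical Bayesian model fitted in this study. The function names, the use of separate reward and punishment learning rates, the softmax choice rule, and all parameter values and task settings (80 trials, reversal at trial 40, 80:20 contingencies) are assumptions made for the example.

```python
import math
import random


def choice_prob_a(q_a, q_b, beta, stickiness, last_choice):
    """Softmax probability of choosing stimulus 'A' (hypothetical helper).

    beta scales the value difference (reinforcement sensitivity).
    stickiness > 0 favours repeating the previous choice regardless of
    outcome; stickiness < 0 favours switching away from it.
    """
    bonus_a = stickiness if last_choice == "A" else 0.0
    bonus_b = stickiness if last_choice == "B" else 0.0
    logit = beta * (q_a - q_b) + (bonus_a - bonus_b)
    return 1.0 / (1.0 + math.exp(-logit))


def update_value(q, outcome, alpha_reward, alpha_punish):
    """Delta-rule update; outcome is 1 (reward) or 0 (punishment).

    Assumption: a separate learning rate is used for each outcome type.
    """
    alpha = alpha_reward if outcome == 1 else alpha_punish
    return q + alpha * (outcome - q)


def simulate_prl(n_trials=80, reversal_at=40, p_correct=0.8,
                 alpha_reward=0.4, alpha_punish=0.3,
                 beta=5.0, stickiness=0.3, seed=0):
    """Simulate one agent on an illustrative PRL schedule; returns choices."""
    rng = random.Random(seed)
    q = {"A": 0.5, "B": 0.5}  # neutral starting values (assumption)
    last_choice = None
    choices = []
    for t in range(n_trials):
        # The mostly-rewarded stimulus switches at the reversal point.
        correct = "A" if t < reversal_at else "B"
        p_a = choice_prob_a(q["A"], q["B"], beta, stickiness, last_choice)
        choice = "A" if rng.random() < p_a else "B"
        p_reward = p_correct if choice == correct else 1.0 - p_correct
        outcome = 1 if rng.random() < p_reward else 0
        q[choice] = update_value(q[choice], outcome, alpha_reward, alpha_punish)
        last_choice = choice
        choices.append(choice)
    return choices
```

In this sketch, lowering `stickiness` toward zero or below makes choices depend less on what was chosen on the previous trial, independently of the outcome received, which is one way to picture the reduced stimulus-bound perseveration reported for GD.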