While positive reward prediction errors (RPEs) and negative RPEs have equal impacts in standard reinforcement learning, the brain appears to have distinct neural pathways that learn mainly from either positive or negative feedback, such as the direct and indirect pathways of the basal ganglia (BG). Given that distinct pathways may unevenly receive inputs from different neural populations and/or regions, how states or actions are represented can differ between the pathways. We explored whether the combined use of different representations, coupled with different learning rates for positive and negative RPEs, has computational benefits. We considered an agent equipped with two learning systems, each adopting either the individual representation (IR) or the successor representation (SR) of states. Varying the combination of IR and SR, as well as the learning rates for positive and negative RPEs in each system, we examined how the agent performed in a certain dynamic reward environment. We found that the combination of an SR-based system learning mainly from positive RPEs and an IR-based system learning mainly from negative RPEs outperformed the other combinations, including the IR-only and SR-only cases and the cases where the two systems had the same ratio of positive- to negative-RPE-based learning rates. In the best combination, the two systems showed activities of comparable magnitudes with opposite signs, consistent with suggested profiles of the BG pathways. These results suggest that specifically combining different representations with appetitive and aversive learning could be an effective learning strategy in a certain dynamic reward environment, and that it might actually be implemented in the cortico-BG circuits.
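The two-system agent is described only verbally in the abstract. Below is a minimal Python sketch of how such an agent could be set up, under stated assumptions: the fixed chain-structured SR matrix `M`, the parameter names, and the learning-rate values are illustrative choices of ours, not the paper's specification.

```python
import numpy as np

# Minimal sketch (not the authors' published code) of an agent that combines an
# SR-based and an IR-based value learner, each with its own learning rates for
# positive and negative RPEs. Parameter names and values are illustrative.

n_states = 10
gamma = 0.9

# Fixed, pre-computed SR matrix M[s, s'] ~ expected discounted future occupancy
# of s' starting from s, here for a simple left-to-right chain (an assumption;
# in general the SR itself could be learned).
M = np.eye(n_states)
for i in range(n_states - 1):
    M[i, i + 1:] = gamma ** np.arange(1, n_states - i)

w_sr = np.zeros(n_states)   # SR system: V_SR(s) = M[s] @ w_sr
v_ir = np.zeros(n_states)   # IR system: V_IR(s) stored per state (tabular)

def value(s):
    """Total state value = sum of the two systems' estimates."""
    return M[s] @ w_sr + v_ir[s]

def update(s, r, s_next, alpha_sr=(0.2, 0.02), alpha_ir=(0.02, 0.2)):
    """One TD update. Each system uses (alpha_plus, alpha_minus): here the
    SR system learns mainly from positive RPEs, the IR system from negative."""
    global w_sr
    delta = r + gamma * value(s_next) - value(s)
    a_sr = alpha_sr[0] if delta > 0 else alpha_sr[1]
    a_ir = alpha_ir[0] if delta > 0 else alpha_ir[1]
    w_sr = w_sr + a_sr * delta * M[s]   # SR: update weights over SR features
    v_ir[s] += a_ir * delta             # IR: update this state's value only

# Example single experience: moving from state 3 to state 4 yields reward 1.
update(3, 1.0, 4)
```

With this asymmetry, a shared RPE drives both systems, but the SR-based values grow mainly through positive RPEs while the IR-based values accumulate mainly negative contributions, giving the two opposite-signed activity profiles of comparable magnitude described in the abstract.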
Difficulty in the cessation of drinking, smoking, or gambling has been widely recognized. Conventional theories have proposed a relative dominance of habitual over goal-directed control, but human studies have not convincingly supported them. Referring to the recently suggested "successor representation (SR)" of states, which enables partially goal-directed control, we propose a dopamine-related mechanism that makes resistance to habitual reward-obtaining particularly difficult. We considered that long-standing behavior towards a certain reward without resisting temptation can (but does not always) lead to the formation of a rigid, dimension-reduced SR based on the goal state, which cannot be updated. In our model assuming such a rigid reduced SR, no reward prediction error (RPE) is generated at the goal while no resistance is made, but once the person starts resisting temptation, a sustained large positive RPE is generated upon goal reaching. Such a sustained RPE is somewhat similar to the hypothesized sustained fictitious RPE caused by drug-induced dopamine. In contrast, if the rigid reduced SR is not formed and states are represented individually, as in simple reinforcement learning models, no sustained RPE is generated at the goal. Formation of the rigid reduced SR also attenuates the resistance-dependent decrease in the value of the cue for the behavior, makes subsequent introduction of punishment after the goal ineffective, and potentially enhances the propensity for nonresistance through the influence of RPEs via the spiral striatum-midbrain circuit. These results suggest that formation of a rigid reduced SR makes the cessation of habitual reward-obtaining particularly difficult and can thus be a mechanism for addiction that is common to substance and nonsubstance reward.
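The core mechanism, that a rigid, dimension-reduced SR couples the goal's value to updates made at earlier states, can be illustrated with a toy calculation. The sketch below is a heavily simplified illustration under our own assumptions (chain length, feature definition, numbers, and the form of the RPE at the goal are hypothetical), not the paper's simulation.

```python
import numpy as np

# Toy sketch contrasting the goal-state RPE under (a) individual representation
# (IR) and (b) a rigid, dimension-reduced SR in which every state is represented
# only by its fixed discounted reachability of the goal, mapped to value by a
# single shared weight w. All numbers are illustrative assumptions.

gamma, reward = 0.9, 1.0
distance_to_goal = np.array([3, 2, 1, 0])   # 4-state chain; state 3 is the goal
phi = gamma ** distance_to_goal             # rigid reduced SR feature (never updated)

w = reward                                  # converged: V(goal) = w * phi[3] = reward
V_ir = reward * phi.copy()                  # IR baseline with the same converged values

# Suppose resistance at state 1 produces a negative RPE (reward is delayed):
delta, alpha, s = -0.3, 0.5, 1

w += alpha * delta * phi[s]                 # reduced SR: the shared weight drops
V_ir[s] += alpha * delta                    # IR: only state 1's value drops

# RPE upon obtaining the reward at the goal, simplified here as reward - V(goal):
print("reduced SR: V(goal) =", w * phi[3], " RPE at goal =", reward - w * phi[3])
print("IR:         V(goal) =", V_ir[3],    " RPE at goal =", reward - V_ir[3])
```

Because the single weight `w` is shared across all states' features, a negative RPE experienced during resistance at an intermediate state also lowers the predicted value at the goal, so a positive RPE reappears each time the goal is reached; under IR the goal value is untouched and the RPE at the goal stays near zero, matching the contrast stated in the abstract.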