2022
DOI: 10.1162/jocn_a_01777

Task Learnability Modulates Surprise but Not Valence Processing for Reinforcement Learning in Probabilistic Choice Tasks

Abstract: The goal of temporal difference (TD) reinforcement learning is to maximize outcomes and improve future decision-making. It does so by utilizing a prediction error (PE), which quantifies the difference between the expected and the obtained outcome. In gambling tasks, however, decision-making cannot be improved because of the lack of learnability. On the basis of the idea that TD utilizes two independent bits of information from the PE (valence and surprise), we asked which of these aspects is affected when a ta…
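
A minimal sketch of the PE computation the abstract describes, assuming single-step trials: when the outcome ends the trial, the TD error r + γV(s′) − V(s) reduces to r − V(s), because the value of the terminal state is zero. The function names and the learning rate `alpha` are illustrative assumptions, not the study's model.

```python
from __future__ import annotations


def delta_rule_update(value: float, reward: float, alpha: float) -> tuple[float, float]:
    """One single-step TD (Rescorla-Wagner-style) update; returns (new_value, PE)."""
    prediction_error = reward - value             # obtained minus expected outcome
    new_value = value + alpha * prediction_error  # move the expectation toward the outcome
    return new_value, prediction_error


def learn(rewards: list[float], alpha: float = 0.1, v0: float = 0.5) -> list[float]:
    """Apply the update over a sequence of outcomes and return the trial-wise PEs."""
    value = v0
    pes = []
    for reward in rewards:
        value, pe = delta_rule_update(value, reward, alpha)
        pes.append(pe)
    return pes


if __name__ == "__main__":
    # A reliably rewarded option: PEs shrink as the expectation is learned.
    print(learn([1.0, 1.0, 1.0, 1.0], alpha=0.5, v0=0.0))  # [1.0, 0.5, 0.25, 0.125]
```

With a reliably rewarded option the PEs shrink toward zero; in a pure gambling task with, say, 50% random rewards, the expectation hovers near 0.5 and the PE magnitude stays around 0.5, which is one way to picture the missing learnability.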

Cited by 4 publications (4 citation statements)
References: 90 publications

“…In a more exploratory analysis, we investigated if the neural activity also reflects unsigned prediction errors. In line with a previous publication [32], the prediction error was split into its constituent parts. Valence refers to the sign of the prediction error (better or worse than expected), while surprise refers to the magnitude of the deviation between expectation and observations.…”
Section: Results (mentioning)
confidence: 99%
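
The valence/surprise split described in this excerpt can be expressed directly on a signed PE. In this hypothetical helper, valence is the sign and surprise the absolute value, so their product recovers the original PE.

```python
from __future__ import annotations


def split_prediction_error(pe: float) -> tuple[int, float]:
    """Split a signed PE into valence (sign) and surprise (unsigned magnitude)."""
    valence = (pe > 0) - (pe < 0)  # +1 better than expected, -1 worse, 0 exactly as expected
    surprise = abs(pe)             # size of the deviation, regardless of its direction
    return valence, surprise
```

For example, PEs of +0.8 and -0.8 share the same surprise (0.8) but carry opposite valence, which is why the two components can be tested as separate regressors.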
“…Additionally, differences in learning paradigms, such as different learning and feedback stimuli, task difficulty or rule changes, might produce a different set of overlapping components and potentially latency shifts. Such differences might explain why some studies find PE effects rather at parietal sites in a time window associated with the P3 [51], while others find PE modulations only in the FRN time window [24]. To date, it is unclear whether this signature of the PE in the P300, especially at frontal electrodes, constitutes a separate or a sustained process from the FRN/RewP time range [43].…”
Section: Discussion (mentioning)
confidence: 98%
“…Lastly, there is also heterogeneity in previous studies regarding how the PE is computed. While a growing body of research uses computational modelling to derive PEs, which considers individual learning processes to infer latent reward expectations of participants in a trial-by-trial fashion [51], other studies rely solely on statistical reward probabilities of stimuli inherent in the experimental design, serving as a proxy for expectedness/PEs. As we demonstrate with an additional exploratory analysis (Supplementary Material S2) in which we replaced our model-derived PEs with the fixed reward probabilities, the two operationalizations map different processes.…”
Section: Discussion (mentioning)
confidence: 99%
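
To illustrate the contrast drawn in this excerpt, the sketch below computes both operationalizations for one stimulus: a trial-wise model-derived PE from a delta-rule learner, and a fixed proxy r − p based on the design's reward probability p. The learner, its parameters, and the reward sequence are assumptions for illustration only.

```python
from __future__ import annotations


def model_vs_proxy_pes(rewards: list[float], reward_prob: float,
                       alpha: float = 0.5, v0: float = 0.5) -> tuple[list[float], list[float]]:
    """Return (model-derived PEs, fixed-probability proxy PEs) for one stimulus."""
    value = v0
    model_pes, proxy_pes = [], []
    for r in rewards:
        model_pe = r - value               # depends on the learner's current expectation
        model_pes.append(model_pe)
        value += alpha * model_pe          # expectation is updated trial by trial
        proxy_pes.append(r - reward_prob)  # the same value for every trial with this outcome
    return model_pes, proxy_pes
```

For rewards [1, 1, 0] with p = 0.8, the proxy gives 0.2, 0.2 and -0.8 (up to floating-point rounding), whereas the model gives 0.5, 0.25 and -0.875: the same rewarded outcome yields a shrinking model PE as the expectation is learned, while the proxy ignores learning entirely.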
“…To investigate this hypothesis, we utilized a model-based analysis of brain activity, i.e., we estimated single-trial values of prediction errors for the correct and incorrect policy using our model, and regressed these values on single-trial EEG data elicited for each trial and locked to the outcome. In line with a previous publication (Wurm et al., 2021), the prediction error was split into its constituent parts. Valence refers to the sign of the prediction error (better or worse than expected), while surprise refers to the magnitude of the deviation between expectation and observation.…”
Section: Evidence For Neural Implementation Of Multiple Concurrent Fi... (mentioning)
confidence: 99%
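
A mass-univariate version of the single-trial regression described here can be sketched with ordinary least squares: at every time point, outcome-locked amplitude is modelled as an intercept plus valence and surprise regressors. The data are simulated, and the design (one channel, OLS per time point, the effect window and effect sizes) is an assumption, not the cited study's pipeline.

```python
import numpy as np


def regress_pe_components(eeg: np.ndarray, valence: np.ndarray, surprise: np.ndarray) -> np.ndarray:
    """Fit eeg[:, t] ~ b0 + b1 * valence + b2 * surprise separately at every time point t.

    eeg has shape (n_trials, n_times); the returned betas have shape (3, n_times),
    with rows ordered as intercept, valence, surprise.
    """
    design = np.column_stack([np.ones(len(valence)), valence, surprise])
    # lstsq solves all time points at once when the right-hand side is 2-D
    betas, *_ = np.linalg.lstsq(design, eeg, rcond=None)
    return betas


def simulate_and_fit(seed: int = 0) -> np.ndarray:
    """Hypothetical single-channel data with a PE effect in samples 20 to 29."""
    rng = np.random.default_rng(seed)
    n_trials, n_times = 400, 50
    pe = rng.uniform(-1.0, 1.0, n_trials)            # stand-in for model-derived PEs
    valence, surprise = np.sign(pe), np.abs(pe)
    eeg = rng.normal(0.0, 0.1, (n_trials, n_times))  # background noise
    eeg[:, 20:30] += 0.5 * valence[:, None] + 1.0 * surprise[:, None]
    return regress_pe_components(eeg, valence, surprise)


if __name__ == "__main__":
    betas = simulate_and_fit()
    print(np.round(betas[:, 25], 2))  # roughly [0, 0.5, 1.0] inside the effect window
```

Solving every time point in one `lstsq` call keeps the sketch short; a real analysis would typically add further regressors, multiple channels, and a correction for multiple comparisons across time points.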