Pupil dilation and response slowing distinguish deliberate explorative choices in the probabilistic learning task

Kozunova, Galina L.; Sayfulina, Ksenia E.; Prokofyev, Andrey O.; Medvedev, Vladimir; Rytikova, A.; Stroganova, Tatiana A.; Chernyshev, Boris Vladimirovich

doi:10.3758/s13415-022-00996-z

Cited by 10 publications

(32 citation statements)

References 76 publications

“…Secondly, in addition to the overall heightened dilation of pupils in PC phase, we found that only in this phase value-driven modulation of pupil size was significant, and this effect was predictive of the behavioral speed modulation. Modulation of pupil responses by reward value is in line with a number of previous findings Braver, 2013, 2014; Massar et al, 2016; Koelewijn et al, 2018; Pietrock et al, 2019; Walsh et al, 2019) and indicates that when the delivery of reward is contingent on task performance, higher reward incentives could efficiently mobilize the processing resources, and settle an efficient relationship between the speed and accuracy of choices, effects that are also reflected in the task-evoked pupil dilatation and have been reported across motor (Naber and Murphy, 2020), perceptual (Walsh et al, 2019), and cognitive (Kozunova et al, 2022) tasks. On the other hand, the lack of value-driven modulation of pupil responses for PR cues is in line with effects reported in previous studies, where reward-driven modulations of pupil size were only found during the learning of reward associations (Anderson and Yantis, 2012) but were absent during the test phase when reward associations were implicit (Hammerschmidt et al, 2018).…”

Section: Discussion

supporting

confidence: 88%

Value-driven modulation of visual perception by visual and auditory reward cues: The role of performance-contingent delivery of reward

Antono

Vakhrushev

Pooresmaeili

2022

Front. Hum. Neurosci.

Perception is modulated by reward value, an effect elicited not only by stimuli that are predictive of performance-contingent delivery of reward (PC) but also by stimuli that were previously rewarded (PR). PC and PR cues may engage different mechanisms relying on goal-driven versus stimulus-driven prioritization of high value stimuli, respectively. However, these two modes of reward modulation have not been systematically compared against each other. This study employed a behavioral paradigm where participants’ visual orientation discrimination was tested in the presence of task-irrelevant visual or auditory reward cues. In the first phase (PC), correct performance led to a high or low monetary reward dependent on the identity of visual or auditory cues. In the subsequent phase (PR), visual or auditory cues were not followed by reward delivery anymore. We hypothesized that PC cues have a stronger modulatory effect on visual discrimination and pupil responses compared to PR cues. We found an overall larger task-evoked pupil dilation in PC compared to PR phase. Whereas PC and PR cues both increased the accuracy of visual discrimination, value-driven acceleration of reaction times (RTs) and pupillary responses only occurred for PC cues. The modulation of pupil size by high reward PC cues was strongly correlated with the modulation of a combined measure of speed and accuracy. These results indicate that although value-driven modulation of perception can occur even when reward delivery is halted, stronger goal-driven control elicited by PC reward cues additionally results in a more efficient balance between accuracy and speed of perceptual choices.

“…The hypothesized relationship between NE and random exploration is in line with previous work showing that pupil size, as an indirect measure of NE level (4,13), correlates with variability in the evidence accumulation process (38,39), magnitude of noise in perceptual tasks (40), and choice randomness in value-based decision making (9,41-44). These pupillometry studies suggest that pupil size could represent the computation of total uncertainty during reinforcement learning, though this has not been directly tested yet.…”

supporting

confidence: 81%

Pupil size encodes uncertainty during exploration

Fan

Burke

Sambrano

et al. 2023

Preprint

Exploration is an important part of decision making and is crucial to maximizing long-term reward. Past work has shown that people use different forms of uncertainty to guide exploration. In this study, we investigate the role of the pupil-linked arousal system in uncertainty-guided exploration. We measured participants’ pupil dilation (N = 48) while they performed a two-armed bandit task. Consistent with previous work, we found that people adopted a hybrid of directed, random and undirected exploration, which are sensitive to relative uncertainty, total uncertainty and value difference between options, respectively. We also found a positive correlation between pupil size and total uncertainty. Furthermore, augmenting the choice model with subject-specific total uncertainty estimates decoded from the pupil size improved predictions of held-out choices, suggesting that people used the uncertainty estimate encoded in pupil size to decide which option to explore. Together, the data shed light on the computations underlying uncertainty-driven exploration. Under the assumption that pupil size reflects Locus Coeruleus-Norepinephrine (LC-NE) neuromodulatory activity, these results also extend the theory of LC-NE function in exploration, highlighting its selective role in driving uncertainty-guided random exploration.

“…This conflict arises between at least two simultaneously active competing internal models, or 'task sets' (Domenech et al, 2020; Koechlin, 2020), one being a predominant response tendency (exploitation), and the other its conscious alternative (exploration). Our recent pupillometric study lends support to this assumption (Kozunova et al, 2022): we found that such explorative choices compared to exploitative ones are accompanied by larger pupil dilation and longer decision time. We speculated that this state of conflict supposedly entails an increase in the degree of processing required to make the deliberately explorative decisions.…”

Section: Introduction

supporting

confidence: 66%

“…A recent pupillometric study (Kozunova et al, 2022) revealed that advantageous choices that immediately preceded and immediately followed explorative choices significantly differed from the advantageous choices committed within the periods of continuous exploitation. This finding hints that the internal state related to exploration modulates brain activity on a scale longer than duration of one trial.…”

Section: Methods

mentioning

confidence: 99%

“…outcome of the previous trial (two levels: ‘previous loss’ and ‘previous gain’) and their interactions as the fixed effects, and Subject as a random factor: where Response time is log-transformed RT (time from stimulus onset to button press originally measured in milliseconds). We included the Previous feedback factor into the LMM model because it could affect the response time via interaction with Choice type in the same probability task (Kozunova et al, 2022).…”

Section: Methods

mentioning

confidence: 99%

Losses resulting from deliberate exploration trigger beta oscillations in frontal cortex

Chernyshev

Pultsina

Tretyakova

et al. 2022

Preprint

Self Cite

We examined the neural signature of directed exploration by contrasting MEG beta (16-30 Hz) power changes between disadvantageous and advantageous choices in the two-choice probabilistic reward task. Both types of choices were made when our participants learned the probabilistic contingency between choices and their outcomes, i.e., acquired the inner model of choice value. Therefore, rare disadvantageous choices might serve exploratory, environment-probing purposes. The study brought two main findings. Firstly, decision making leading to disadvantageous choices took more time and evidenced greater large-scale suppression of beta oscillations than its advantageous alternative. Additional neural resources required by disadvantageous decisions strongly suggest their deliberately explorative nature. Secondly, the outcomes of disadvantageous and advantageous choices had a qualitatively different impact on feedback-related beta oscillations. Only losses, but not gains, resulting from the disadvantageous choice were followed by late beta synchronization in frontal cortex. Our results are consistent with the role of frontal beta oscillations in the stabilization of neural representations for selected behavioral rule when exploratory strategy conflicts with value-based behavior. Punishment for exploratory choice being congruent with its low value in the reward history is more likely to strengthen, through punishment-related beta oscillations, the representation of its competitor - the inner utility model.
