We examined the neural signature of directed exploration by contrasting MEG beta (16–30 Hz) power changes between disadvantageous and advantageous choices in a two-choice probabilistic reward task. We analyzed the choices made after the participants had learned the probabilistic contingency between choices and their outcomes, i.e., had acquired an inner model of choice values. Therefore, rare disadvantageous choices might serve explorative, environment-probing purposes. The study yielded two main findings. First, decision making leading to disadvantageous choices took more time and showed greater large-scale suppression of beta oscillations than decision making leading to advantageous choices. The additional neural resources recruited during disadvantageous decisions strongly suggest their deliberately explorative nature. Second, the outcomes of disadvantageous and advantageous choices had qualitatively different impacts on feedback-related beta oscillations. After disadvantageous choices, only losses, but not gains, were followed by late beta synchronization in frontal cortex. Our results are consistent with a role of frontal beta oscillations in stabilizing the neural representation of a selected behavioral rule when an explorative strategy conflicts with value-based behavior. Punishment for an explorative choice, being congruent with its low value in the reward history, is more likely to strengthen, through punishment-related beta oscillations, the representation of exploitative choices consistent with the inner utility model.
Large-scale cortical beta (β) oscillations have been implicated in learning processes, but their exact role is debated. We used MEG to explore the dynamics of movement-related β oscillations while 22 adults learned, through trial and error, novel associations between four auditory pseudowords and movements of four limbs. As learning proceeded, the spatial-temporal characteristics of β oscillations accompanying cue-triggered movements underwent a major transition. Early in learning, widespread suppression of β power occurred long before movement initiation and was sustained throughout the whole behavioral trial. When learning advanced and performance reached asymptote, β suppression after the initiation of a correct motor response was replaced by a rise in β power, mainly in the prefrontal and medial temporal regions of the left hemisphere. This post-decision β power predicted trial-by-trial response times (RT) at both stages of learning (before and after the rules became familiar), but with opposite signs. When a subject had just started to acquire the associative rules and was gradually improving task performance, a decrease in RT correlated with an increase in post-decision β-band power. When the participants implemented the already acquired rules, faster (more confident) responses were associated with weaker post-decision β-band synchronization. Our findings suggest that maximal beta activity is pertinent to a distinct stage of learning and may serve to strengthen the newly learned association in a distributed memory network.
We examined the neural signature of directed exploration by contrasting MEG beta (16–30 Hz) power changes between disadvantageous and advantageous choices in a two-choice probabilistic reward task. Both types of choices were made after our participants had learned the probabilistic contingency between choices and their outcomes, i.e., had acquired an inner model of choice value. Therefore, rare disadvantageous choices might serve exploratory, environment-probing purposes. The study yielded two main findings. First, decision making leading to disadvantageous choices took more time and showed greater large-scale suppression of beta oscillations than decision making leading to advantageous choices. The additional neural resources required by disadvantageous decisions strongly suggest their deliberately exploratory nature. Second, the outcomes of disadvantageous and advantageous choices had qualitatively different impacts on feedback-related beta oscillations. Only losses, but not gains, resulting from a disadvantageous choice were followed by late beta synchronization in frontal cortex. Our results are consistent with a role of frontal beta oscillations in stabilizing the neural representation of a selected behavioral rule when an exploratory strategy conflicts with value-based behavior. Punishment for an exploratory choice, being congruent with its low value in the reward history, is more likely to strengthen, through punishment-related beta oscillations, the representation of its competitor, the inner utility model.