An intrinsic value system for developing multiple invariant representations with incremental slowness learning

Luciw, Matthew; Kompella, Varun Raj; Kazerounian, Sohrob; Schmidhuber, Jürgen

doi:10.3389/fnbot.2013.00009

Cited by 24 publications

(16 citation statements)

References 67 publications

“…In other words, a valuable policy maximizes the entropy over the final states and the expected utility over those states. This decomposition of the value of a policy fits neatly with accounts of intrinsic and extrinsic reward (Luciw et al 2013) and connects to classic notions of exploration and exploitation (Cohen et al 2007; Daw 2009). Here, increasing the entropy over goal states corresponds to the concept of a novelty bonus (Kakade and Dayan 2002) or information gain, whereas maximizing expected utility corresponds to exploitation.…”

Section: Methods (mentioning)

confidence: 68%

The Dopaminergic Midbrain Encodes the Expected Certainty about Desired Outcomes

et al. 2014

Dopamine plays a key role in learning; however, its exact function in decision making and choice remains unclear. Recently, we proposed a generic model based on active (Bayesian) inference wherein dopamine encodes the precision of beliefs about optimal policies. Put simply, dopamine discharges reflect the confidence that a chosen policy will lead to desired outcomes. We designed a novel task to test this hypothesis, where subjects played a “limited offer” game in a functional magnetic resonance imaging experiment. Subjects had to decide how long to wait for a high offer before accepting a low offer, with the risk of losing everything if they waited too long. Bayesian model comparison showed that behavior strongly supported active inference, based on surprise minimization, over classical utility maximization schemes. Furthermore, midbrain activity, encompassing dopamine projection neurons, was accurately predicted by trial-by-trial variations in model-based estimates of precision. Our findings demonstrate that human subjects infer both optimal policies and the precision of those inferences, and thus support the notion that humans perform hierarchical probabilistic Bayesian inference. In other words, subjects have to infer both what they should do as well as how confident they are in their choices, where confidence may be encoded by dopaminergic firing.

“…An important difference is, however, that exploration is often equated with random or stochastic behavior in reinforcement learning schemes (but see Thrun, 1992), whereas in our framework, maximizing entropy over outcome states is a goal-driven, purposeful process—with the aim of accessing allowable states. Furthermore, this distinction neatly reflects the differentiation between intrinsic and extrinsic reward (Schmidhuber, 1991, 2009; Luciw et al, 2013), where extrinsic reward refers to externally administered reinforcement—corresponding to maximizing expected utility—and intrinsic reward is associated with maximizing entropy over outcomes. Maximizing intrinsic reward is usually associated with seeking new experiences in order to increase context-sensitive learning—which is reflected as increasing model-evidence or minimizing surprise in the active inference framework.…”

Section: Boredom and Novelty Seeking Under The Free Energy Principle (mentioning)

confidence: 99%

Exploration, novelty, surprise, and free energy minimization

et al. 2013

This paper reviews recent developments under the free energy principle that introduce a normative perspective on classical economic (utilitarian) decision-making based on (active) Bayesian inference. It has been suggested that the free energy principle precludes novelty and complexity, because it assumes that biological systems—like ourselves—try to minimize the long-term average of surprise to maintain their homeostasis. However, recent formulations show that minimizing surprise leads naturally to concepts such as exploration and novelty bonuses. In this approach, agents infer a policy that minimizes surprise by minimizing the difference (or relative entropy) between likely and desired outcomes, which involves both pursuing the goal-state that has the highest expected utility (often termed “exploitation”) and visiting a number of different goal-states (“exploration”). Crucially, the opportunity to visit new states increases the value of the current state. Casting decision-making problems within a variational framework, therefore, predicts that our behavior is governed by both the entropy and expected utility of future states. This dissolves any dialectic between minimizing surprise and exploration or novelty seeking.
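As an illustrative sketch (not taken from the cited papers), the decomposition described above can be checked numerically on a small discrete state space. The sketch assumes that utility is defined as the log-probability of each outcome under the desired distribution; under that assumption, the negative relative entropy between predicted and desired outcomes equals the entropy over final states plus the expected utility. The function names and the example distributions are hypothetical.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution p."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def expected_utility(p, utility):
    """Expected utility of outcomes distributed according to p."""
    return sum(pi * ui for pi, ui in zip(p, utility))

def kl_divergence(p, q):
    """Relative entropy KL[p || q] for discrete distributions (q > 0 wherever p > 0)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def policy_value(p, utility):
    """Value of a policy as entropy over final states plus expected utility."""
    return entropy(p) + expected_utility(p, utility)

# Assumption: utility is the log of the desired outcome probabilities.
desired = [0.7, 0.2, 0.1]                  # hypothetical desired outcome distribution
utility = [math.log(q) for q in desired]

predicted = [0.5, 0.3, 0.2]                # hypothetical outcomes predicted under a policy

value = policy_value(predicted, utility)
neg_kl = -kl_divergence(predicted, desired)

# Both quantities agree up to floating-point error.
print(value, neg_kl)
```

Under this assumption the value is never positive and reaches zero only when the predicted outcomes match the desired ones, so a policy scores higher both by spreading probability over many acceptable states (entropy) and by favoring high-utility states.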

“…[33,36] When dealing with real-time video sequences, the length of the data is not known a priori; therefore, an incremental algorithm is needed. Various incremental versions of SFA have been proposed in the literature, such as by Luciw et al [34,35], which learn an estimate of the extracted features, resulting in a loss of accuracy.…”

Section: Motion Primitive Segmentation (mentioning)

confidence: 99%

Developmental Approach for Behavior Learning Using Primitive Motion Skills

Dawood and Loo 2018, Int. J. Neur. Syst.

Imitation learning through self-exploration is essential in developing sensorimotor skills. Most developmental theories emphasize that social interactions, especially understanding of observed actions, could be first achieved through imitation, yet the discussion on the origin of primitive imitative abilities is often neglected, referring instead to the possibility of its innateness. This paper presents a developmental model of imitation learning based on the hypothesis that a humanoid robot acquires imitative abilities as induced by sensorimotor associative learning through self-exploration. In designing such a learning system, several key issues will be addressed: automatic segmentation of the observed actions into motion primitives using raw images acquired from the camera without requiring any kinematic model; incremental learning of spatio-temporal motion sequences to dynamically generate a topological structure in a self-stabilizing manner; organization of the learned data for easy and efficient retrieval using a dynamic associative memory; and utilizing segmented motion primitives to generate complex behavior by combining these motion primitives. In our experiment, the self-posture is acquired through observing the image of its own body posture while performing the action in front of a mirror through body babbling. The complete architecture was evaluated by simulation and real robot experiments performed on the DARwIn-OP humanoid robot.
