2014 IEEE/RSJ International Conference on Intelligent Robots and Systems
DOI: 10.1109/IROS.2014.6942741
Simultaneous On-line Discovery and Improvement of Robotic Skill Options

Abstract: The regularity of everyday tasks enables us to reuse existing solutions for task variations. For instance, most door handles require the same basic skill (reach, grasp, turn, pull), but small adaptations of the basic skill are required to adapt to the variations that exist (e.g. levers vs. knobs). We introduce the algorithm "Simultaneous On-line Discovery and Improvement of Robotic Skills" (SODIRS), which is able to autonomously discover and optimize skill options for such task variations. We formalize …
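To make the idea of a library of skill options concrete, here is a small, hypothetical sketch of maintaining and adapting such options for task variations. The class name, spawn threshold, and the random-search refinement step are all assumptions for illustration; this is not the SODIRS algorithm from the paper.

```python
import numpy as np

class SkillOptionLibrary:
    """Toy library of skill options (policy parameter vectors).
    Hypothetical illustration of the idea in the abstract, not SODIRS itself."""

    def __init__(self, base_skill, spawn_threshold=1.0):
        self.options = [np.asarray(base_skill, dtype=float)]
        self.spawn_threshold = spawn_threshold

    def handle_task(self, cost_fn, refine_fn):
        """Pick the best existing option for this task variation; refine it,
        or spawn a new option if even the best one performs poorly."""
        costs = [cost_fn(o) for o in self.options]
        i = int(np.argmin(costs))
        if costs[i] > self.spawn_threshold:
            self.options.append(refine_fn(self.options[i].copy(), cost_fn))
            return len(self.options) - 1
        self.options[i] = refine_fn(self.options[i], cost_fn)
        return i

def refine(theta, cost_fn, n_iter=50, sigma=0.2):
    """Tiny random-search refinement standing in for a policy-improvement step."""
    best, best_cost = np.asarray(theta, dtype=float), cost_fn(theta)
    for _ in range(n_iter):
        cand = best + sigma * np.random.randn(*best.shape)
        if (c := cost_fn(cand)) < best_cost:
            best, best_cost = cand, c
    return best

# Toy usage: two "door handle" variations, each defining its own rollout cost.
lib = SkillOptionLibrary(base_skill=[0.0, 0.0], spawn_threshold=0.5)
lever_cost = lambda th: float(np.sum((np.asarray(th) - [0.1, 0.0]) ** 2))
knob_cost = lambda th: float(np.sum((np.asarray(th) - [2.0, 2.0]) ** 2))
lib.handle_task(lever_cost, refine)   # refines the base option in place
lib.handle_task(knob_cost, refine)    # spawns a second option for the knob
```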

Cited by 9 publications (9 citation statements). References 10 publications.
“…Baranes and Oudeyer (Baranes & Oudeyer, 2011) have studied the efficiency of combining stochastic optimization to reach goals with maturational mechanisms which progressively grow the limits within which stochastic optimization can physically explore, showing an increase in efficiency from a machine learning point of view. Several works have shown how human demonstration of movements could bootstrap this optimization process (e.g., Stulp, Herlant, Hoarau, & Raiola, 2014), or how humans can progressively shape subparts of the movements to complement autonomous exploration (Chernova & Thomaz, 2014). Finally, exploration in infants is also highly driven by mechanisms of intrinsic motivation (also called curiosity), where instead of trying to reach a goal imposed by social peers or the experimenter (as in the model presented in this paper), they use intrinsic criteria such as information gain or surprise to set their own goals and choose how to practice these self-selected goals (Gottlieb, Oudeyer, Lopes, & Baranes; Moulin-Frier, Nguyen, & Oudeyer).…”
Section: General Discussion and Conclusion
Citation type: mentioning (confidence: 99%)
“…Baranes and Oudeyer (Baranes & Oudeyer, 2011) have studied the efficiency of combining stochastic optimization to reach goals with maturational mechanisms which progressively grow the limits within which stochastic optimization can physically explore, showing an increase in efficiency from a machine learning point of view. Several works have shown how human demonstration of movements could bootstrap this optimization process (e.g., Stulp, Herlant, Hoarau, & Raiola, 2014), or how humans can progressively shape subparts of the movements to complement autonomous exploration (Chernova & Thomaz, 2014).…”
Section: Complementarity With Other Mechanisms
Citation type: mentioning (confidence: 99%)
“…Direct policy search is a form of reinforcement learning in which the search for the optimal policy is done directly in the space of the parameters θ of a parameterized policy π_θ, rather than using a value function. The specific algorithm we use is PI^BB (Policy Improvement through Black-Box optimization [22]). Since any model-free direct policy search algorithm could be used to implement this optimization (e.g.…”
Section: Optimization Algorithm: Direct Policy Search
Citation type: mentioning (confidence: 99%)
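To illustrate the quoted description, below is a minimal sketch of reward-weighted black-box policy improvement in the spirit of PI^BB: sample perturbed policy parameters around the current mean, evaluate each with a rollout, and update the mean as a softmax-weighted average of the samples. The function names, the temperature h, and the exploration-decay schedule are assumptions for illustration, not the exact formulation of [22].

```python
import numpy as np

def pibb_update(theta, costs, samples, h=10.0):
    """One reward-weighted averaging update (PI^BB-style sketch)."""
    J = np.asarray(costs, dtype=float)
    # Softmax over normalized costs: lower cost -> exponentially higher weight.
    J_range = max(J.max() - J.min(), 1e-10)
    w = np.exp(-h * (J - J.min()) / J_range)
    w /= w.sum()
    # New policy parameters are the weighted average of the sampled parameters.
    return w @ samples

def optimize(cost_fn, theta0, sigma=0.05, n_samples=10, n_updates=100, decay=0.98):
    """Direct policy search: perturb parameters, roll out, average by reward."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_updates):
        # Sample perturbed policy parameters from an isotropic Gaussian.
        samples = theta + sigma * np.random.randn(n_samples, theta.size)
        costs = [cost_fn(s) for s in samples]   # one rollout per sample
        theta = pibb_update(theta, costs, samples)
        sigma *= decay                          # shrink exploration over time
    return theta

# Toy usage: minimize a quadratic "rollout cost" over 5 policy parameters.
best = optimize(lambda th: float(np.sum(th ** 2)), theta0=np.ones(5))
```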
“…Despite its simplicity, PI^BB is able to learn robot skills efficiently and robustly [22]. Alternatively, algorithms such as PI^2, PoWER, NES, PGPE, or CMA-ES could be used, see [23,11] for an overview and comparisons.…”
Section: A Policy Improvement Through Black-box Optimization
Citation type: mentioning (confidence: 99%)
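As the statement notes, other black-box optimizers can be dropped into the same direct policy search loop. A minimal sketch using CMA-ES via the widely used cma Python package (an assumed dependency, unrelated to the cited papers' code) might look like:

```python
import numpy as np
import cma  # pip install cma

# Toy rollout cost over 5 policy parameters (placeholder for a real robot rollout).
cost_fn = lambda theta: float(np.sum(np.asarray(theta) ** 2))

es = cma.CMAEvolutionStrategy(x0=[1.0] * 5, sigma0=0.5)
while not es.stop():
    candidates = es.ask()                                   # sample candidate parameters
    es.tell(candidates, [cost_fn(c) for c in candidates])   # report rollout costs
best_theta = es.result.xbest
```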
“…Next to approaches that have considered finite sets of parameterized problems [4], [12], other approaches [7], [8], [9], [10], [13] have considered the challenge of autonomous exploration and learning of continuous fields of parameterized problems (e.g. discovering and learning all the feasible displacements of objects and their motor solutions).…”
Section: Introduction
Citation type: mentioning (confidence: 99%)