Learning to Avoid Risky Actions

Malfáz, María; Salichs, Miguel Á.

doi:10.1080/01969722.2011.634681

Cited by 4 publications

(4 citation statements)

References 9 publications


“…Maggie is a social and personal robot intended to perform research on HRI and improving robots autonomy (Figure 1). It is controlled by the Automatic-Deliberative architecture (Barber and Salichs 2002; Barber 2000; Barber and Salichs 2001; Rivas, Corrales, Barber, and Salichs 2007; Malfaz and Salichs 2011) where the elemental component is the skill. Skills endow the robot with different sensory and motor capacities, and process information.…”

Section: The Robot Maggie and Its Decision Making System (mentioning)

confidence: 99%


Learning Behaviors by an Autonomous Social Robot With Motivations

Castro‐González

Malfáz

Gorostiza

et al. 2014

Cybernetics and Systems

Self Cite


In this paper an autonomous social robot is living in a laboratory where it can interact with several items (people included). Its goal is to learn by itself the proper behaviors in order to maintain its wellbeing as high as possible. Several experiments have been conducted to test the performance of the system. The Object Q-Learning algorithm has been implemented in the robot as the learning algorithm. This algorithm is a variation of the traditional Q-Learning since it considers a reduced state space and collateral effects. The comparison of the performance of both algorithms is shown in the first part of the experiments. Moreover, two mechanisms intended to reduce the learning session durations have been included: Well-Balanced Exploration and Amplified Reward. Their advantages are justified in the results obtained in the second part of the experiments. Finally, the behaviors learned by our robot are analyzed. The resulting behaviors have not been pre-programmed. In fact, they have been learned by real interaction in the real world, and are related to the motivations of the robot. These are natural behaviors in the sense that they can be easily understood by humans observing the robot.



“…In this approach, the external state considers each object separately (Castro-González, Malfaz, and Salichs 2011).…”

Section: A. The Reduced State Space (mentioning)

confidence: 99%

Learning Behaviors by an Autonomous Social Robot With Motivations

Castro‐González

Malfáz

Gorostiza

et al. 2014

Cybernetics and Systems

Self Cite


“…Our work is also related to safe reinforcement learning (García and Fernández 2015), which also aims to identify risky states. Many such methods aim to optimise a risk-averse objective (Bertsekas and Rhodes 1971; Heger 1994; Malfaz and Salichs 2011), whereas we aim to more efficiently optimise a risk-neutral objective (expected return). Other methods aim to constrain exploration to avoid risky states (Gehring and Precup 2013), whereas we learn in a safe simulator and thus seek proposal distributions that visit such states more often, if they are significant to expected return.…”

Section: Related Work (mentioning)

confidence: 99%

OFFER: Off-Environment Reinforcement Learning

Ciosek

Whiteson

2017

AAAI


Policy gradient methods have been widely applied in reinforcement learning. For reasons of safety and cost, learning is often conducted using a simulator. However, learning in simulation does not traditionally utilise the opportunity to improve learning by adjusting certain environment variables - state features that are randomly determined by the environment in a physical setting but controllable in a simulator. Exploiting environment variables is crucial in domains containing significant rare events (SREs), e.g., unusual wind conditions that can crash a helicopter, which are rarely observed under random sampling but have a considerable impact on expected return. We propose off environment reinforcement learning (OFFER), which addresses such cases by simultaneously optimising the policy and a proposal distribution over environment variables. We prove that OFFER converges to a locally optimal policy and show experimentally that it learns better and faster than a policy gradient baseline.


“…you are afraid of walking a tightrope). Risky actions have already been studied in virtual agents [35] and they will be considered in our robot in future works.…”

Section: General Aspects Of Fear (mentioning)

confidence: 99%