2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI)
DOI: 10.1109/icacci.2017.8125811
Comparison of reinforcement learning algorithms applied to the cart-pole problem

Abstract: Designing optimal controllers continues to be challenging as systems are becoming complex and are inherently nonlinear. The principal advantage of reinforcement learning (RL) is its ability to learn from the interaction with the environment and provide optimal control strategy. In this paper, RL is explored in the context of control of the benchmark cartpole dynamical system with no prior knowledge of the dynamics. RL algorithms such as temporal-difference, policy gradient actor-critic, and value function appr…

Cited by 29 publications (16 citation statements) · References 11 publications
“…The pole is free to spin on a pivot on the vertical axis of the cart and the track. The controller applies a force F to the right or left of the cart; this allows the cart to move and keep the pole balanced [58]. The force is bounded by the interval (−F max , F max ), where F max is a system parameter.…”
Section: A Cart-pole Balancing
confidence: 99%
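The bounded-force cart-pole dynamics described in the excerpt above can be sketched in a few lines. This is a minimal illustration of the classic cart-pole equations of motion with the applied force clipped to (−F_max, F_max); all parameter values and the function name `step` are illustrative assumptions, not taken from the cited paper.

```python
import math

# Illustrative parameters (assumptions, not values from the cited paper).
GRAVITY = 9.8
CART_MASS = 1.0
POLE_MASS = 0.1
POLE_HALF_LENGTH = 0.5
F_MAX = 10.0
DT = 0.02

def step(x, x_dot, theta, theta_dot, force):
    """One Euler step of the classic cart-pole dynamics.

    The applied force is clipped to the interval (-F_MAX, F_MAX)
    mentioned in the excerpt above.
    """
    force = max(-F_MAX, min(F_MAX, force))
    total_mass = CART_MASS + POLE_MASS
    pole_mass_length = POLE_MASS * POLE_HALF_LENGTH
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Intermediate term shared by both accelerations.
    temp = (force + pole_mass_length * theta_dot ** 2 * sin_t) / total_mass
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        POLE_HALF_LENGTH * (4.0 / 3.0 - POLE_MASS * cos_t ** 2 / total_mass)
    )
    x_acc = temp - pole_mass_length * theta_acc * cos_t / total_mass
    return (x + DT * x_dot, x_dot + DT * x_acc,
            theta + DT * theta_dot, theta_dot + DT * theta_acc)
```

Clipping the input before integrating is what enforces the (−F_max, F_max) bound on the controller's force.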
“…The applied force F produces a linear movement on the cart and an angular movement on the pole. Figure adapted from Nagendra et al. [58].…”
confidence: 99%
“…The cart moves back and forth along a frictionless track with the task of balancing a pole attached by an un-actuated joint, as shown in Figure 3. The goal is to learn how to swing up and balance the pole just by moving the cart around the track (Barto et al., 1983; Nagendra et al., 2017). The observation variable, as in the inverted pendulum swing-up environment, considers derived information of the angle; Table 4 gives more details of the training and variable characteristics.…”
Section: Cart Pole (CP)
confidence: 99%
“…The eligibility trace gives more weight to the most recently visited states. Mathematically, this function is defined, for each state s ∈ S, by E_0(s) = 0 and:

E_t(s) = γλ E_{t−1}(s) + 1(S_t = s)   (14)

γ and λ are parameters that make the eligibility trace decrease over the time steps. When a state is visited, 1(S_t = s) = 1 and the function increases.…”
Section: Engg7282
confidence: 99%
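The accumulating eligibility-trace update quoted above can be sketched as a dictionary update. The function name `update_traces` and the default γ and λ values are illustrative assumptions:

```python
def update_traces(traces, visited_state, gamma=0.99, lam=0.9):
    """Accumulating eligibility trace, as in the excerpt above:
    E_t(s) = gamma * lam * E_{t-1}(s) + 1(S_t = s).
    Every state's trace decays by gamma*lam; the visited state gains 1.
    """
    new = {s: gamma * lam * e for s, e in traces.items()}
    new[visited_state] = new.get(visited_state, 0.0) + 1.0
    return new

# Visiting the same state twice: its trace decays once, then gains 1 again.
traces = update_traces({}, "s0")      # {'s0': 1.0}
traces = update_traces(traces, "s0")  # {'s0': 0.99 * 0.9 * 1.0 + 1.0}
```

Because traces decay geometrically, recently visited states carry the most credit when a TD error is later distributed over them.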
“…Robotics problems are usually represented by continuous state spaces with discrete or continuous actions. The following papers explain how to deal with such complex representations using algorithms like Q-Learning, actor-critic policy gradient, and SARSA for problems like Cart-Pole [14] and ball-collecting tasks [15] [17]. Proximal Policy Optimization [18] or Monotonic Policy Optimization [4] is used to control the motion of 3D robots [18] [19].…”
Section: Robotics
confidence: 99%
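As a concrete (and deliberately tiny) instance of the tabular methods named above, here is a minimal Q-Learning sketch on a toy chain MDP. The environment, hyperparameters, and function name are illustrative stand-ins, not the setups used in the cited papers.

```python
import random

def q_learning(n_states=5, n_actions=2, episodes=200, alpha=0.5,
               gamma=0.95, eps=0.3, seed=0):
    """Tabular Q-Learning on a toy chain MDP: action 1 moves right,
    action 0 moves left; entering the last state yields reward 1 and
    ends the episode. Exploration is epsilon-greedy.
    """
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # Epsilon-greedy action selection.
            if rng.random() < eps:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda i: Q[s][i])
            s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s2 == n_states - 1 else 0.0
            # Q-Learning update: bootstrap from the greedy value of s2.
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q
```

SARSA would differ only in the target, bootstrapping from the action actually taken in s2 instead of max(Q[s2]); actor-critic methods instead maintain a separate parameterized policy updated along the gradient signaled by the TD error.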