Wheel Loader Scooping Controller Using Deep Reinforcement Learning

Azulay, Osher; Shapiro, Amir

doi:10.1109/access.2021.3056625

Cited by 38 publications

(14 citation statements)

References 22 publications


“…In future field tests, the amount of spillage on the ground and the slipping of the wheels should also be measured and taken into account when comparing the overall performance. Comparison with [21] and [18] is interesting but difficult because of the large differences in the vehicles' size and strength and the materials' properties. The reinforcement learning controller in [21] achieved a fill factor of 65%, and the energy consumption was not measured.…”

Section: Results (mentioning)

confidence: 99%

“…Comparison with [21] and [18] is interesting but difficult because of the large differences in the vehicles' size and strength and the materials' properties. The reinforcement learning controller in [21] achieved a fill factor of 65%, and the energy consumption was not measured. The neural network controller in [18], which was trained by learning from demonstration, reached 81% of the filling of the bucket relative to manual loading, but neither loading time nor work was reported.…”

Section: Results (mentioning)

confidence: 99%

“…With the adaptation algorithm in [20], the network adapts from loading medium-coarse gravel to cobble gravel, with a five- to ten-percent increase in bucket filling after 40 loadings. The first use of reinforcement learning to control a scooping mechanism was recently published [21]. Using the actor-critic algorithm and the deep deterministic policy gradient algorithm, an agent was trained to control a three-degree-of-freedom mechanism in order to fill a bucket.…”

Section: Related Work and Our Contribution (mentioning)

confidence: 99%

“…Using the actor-critic algorithm and the deep deterministic policy gradient algorithm, an agent was trained to control a three-degree-of-freedom mechanism in order to fill a bucket. The authors of [21] did not consider the use of high-dimensional observation data in order to adapt to variable pile shapes or to steer the vehicle. Reinforcement learning agents are often trained in simulated environments, as they are an economical and safe way to produce large amounts of labeled data [22] before transferring to real environments.…”

Section: Related Work and Our Contribution (mentioning)

confidence: 99%


Continuous Control of an Underground Loader Using Deep Reinforcement Learning

et al. 2021


The reinforcement learning control of an underground loader was investigated in a simulated environment by using a multi-agent deep neural network approach. At the start of each loading cycle, one agent selects the dig position from a depth camera image of a pile of fragmented rock. A second agent is responsible for continuous control of the vehicle, with the goal of filling the bucket at the selected loading point while avoiding collisions, getting stuck, or losing ground traction. This relies on motion and force sensors, as well as on a camera and lidar. Using a soft actor–critic algorithm, the agents learn policies for efficient bucket filling over many subsequent loading cycles, with a clear ability to adapt to the changing environment. The best results—on average, 75% of the max capacity—were obtained when including a penalty for energy usage in the reward.
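The abstract above reports that the best results came from including an energy penalty in the reward. A minimal sketch of that idea follows. The function name `loading_reward`, the linear form, and the weight `energy_weight` are assumptions made for illustration; the abstract does not state the actual reward used by the authors.

```python
def loading_reward(fill_fraction, energy_joules, energy_weight=1e-6):
    """Hypothetical reward: bucket fill fraction minus a weighted energy cost.

    fill_fraction: loaded volume divided by maximum bucket capacity, in [0, 1].
    energy_joules: energy spent during the loading cycle (assumed unit).
    energy_weight: assumed trade-off coefficient, not taken from the paper.
    """
    if not 0.0 <= fill_fraction <= 1.0:
        raise ValueError("fill_fraction must lie in [0, 1]")
    if energy_joules < 0.0:
        raise ValueError("energy_joules must be non-negative")
    return fill_fraction - energy_weight * energy_joules


# Two cycles with the same fill but different energy use:
# the cheaper cycle receives the higher reward.
cheap = loading_reward(0.75, 50_000.0)
costly = loading_reward(0.75, 200_000.0)
```

With a weight of zero the reward reduces to the fill fraction alone, so the weight controls how strongly the agent is pushed toward energy-efficient loading.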



“…At each time-step, the agent takes action $a$ according to the current environmental state $S_t$ and the policy $\pi$ which is a mapping from perceived states to actions. Therefore, as a consequence of action, the environmental state transits from $S_t$ to $S_{t+1}$ and the agent gets a reward $r$. The agent and environment generate the trajectories $(S_1; A_1; R_1), (S_2; A_2; R_2), \ldots, (S_T; A_T; R_T)$ [23], until an episode is over. The basic architecture of RL is shown in Figure 8.…”

Section: Automatic Bucket-filling Algorithm Based On Q-learning (mentioning)

confidence: 99%
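The agent-environment interaction described in the statement above can be sketched as a short rollout loop that records (state, action, reward) tuples until the episode ends. `ToyEnv`, its one-dimensional state, and the `rollout` helper are hypothetical stand-ins chosen for illustration; they are not the wheel-loader environment from the cited paper.

```python
class ToyEnv:
    """Hypothetical 1-D environment: the state is an integer position 0..goal."""

    def __init__(self, goal=3):
        self.goal = goal
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is 0 (stay) or 1 (advance); the position is capped at the goal.
        self.state = min(self.goal, self.state + action)
        done = self.state == self.goal
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def rollout(env, policy, max_steps=100):
    """Run one episode and return the list of (state, action, reward) tuples."""
    trajectory = []
    state = env.reset()
    for _ in range(max_steps):
        action = policy(state)                       # policy maps state to action
        next_state, reward, done = env.step(action)  # environment transition
        trajectory.append((state, action, reward))
        state = next_state
        if done:
            break
    return trajectory


# A policy that always advances reaches the goal in three steps.
trajectory = rollout(ToyEnv(goal=3), lambda s: 1)
```

Here the reward stored in each tuple is the one received after taking the action in that state; indexing conventions for rewards differ between texts.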

Data-Driven Reinforcement-Learning-Based Automatic Bucket-Filling for Wheel Loaders

et al. 2021


Automation of bucket-filling is of crucial significance to the fully automated systems for wheel loaders. Most previous works are based on a physical model, which cannot adapt to the changeable and complicated working environment. Thus, in this paper, a data-driven reinforcement-learning (RL)-based approach is proposed to achieve automatic bucket-filling. An automatic bucket-filling algorithm based on Q-learning is developed to enhance the adaptability of the autonomous scooping system. A nonlinear, non-parametric statistical model is also built to approximate the real working environment using the actual data obtained from tests. The statistical model is used for predicting the state of wheel loaders in the bucket-filling process. Then, the proposed algorithm is trained on the prediction model. Finally, the results of the training confirm that the proposed algorithm has good performance in adaptability, convergence, and fuel consumption in the absence of a physical model. The results also demonstrate the transfer learning capability of the proposed approach. The proposed method can be applied to different machine-pile environments.
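For readers unfamiliar with Q-learning, which the abstract above names as the basis of its bucket-filling algorithm, a minimal tabular update step is sketched below. The state and action encodings, the learning rate `alpha`, and the discount `gamma` are illustrative assumptions; the abstract does not give the paper's state space, action set, or hyperparameters.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9, done=False):
    """Apply one standard tabular Q-learning update and return the new Q-value.

    q: mapping from (state, action) to estimated value, defaulting to 0.0.
    actions: iterable of all actions available in next_state.
    """
    # Bootstrap from the best next action unless the episode has ended.
    best_next = 0.0 if done else max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


q_table = defaultdict(float)
# Terminal transition with reward 1.0: Q moves halfway toward 1.0 when alpha=0.5.
first = q_learning_update(q_table, 0, 1, 1.0, 1, [0, 1], alpha=0.5, done=True)
```

In the setting the abstract describes, the transitions would come from the learned statistical prediction model rather than from a physical simulator, but the update rule itself has the same shape.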
