2019 15th International Conference on Electronics, Computer and Computation (ICECCO)
DOI: 10.1109/icecco48375.2019.9043194

Training Unity Machine Learning Agents using reinforcement learning method

Cited by 10 publications (5 citation statements)
References 3 publications

“…To handle the increased complexity of the input space of the considered problems, we leverage neural network regression to model our PBO. We consider an offline setting, where we use ProFQI on car-on-hill (Ernst, Geurts, and Wehenkel 2005), and an online setting, where we use ProDQN on bicycle balancing (Randløv and Alstrøm 1998) and lunar lander (Brockman et al. 2016). We want to answer the following research question: does PBO enable moving toward the fixed point more effectively than the empirical Bellman operator?…”
Section: Methods (mentioning)
confidence: 99%
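
The research question above turns on the empirical Bellman operator that fitted Q-iteration applies at every step. A minimal sketch of one such iteration, in the spirit of Ernst, Geurts, and Wehenkel (2005), assuming a discrete action set and a scikit-learn-style regressor (the function name, data layout, and regressor choice are illustrative, not taken from the cited papers):

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def fitted_q_iteration_step(q_regressor, transitions, actions, gamma=0.99):
    """Apply the empirical Bellman operator once: regress Q(s, a)
    onto the targets r + gamma * max_a' Q(s', a').

    q_regressor must already be fitted (e.g., to the immediate
    rewards for iteration 0)."""
    states, acts, rewards, next_states, dones = transitions
    # Current Q-estimate at each next state, for every candidate action.
    next_q = np.stack(
        [q_regressor.predict(
            np.column_stack([next_states, np.full(len(next_states), a)]))
         for a in actions],
        axis=1)
    # Empirical Bellman targets; terminal transitions do not bootstrap.
    targets = rewards + gamma * (1.0 - dones) * next_q.max(axis=1)
    # Fit a fresh regressor to the new targets: one Bellman iteration.
    new_q = ExtraTreesRegressor(n_estimators=50)
    new_q.fit(np.column_stack([states, acts]), targets)
    return new_q
```
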
“…We also evaluate PBO in an online setting, using our ProDQN algorithm and comparing against DQN (Mnih et al. 2015). We consider a bicycle balancing (Randløv and Alstrøm 1998) problem and the lunar lander environment (Brockman et al. 2016). We set the number of Bellman iterations K = 8 for bicycle balancing and K = 10 for the lunar lander.…”
Section: Projected Deep Q-network (mentioning)
confidence: 99%
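
The DQN baseline named in this excerpt regresses an online Q-network onto bootstrapped targets from a periodically synced target network. A minimal PyTorch sketch of that loss, for orientation only (function and tensor names are illustrative; ProDQN itself is not reproduced here):

```python
import torch
import torch.nn.functional as F

def dqn_loss(online_net, target_net, batch, gamma=0.99):
    """Standard DQN regression target (Mnih et al. 2015):
    r + gamma * max_a' Q_target(s', a'), with a frozen target network."""
    states, actions, rewards, next_states, dones = batch
    # Q-value of the action actually taken, from the online network.
    q_sa = online_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Bootstrap from the periodically synced target network.
        next_q = target_net(next_states).max(dim=1).values
        target = rewards + gamma * (1.0 - dones) * next_q
    # Huber loss, as in the original DQN setup.
    return F.smooth_l1_loss(q_sa, target)
```
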
“…The robotic arm's reinforcement learning training is implemented using Unity ML-Agents. In the Unity reinforcement learning tool, the entity that performs actions is called an agent and is embedded in the environment; the strategy defines the objective of action execution; and the brain is responsible for supplying associated agents with decision-making strategies that guide action execution [22]. There is a state before and a state after each action, and the difference between the two states produces a reward value that satisfies the strategy's conditions, information exchange, and general instruction. Figure 3 depicts the relationship between "Agent," "Brain," and "Academy." Our digital twin training environment is built on Anaconda Navigator, the officially recommended tool for deep learning training.…”
Section: Training Environment for Reinforcement Learning (mentioning)
confidence: 99%
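
The agent/brain/academy loop described above can be driven from Python through the ML-Agents low-level API. A minimal sketch, assuming a recent `mlagents_envs` release (the build path is a placeholder, and random actions stand in for a trained policy; older ML-Agents versions, such as those current in 2019, exposed a different interface):

```python
import numpy as np
from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.base_env import ActionTuple

# Placeholder path to a built Unity environment executable.
env = UnityEnvironment(file_name="path/to/UnityBuild")
env.reset()
behavior_name = list(env.behavior_specs)[0]
spec = env.behavior_specs[behavior_name]

for episode in range(3):
    env.reset()
    decision_steps, terminal_steps = env.get_steps(behavior_name)
    while len(terminal_steps) == 0:
        # Random continuous actions stand in for the trained "brain".
        action = ActionTuple(continuous=np.random.uniform(
            -1.0, 1.0,
            (len(decision_steps), spec.action_spec.continuous_size)))
        env.set_actions(behavior_name, action)
        env.step()
        decision_steps, terminal_steps = env.get_steps(behavior_name)
    # The reward reflects the state change produced by the agent's actions.
    print(f"episode {episode} mean terminal reward:",
          terminal_steps.reward.mean())

env.close()
```
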
“…However, no attempt was made to analyze these results. Unity has also been used to train agents in other, individually designed environments [8]. These, however, were not described in detail, nor were the learning algorithms used specified.…”
Section: Introduction (unclassified)