2019 15th International Conference on Electronics, Computer and Computation (ICECCO)
DOI: 10.1109/icecco48375.2019.9043194

Training Unity Machine Learning Agents using reinforcement learning method

Cited by 10 publications (5 citation statements)
References 3 publications

“…To handle the increased complexity of the input space of the considered problems, we leverage neural network regression to model our PBO. We consider an offline setting, where we use ProFQI on car-on-hill (Ernst, Geurts, and Wehenkel 2005), and an online setting, where we use ProDQN on bicycle balancing (Randløv and Alstrøm 1998) and lunar lander (Brockman et al. 2016). We want to answer the following research question: does PBO enable moving toward the fixed point more effectively than the empirical Bellman operator?…”
Section: Methods (mentioning)
confidence: 99%
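
The research question above turns on the empirical Bellman operator that fitted Q-iteration applies at every step. A minimal sketch of one such iteration, in the spirit of Ernst, Geurts, and Wehenkel (2005), assuming a discrete action set and a scikit-learn-style regressor (the function name, data layout, and regressor choice are illustrative, not taken from the cited papers):

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def fitted_q_iteration_step(q_regressor, transitions, actions, gamma=0.99):
    """Apply the empirical Bellman operator once: regress Q(s, a)
    onto the targets r + gamma * max_a' Q(s', a').

    q_regressor must already be fitted (e.g., to the immediate
    rewards for iteration 0)."""
    states, acts, rewards, next_states, dones = transitions
    # Current Q-estimate at each next state, for every candidate action.
    next_q = np.stack(
        [q_regressor.predict(
            np.column_stack([next_states, np.full(len(next_states), a)]))
         for a in actions],
        axis=1)
    # Empirical Bellman targets; terminal transitions do not bootstrap.
    targets = rewards + gamma * (1.0 - dones) * next_q.max(axis=1)
    # Fit a fresh regressor to the new targets: one Bellman iteration.
    new_q = ExtraTreesRegressor(n_estimators=50)
    new_q.fit(np.column_stack([states, acts]), targets)
    return new_q
```
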
“…We also evaluate PBO in an online setting, using our ProDQN algorithm and comparing against DQN (Mnih et al. 2015). We consider a bicycle balancing (Randløv and Alstrøm 1998) problem and the lunar lander environment (Brockman et al. 2016). We set the number of Bellman iterations K = 8 for bicycle balancing and K = 10 for the lunar lander.…”
Section: Projected Deep Q-network (mentioning)
confidence: 99%
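
The DQN baseline named in this excerpt regresses an online Q-network onto bootstrapped targets from a periodically synced target network. A minimal PyTorch sketch of that loss, for orientation only (function and tensor names are illustrative; ProDQN itself is not reproduced here):

```python
import torch
import torch.nn.functional as F

def dqn_loss(online_net, target_net, batch, gamma=0.99):
    """Standard DQN regression target (Mnih et al. 2015):
    r + gamma * max_a' Q_target(s', a'), with a frozen target network."""
    states, actions, rewards, next_states, dones = batch
    # Q-value of the action actually taken, from the online network.
    q_sa = online_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Bootstrap from the periodically synced target network.
        next_q = target_net(next_states).max(dim=1).values
        target = rewards + gamma * (1.0 - dones) * next_q
    # Huber loss, as in the original DQN setup.
    return F.smooth_l1_loss(q_sa, target)
```
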
“…The robotic arm's reinforcement learning training is implemented using Unity ML-Agents. In the Unity reinforcement learning tool, the entity that performs actions is called an agent and is embedded in the environment; the strategy defines the objective of action execution; and the brain is responsible for supplying associated agents with decision-making strategies that guide action execution [22]. There is a state before and a state after each action, and the difference between the two states produces a reward value that satisfies the strategy's conditions, information exchange, and general instruction. Figure 3 depicts the relationship between "Agent," "Brain," and "Academy." Our digital twin training environment is built on Anaconda Navigator, the officially recommended tool for deep learning training.…”
Section: Training Environment for Reinforcement Learning (mentioning)
confidence: 99%
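
The agent/brain/academy loop described above can be driven from Python through the ML-Agents low-level API. A minimal sketch, assuming a recent `mlagents_envs` release (the build path is a placeholder, and random actions stand in for a trained policy; older ML-Agents versions, such as those current in 2019, exposed a different interface):

```python
import numpy as np
from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.base_env import ActionTuple

# Placeholder path to a built Unity environment executable.
env = UnityEnvironment(file_name="path/to/UnityBuild")
env.reset()
behavior_name = list(env.behavior_specs)[0]
spec = env.behavior_specs[behavior_name]

for episode in range(3):
    env.reset()
    decision_steps, terminal_steps = env.get_steps(behavior_name)
    while len(terminal_steps) == 0:
        # Random continuous actions stand in for the trained "brain".
        action = ActionTuple(continuous=np.random.uniform(
            -1.0, 1.0,
            (len(decision_steps), spec.action_spec.continuous_size)))
        env.set_actions(behavior_name, action)
        env.step()
        decision_steps, terminal_steps = env.get_steps(behavior_name)
    # The reward reflects the state change produced by the agent's actions.
    print(f"episode {episode} mean terminal reward:",
          terminal_steps.reward.mean())

env.close()
```
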
“…However, no attempt was made to analyze these results. Unity has also been used to train agents in other, individually designed environments [8]. These, however, were not described in detail, nor were the learning algorithms used specified.…”
Section: Introduction (unclassified)