Michel Tokic scite author profile

Michel Tokic

4 Publications

267 Citation Statements Received

57 Citation Statements Given


Affiliations

Siemens (Germany), University of Ulm, University of Applied Sciences Ravensburg-Weingarten

Publications


Adaptive ε-Greedy Exploration in Reinforcement Learning Based on Value Differences

Tokic, 2010

211

128


Abstract. This paper presents "Value-Difference Based Exploration" (VDBE), a method for balancing the exploration/exploitation dilemma inherent to reinforcement learning. The proposed method adapts the exploration parameter of ε-greedy in dependence of the temporal-difference error observed from value-function backups, which is considered as a measure of the agent's uncertainty about the environment. VDBE is evaluated on a multi-armed bandit task, which allows for insight into the behavior of the method. Preliminary results indicate that VDBE seems to be more parameter robust than commonly used ad hoc approaches such as ε-greedy or softmax.
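The idea in this abstract can be sketched on a multi-armed bandit. The abstract does not state the update rule, so the Boltzmann-style mapping from temporal-difference error to ε used below is an assumption made for illustration, as are the names `vdbe_epsilon`, `run_bandit`, `sigma`, `delta`, and `alpha`. Treat it as a minimal sketch, not the paper's definitive formulation.

```python
import math
import random


def vdbe_epsilon(eps, td_error, alpha, sigma, delta):
    """Blend the old epsilon with a term in [0, 1) that grows with |alpha * td_error|.

    Assumed form (not given in the abstract): a Boltzmann-style squashing
    of the value difference, mixed in with weight `delta`.
    """
    x = math.exp(-abs(alpha * td_error) / sigma)
    f = (1.0 - x) / (1.0 + x)  # 0 for a zero TD error, approaches 1 for large errors
    return delta * f + (1.0 - delta) * eps


def run_bandit(arm_means, steps=2000, alpha=0.1, sigma=0.5, seed=0):
    """Epsilon-greedy bandit agent whose epsilon is adapted by `vdbe_epsilon`."""
    rng = random.Random(seed)
    n = len(arm_means)
    q = [0.0] * n          # value estimate per arm
    eps = 1.0              # start fully exploratory
    delta = 1.0 / n        # assumed choice: inverse of the number of actions
    for _ in range(steps):
        if rng.random() < eps:
            a = rng.randrange(n)                      # explore uniformly
        else:
            a = max(range(n), key=lambda i: q[i])     # exploit greedily
        reward = rng.gauss(arm_means[a], 1.0)         # noisy reward for the chosen arm
        td_error = reward - q[a]
        q[a] += alpha * td_error                      # incremental value update
        eps = vdbe_epsilon(eps, td_error, alpha, sigma, delta)
    return q, eps
```

Calling `run_bandit([0.0, 1.0, 2.0])` returns the learned value estimates and the final ε. Because each update is a convex combination of two values in [0, 1], ε stays in that range throughout.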


Value-Difference Based Exploration: Adaptive Control between Epsilon-Greedy and Softmax

Tokic, Palm, 2011

176


Abstract. This paper proposes "Value-Difference Based Exploration combined with Softmax action selection" (VDBE-Softmax) as an adaptive exploration/exploitation policy for temporal-difference learning. The advantage of the proposed approach is that exploration actions are only selected in situations when the knowledge about the environment is uncertain, which is indicated by fluctuating values during learning. The method is evaluated in experiments having deterministic rewards and a mixture of both deterministic and stochastic rewards. The results show that a VDBE-Softmax policy can outperform ε-greedy, Softmax and VDBE policies in combination with on- and off-policy learning algorithms such as Q-learning and Sarsa. Furthermore, it is also shown that VDBE-Softmax is more reliable in case of value-function oscillations.
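The action-selection rule described in this abstract can be sketched as follows. This is a minimal sketch under a stated assumption: with probability ε the agent draws an exploratory action from a softmax over the current action values, and otherwise takes the greedy action. The function names and the fixed `temperature` are illustrative and are not taken from the paper. In practice ε would be adapted from observed value differences rather than passed in as a constant.

```python
import math
import random


def softmax_probs(q_values, temperature=1.0):
    """Softmax distribution over action values."""
    m = max(q_values)  # subtract the maximum for numerical stability
    weights = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def vdbe_softmax_action(q_values, eps, rng, temperature=1.0):
    """Pick an action: softmax draw with probability eps, greedy otherwise (assumed rule)."""
    if rng.random() < eps:
        probs = softmax_probs(q_values, temperature)
        return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
    return max(range(len(q_values)), key=lambda i: q_values[i])
```

With `eps` equal to 0 the rule reduces to purely greedy selection, and with `eps` equal to 1 it reduces to plain softmax selection.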


A benchmark environment motivated by industrial control problems

Hein, Depeweg, Tokic, et al., 2017


Abstract. In the research area of reinforcement learning (RL), frequently novel and promising methods are developed and introduced to the RL community. However, although many researchers are keen to apply their methods on real-world problems, implementing such methods in real industry environments often is a frustrating and tedious process. Generally, academic research groups have only limited access to real industrial data and applications. For this reason, new methods are usually developed, evaluated and compared by using artificial software benchmarks. On one hand, these benchmarks are designed to provide interpretable RL training scenarios and detailed insight into the learning process of the method on hand. On the other hand, they usually do not share much similarity with industrial real-world applications. For this reason we used our industry experience to design a benchmark which bridges the gap between freely available, documented, and motivated artificial benchmarks and properties of real industrial problems. The resulting industrial benchmark (IB) has been made publicly available to the RL community by publishing its Java and Python code, including an OpenAI Gym wrapper, on GitHub. In this paper we motivate and describe in detail the IB's dynamics and identify prototypic experimental settings that capture common situations in real-world industry control problems.


Modeling System Dynamics with Physics-Informed Neural Networks Based on Lagrangian Mechanics

et al. 2020


