Growing global markets and the resulting globalization of production networks across companies make inventory and order optimization increasingly important in the context of process chains. An adaptive and continuously self-optimizing inventory control on a global level is therefore necessary to overcome the resulting challenges. Advances in sensor and communication technology allow companies to realize a global data exchange and thus achieve a holistic inventory control. Based on deep Q-learning, a method for self-optimizing inventory control is developed. The decision process relies on an artificial neural network whose input is modeled as a state vector describing the current stocks and orders within the process chain. The output represents a control vector that governs the orders of each individual station. Furthermore, a reward function based on the resulting storage and late-order costs is implemented for simulation-based decision optimization. One of the main challenges in implementing deep Q-learning is the hyperparameter optimization of the training process, which is investigated in this paper. The results show a significant sensitivity to the learning rate α and the exploration rate ε. With optimized hyperparameters, the potential of the developed method is demonstrated by significantly reducing total costs compared to the initial state and by achieving stable control behavior for a process chain containing up to 10 stations.
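The state, action, and reward structure described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the cost coefficients `c_storage` and `c_late`, and the three-station example values are all assumptions chosen for demonstration.

```python
import numpy as np

def make_state(stocks, orders):
    """State vector for the Q-network: current stocks and open
    orders of all stations, concatenated (illustrative encoding)."""
    return np.concatenate([stocks, orders]).astype(np.float32)

def reward(stocks, late_orders, c_storage=1.0, c_late=5.0):
    """Negative total cost: per-unit storage cost plus a penalty
    per late order. Coefficients are illustrative assumptions."""
    storage_cost = c_storage * np.sum(stocks)
    late_cost = c_late * np.sum(late_orders)
    return -(storage_cost + late_cost)

# Hypothetical three-station process chain.
stocks = np.array([4, 2, 7])
orders = np.array([1, 0, 3])
state = make_state(stocks, orders)            # input to the Q-network
r = reward(stocks, late_orders=np.array([0, 1, 0]))
```

The control vector produced by the network would then set the order quantity for each station, and the simulation would return the next state and reward for training.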