2018
DOI: 10.1016/j.procir.2018.03.212

Optimization of global production scheduling with deep reinforcement learning

Cited by 281 publications (82 citation statements)
References 12 publications
“…Constant reward type was used for training. Constant [36] and variable [10] reward types have been utilised in the earlier RL studies. The maximum level of training steps (4000), and other parameters (Table 4) were fixed based on the initial testing.…”
Section: Discussion (mentioning)
confidence: 99%
“…RL can be modelled as a Markov Decision Process consisting of states, actions, transitions, and rewards (and its discount factor). RL has been utilised in many real world applications such as learning to play Go (AlphaGo), Atari's 2600 video games, production scheduling [36], and video transcoding [10]. Random Forests have traditionally been successful for solving classification problems (examples in [37]).…”
Section: Related Work (mentioning)
confidence: 99%
“…DQN proposed by Google DeepMind is a kind of well‐known deep reinforcement learning. In this work, the motivation to utilize DQN architecture is threefold.…”
Section: Problem Formulate (mentioning)
confidence: 99%
“…In terms of system utilization, the RL-agent outperforms FIFO by around 10%. Furthermore, Waschneck et al (2018) propose a combination of supervised learning and deep RL for job-shop scheduling in a semiconductor production [19]. They compare the performance of their approach with an event handler, which operates based on expert knowledge.…”
Section: Related Work (mentioning)
confidence: 99%