2023
DOI: 10.1080/23080477.2023.2187528
Learning to schedule (L2S): adaptive job shop scheduling using double deep Q network


Cited by 4 publications (4 citation statements)
References 44 publications
“…(iii) It can be found that the DRL significantly outperforms all scheduling rules when trained on small-scale instances and generalized to large-scale instances, indicating that the method proposed in this study is effective when dealing with a high-dimensional input space; for the whole learning process, DMU is the data used for testing, and the experimental data show that the method proposed in this study can effectively learn to generate better solutions for unseen instances. (iv) Tested with the same parameters, the PPO algorithm [44] performs better on instances than DQN [41] and DDPG [58], and performs about the same as the metaheuristic on instances with a relatively small total number of JXMs; for larger instances, however, the performance of the method proposed in this study is significantly better. Overall, regardless of the method used, the ability to solve large-scale problems is worse than the ability to solve small-scale problems, and the training error increases as the scale increases in comparison to DRL.…”
Section: Results (mentioning)
confidence: 85%
“…Liu [40] proposed an integrated architecture of DRL and MAS (DRL-MAS) to accomplish real-time scheduling in dynamic environments. Yang [41] developed a DDQN method to solve the scheduling problem of dynamic production lines. Luo [42] used DQN to address the dynamic flexible job shop scheduling problem (FJSP), minimizing total latency and handling the insertion of new orders.…”
Section: Dynamic Job Shop Scheduling Based On Artificial Intelligence… (mentioning)
confidence: 99%
“…Lei et al. (2022) [21] presented an end-to-end deep reinforcement learning framework that learns a policy for the FJSP using a graph neural network, in which multi-pointer graph networks (MPGNs) and a multi-PPO training algorithm are developed to learn two sub-policies, i.e., an operation action policy and a machine action policy. Abebaw et al. (2023) [22] considered the JSSP as an iterative decision-making problem and used a DDQN to train the model and learn an optimal policy, in which six continuous state features are formulated to record the production environment, an epsilon-greedy strategy is used for action selection, and the reward and penalty of the evaluation metric are designed. Zhang et al. (2022) [23] used the PPO algorithm within a DRL framework to tackle the dynamic scheduling problem in a job shop manufacturing system with unexpected machine failures, in which a transport agent is required to dispatch jobs/orders to machines and then from machines to sinks after the jobs' tasks are completed.…”
Section: Related Work (mentioning)
confidence: 99%
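The statement above summarizes the cited approach only at a high level (continuous state features, epsilon-greedy action selection, a DDQN-trained policy). The following is a minimal sketch of that general pattern in PyTorch; the feature count, the set of dispatching-rule actions, the network sizes, and all names are illustrative assumptions, not the cited paper's exact design.

```python
# Sketch of a DDQN update with epsilon-greedy action selection over dispatching
# rules, assuming a small continuous state vector describing the shop status.
import random
import torch
import torch.nn as nn

N_FEATURES = 6   # assumed: six continuous state features of the production environment
N_ACTIONS = 4    # assumed: candidate dispatching rules (e.g. SPT, LPT, FIFO, MWKR)

class QNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_FEATURES, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, N_ACTIONS),
        )

    def forward(self, x):
        return self.net(x)

online, target = QNet(), QNet()
target.load_state_dict(online.state_dict())
optimizer = torch.optim.Adam(online.parameters(), lr=1e-3)

def select_action(state, epsilon):
    """Epsilon-greedy choice among dispatching-rule actions."""
    if random.random() < epsilon:
        return random.randrange(N_ACTIONS)
    with torch.no_grad():
        return online(state.unsqueeze(0)).argmax(dim=1).item()

def ddqn_update(batch, gamma=0.99):
    """Double-Q target: the online net picks the next action, the target net evaluates it."""
    s, a, r, s_next, done = batch  # tensors sampled from a replay buffer
    q = online(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        next_a = online(s_next).argmax(dim=1, keepdim=True)
        next_q = target(s_next).gather(1, next_a).squeeze(1)
        y = r + gamma * (1.0 - done) * next_q
    loss = nn.functional.mse_loss(q, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In a typical setup, the target network's weights are periodically copied from the online network, and epsilon is annealed over training episodes; the reward would encode the makespan or tardiness-related metric the statement alludes to.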
“…The training process of the DDQN can be carried out with a fixed number of machines and jobs, which yields a simplified problem well suited to DRL. Although such a simplified trained model does not transfer to scenarios with different numbers of machines and jobs, variations such as processing times, the number of operations per job, the machines available for an operation, and the randomness of job arrivals can be tolerated [22]. Another way to train a model that generalizes over these two variables is to generate datasets covering all possible production configurations and to use these benchmark examples during each training episode to train a convergent model.…”
Section: Conclusion and Future Research Potentials (mentioning)
confidence: 99%
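The generalization idea in the statement above (keep machine and job counts fixed, but randomize processing times, operation counts, eligible machines, and arrival times across training episodes) can be illustrated with a small instance generator. This is a sketch under assumed value ranges and field names, not the cited paper's data-generation procedure.

```python
# Sketch: draw a fresh randomized job shop instance for each training episode so
# the learned policy sees varied processing times, operation counts, eligible
# machines, and job arrival times. All ranges and keys are illustrative.
import random

def sample_instance(n_jobs=6, n_machines=6, seed=None):
    rng = random.Random(seed)
    jobs = []
    for j in range(n_jobs):
        n_ops = rng.randint(2, n_machines)   # number of operations per job varies
        arrival = rng.uniform(0, 20)         # random (dynamic) job arrival time
        ops = []
        for _ in range(n_ops):
            eligible = rng.sample(range(n_machines), rng.randint(1, n_machines))
            proc_time = {m: rng.randint(1, 99) for m in eligible}  # machine-dependent times
            ops.append({"eligible_machines": eligible, "processing_time": proc_time})
        jobs.append({"job": j, "arrival": arrival, "operations": ops})
    return jobs

# Usage: one fresh instance per training episode, e.g.
# for episode in range(num_episodes):
#     instance = sample_instance(seed=episode)
#     ... run the scheduling environment and DDQN updates on this instance ...
```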