Hybrid intelligence for dynamic job-shop scheduling with deep reinforcement learning and attention mechanism

Zeng, Yunhui; Liao, Zaiyi; Dai, Yuanzhi; Wang, Rong; Li, Xiu; Yuan, Bo

doi:10.48550/arxiv.2201.00548

Cited by 8 publications

(8 citation statements)

References 43 publications

“…Where the PPO method, derived from the work of Park [23] et al is one of the methods that achieved SOTA performance. The D3QPN method, derived from the work of Zeng [27] et al is one of the most applicable value-based RL methods for DJSP. Bellman means to replace the average reward calculation in TOFA with Bellman optimal equation.…”

Section: Results On Public Instances (mentioning)

confidence: 99%

You Only Train Once: A highly generalizable reinforcement learning method for dynamic job shop scheduling problem

Zeng, Liao, Li, et al. 2022. Preprint

Research in artificial intelligence demonstrates the applicability and flexibility of the reinforcement learning (RL) technique for the dynamic job shop scheduling problem (DJSP). However, the RL-based method will always overfit to the training environment and cannot generalize well to novel unseen situations at deployment time, which is unacceptable in real-world production. For this reason, this paper proposes a highly generalizable reinforcement learning framework named Train Once For All (TOFA) for the dynamic job shop scheduling problem. The trivial and non-trivial states are distinguished when the DJSP is formulated as a semi-Markov decision process, defining the size-agnostic state, action, and reward function. A novel graph representation learning method based on an attention mechanism and spatial pyramid pooling is implemented to compress the disjunctive graphs of different-size DJSPs into fixed-length feature vectors. By combining the proposed dynamic frame skipping with an improved prioritized experience replay method that considers the sample quality difference at different training phases, TOFA shows superb generalization capability and outperforms practically favored dispatching rules and even instance-by-instance training RL-based schedulers on various benchmark DJSPs. Additionally, we proved that TOFA acquires a transferable scheduling policy that can be used to schedule a whole new DJSP without additional training.
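
The idea of compressing different-size inputs into fixed-length vectors can be illustrated with a minimal sketch of pyramid-style max pooling over a variable number of node feature rows. This is not the TOFA implementation: the function name `pyramid_pool`, the default `levels`, and the choice of max pooling along the node axis are all assumptions made for illustration, and the attention-based feature extraction described in the abstract is omitted.

```python
def pyramid_pool(node_features, levels=(1, 2, 4)):
    """Max-pool a variable number of feature rows into a fixed-length vector.

    node_features: list of n rows (n >= 1), each a list of d numbers.
    levels: bins per pyramid level (hypothetical default, not from the paper).
    Returns a flat list of length d * sum(levels), regardless of n.
    """
    n = len(node_features)
    d = len(node_features[0])
    pooled = []
    for bins in levels:
        for b in range(bins):
            # Integer floor for the start and integer ceiling for the end,
            # so each bin holds at least one row even when n < bins.
            start = (b * n) // bins
            end = -((-(b + 1) * n) // bins)
            rows = node_features[start:end]
            # Max over the rows of this bin, one value per feature column.
            pooled.extend(max(row[j] for row in rows) for j in range(d))
    return pooled


# Two "graphs" with different node counts but the same feature width.
small = [[1.0, 2.0], [3.0, 0.0], [5.0, 4.0]]
large = [[float(i), float(10 - i)] for i in range(10)]

vec_small = pyramid_pool(small)
vec_large = pyramid_pool(large)
```

Both `vec_small` and `vec_large` have length 2 × (1 + 2 + 4) = 14, so a downstream network with a fixed input size could in principle consume either one; that is the property the abstract attributes to spatial pyramid pooling.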

“…However, the disjunctive graphs only reflect the static features in JSSP, failing to represent the dynamic features in DJSSP. Therefore, several following attributes are added to the vertices in the disjunctive graph to represent the dynamic features [36]:…”

Section: Mathematical Model (mentioning)

confidence: 99%

An End-to-End Deep Learning Method for Dynamic Job Shop Scheduling Problem

2022

The job shop scheduling problem (JSSP) is essential in production, and solving it well can significantly improve production efficiency. Dynamic events such as machine breakdown and job rework frequently occur in smart manufacturing, making methods for the dynamic job shop scheduling problem (DJSSP) urgently needed. Existing rule-based and meta-heuristic methods cannot cope with dynamic events in DJSSPs of different sizes in real time. This paper proposes an end-to-end transformer-based deep learning method named spatial pyramid pooling-based transformer (SPP-Transformer), which shows strong generalizability and can be applied to different-sized DJSSPs. The feature extraction module extracts the production environment features, which are further compressed into fixed-length vectors by the feature compression module. Then, the action selection module selects a simple priority rule in real time. The experimental results show that the makespan of SPP-Transformer is 11.67% smaller than the average makespan of dispatching rules, meta-heuristic methods, and RL methods, proving that SPP-Transformer realizes effective dynamic scheduling without training different models for different DJSSPs. To the best of our knowledge, SPP-Transformer is the first application of an end-to-end transformer in DJSSP, which not only improves the productivity of industrial scheduling but also provides a paradigm for future research on deep learning in DJSSP.

“…Zhao et al proposed the deep Q-network (DQN) to improve the performance of the adaptive scheduling algorithm in dynamic smart manufacturing [35]. Wang et al [36] and Zeng et al [37] proposed the dual Q-learning (D-Q) method as the solution of the dynamic job-shop scheduling problem. Luo et al [38] proposed a two-hierarchy deep Q-network to deal with flexible job-shop scheduling problems with the disturbance of new jobs.…”

Section: Introduction (mentioning)

confidence: 99%

Dynamic Scheduling Method for Job-Shop Manufacturing Systems by Deep Reinforcement Learning with Proximal Policy Optimization

Zhang, Lü, et al. 2022. Sustainability

With the rapid development of Industry 4.0, modern manufacturing systems have been undergoing a profound digital transformation. The development of new technologies helps to improve the efficiency of production and the quality of products. However, for increasingly complex production systems, operational decision making faces more challenges in achieving sustainable manufacturing that satisfies customers' and markets' rapidly changing demands. Nowadays, rule-based heuristic approaches are widely used for scheduling management in production systems; these, however, depend heavily on expert domain knowledge. As a result, the efficiency of decision making cannot be guaranteed, nor can the dynamic scheduling requirements of the job-shop manufacturing environment be met. In this study, we propose using deep reinforcement learning (DRL) methods to tackle the dynamic scheduling problem in the job-shop manufacturing system with unexpected machine failure. The proximal policy optimization (PPO) algorithm was used in the DRL framework to accelerate the learning process and improve performance. The proposed method was tested within a real-world dynamic production environment, and it performs better than the state-of-the-art methods.
