2021 | Preprint
DOI: 10.48550/arxiv.2104.08196

Towards Standardizing Reinforcement Learning Approaches for Stochastic Production Scheduling

Abstract: Recent years have seen a rise in interest in using machine learning, particularly reinforcement learning (RL), for production scheduling problems of varying degrees of complexity. The general approach is to break down the scheduling problem into a Markov Decision Process (MDP), whereupon a simulation implementing the MDP is used to train an RL agent. Since existing studies rely on (sometimes) complex simulations for which the code is unavailable, the experiments presented are hard, or, in the case of …
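The MDP-plus-simulation pattern described in the abstract can be illustrated with a minimal, gymnasium-style environment skeleton. The observation layout, action meaning, reward, and all numbers below are placeholder assumptions and do not reproduce the paper's simulation:

```python
import gymnasium as gym
import numpy as np
from gymnasium import spaces

class SchedulingMDP(gym.Env):
    """Minimal sketch of a scheduling MDP used to train an RL agent.

    Placeholder state: remaining processing time per job.
    Placeholder action: which job to dispatch next.
    """

    def __init__(self, n_jobs=5):
        super().__init__()
        self.n_jobs = n_jobs
        self.observation_space = spaces.Box(0.0, np.inf, shape=(n_jobs,), dtype=np.float32)
        self.action_space = spaces.Discrete(n_jobs)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        # Random remaining processing times model a stochastic job mix.
        self.remaining = self.np_random.uniform(1.0, 10.0, size=self.n_jobs).astype(np.float32)
        return self.remaining.copy(), {}

    def step(self, action):
        # Dispatch the chosen job; the stochastic duration models processing uncertainty.
        duration = min(self.remaining[action], self.np_random.uniform(1.0, 3.0))
        self.remaining[action] -= duration
        reward = -float(duration)                       # placeholder: penalize elapsed time
        terminated = bool(np.all(self.remaining <= 0.0))
        return self.remaining.copy(), reward, terminated, False, {}
```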

Cited by 4 publications (5 citation statements)
References: 33 publications
“…The reward is most often proportional to the optimization target or to a value that correlates highly with it (e.g. makespan and average utilization) [24]. In our study, the reward is related to the machine loads (instLoad_j) and the completion time of the job.…”
Section: Reward Function
confidence: 86%
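As a minimal sketch of a reward along these lines (the way machine load and completion time are combined, the weights, and the function name are illustrative assumptions, not the citing study's formulation):

```python
def reward(inst_load_j, completion_time, w_load=0.5, w_time=0.5):
    """Illustrative reward relating machine load (instLoad_j) and job completion time.

    Both terms are negated so that a lower machine load and an earlier completion
    yield a higher reward; the weights are assumed, not taken from the citing study.
    """
    return -(w_load * inst_load_j + w_time * completion_time)
```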
“…(1 − α) is the probability of keeping the old Q-value, and γ is a discount factor used to balance the immediate and the future reward. max(Q(s_{t+1}, a)) represents the maximum Q-value of the next state, which is why Q-learning is an off-policy algorithm: the policy used during the evaluation stage can differ from the one used in the improvement stage, leading to more exploration at the expense of convergence speed [24]. In other words, the action a_t chosen in state s_t is not necessarily the same as the action a in the target r_t + γ max(Q(s_{t+1}, a)).…”
Section: Reinforcement Learning
confidence: 99%
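As a minimal sketch of the tabular update quoted above (the state/action space sizes and the values of α and γ are illustrative assumptions):

```python
import numpy as np

n_states, n_actions = 10, 4      # assumed sizes, for illustration only
alpha, gamma = 0.1, 0.95         # assumed learning rate and discount factor
Q = np.zeros((n_states, n_actions))

def q_update(s_t, a_t, r_t, s_next):
    """One Q-learning step: blend the old Q-value with the bootstrapped target.

    Off-policy: the target uses the maximum over actions in s_next, regardless of
    which action the behaviour policy actually takes in the next step.
    """
    target = r_t + gamma * np.max(Q[s_next])             # r_t + γ · max_a Q(s_{t+1}, a)
    Q[s_t, a_t] = (1 - alpha) * Q[s_t, a_t] + alpha * target
```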
“…The main component in RL is the agent, which takes actions to obtain the maximum cumulative reward through interactions with the environment, much like a human being. Owing to its fast computation and its ability to cope with dynamic events [15], RL has achieved outstanding success in solving the DJSSP. Q-learning is a classic RL algorithm that chooses the action with the highest Q-value stored in the Q-table.…”
Section: Introduction
confidence: 99%
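The Q-table lookup described above can be sketched as follows; the quoted text only mentions picking the highest-Q action, so the ε-greedy exploration step and the value of epsilon are assumptions added for completeness:

```python
import numpy as np

rng = np.random.default_rng(0)

def select_action(Q, s, epsilon=0.1):
    """Pick an action for state s from a Q-table (rows = states, columns = actions).

    With probability epsilon a random action is explored; otherwise the action
    with the highest stored Q-value is exploited, as in the quote above.
    """
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))     # explore
    return int(np.argmax(Q[s]))                  # exploit: highest Q-value in state s
```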
“…Dynamic denotes that the shopfloor environment is complex, involving multiple work centres, multiple machines with different characteristics, and multiple products, coupled with technological and logistic constraints [2]. Key performance indicators (KPIs) for such a production system are not only job-oriented, but also need to be economic and sustainable in terms of resource utilization [3]. It is generally agreed that Reinforcement Learning (RL) is now the leading approach for dynamic dispatching compared to rule-based heuristics, thanks to advances in sensorisation and connectivity technology that support data extraction from real production, facilitating RL agent training and future estimation [1].…”
Section: Introduction
confidence: 99%