“…Some of them were based on the fundamental principles of Markov Decision Processes (MDPs) and their use in Supply Chain Management (Giannoccaro and Pontrandolfo, 2002); others directly followed early RL formulations such as Q-learning for Business Process Management (Huang, van der Aalst, Lu, and Duan, 2011). Numerous publications describe attempts to apply different RL tools to constrained task scheduling and packing problems (Jędrzejowicz and Ratajczak-Ropel, 2013; Mao, Alizadeh, Menache, and Kandula, 2016) and to logistics (Yan et al., 2021; Yuan, Li, and Ji, 2021).…”