“…Advancements in DRL approaches in recent years have enabled considerable progress in COP applications [Cappart et al, 2021, Oren et al, 2021]. DRL has been applied successfully to several major COPs, such as the Travelling Salesman Problem (TSP) [Zhang et al, 2021, d O Costa et al, 2020, Zhang et al, 2020b], the Knapsack Problem [Afshar et al, 2020, Cappart et al, 2021] and the Steiner Tree Problem [Du et al, 2021]. Zhang and Dietterich [1995] showed the potential of Reinforcement Learning (RL) for JSSPs as far back as 1995, by improving on the results of the scheduling algorithm by Deale et al [1994], which used a temporal difference algorithm in combination with simulated annealing.…”