In recent years, a growing number of solutions have employed artificial intelligence approaches to enhance or optimise processes for greater sustainability. One of the most pressing issues is the emissions produced by road vehicles; this paper tackles the problem of optimising the routes of delivery cars. The applicability of deep reinforcement learning algorithms to this problem is tested on a simulation game designed and implemented to pose various challenges, such as constantly changing delivery locations. The algorithms chosen for this task are Advantage Actor-Critic (A2C), with and without Proximal Policy Optimisation (PPO). These novel and advanced reinforcement learning algorithms have not yet been applied in similar scenarios. The differences in their performance and learning processes are visualised and discussed. It is demonstrated that both algorithms exhibit a slow but steady learning curve, which is an expected characteristic of reinforcement learning methods, leading to the conclusion that they would discover an optimal policy given an adequately long training process. Additionally, the benefits of the Proximal Policy Optimisation algorithm are demonstrated by its improved learning curve in comparison with the plain Advantage Actor-Critic approach, as its learning process is characterised by faster growth and significantly smaller variance. Finally, the applicability of such algorithms to the described scenario is discussed, alongside possible improvements and future work.
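For illustration only, and not the paper's own implementation: a minimal sketch of how the A2C versus PPO comparison could be set up with the stable-baselines3 library, assuming the simulation game is wrapped and registered as a Gymnasium environment. The environment id "DeliveryGame-v0" and the timestep budget are placeholders.

```python
# Minimal sketch (assumption: the delivery simulation game is registered as a
# Gymnasium environment under the placeholder id "DeliveryGame-v0").
import gymnasium as gym
from stable_baselines3 import A2C, PPO

env = gym.make("DeliveryGame-v0")  # hypothetical delivery-routing environment

# Advantage Actor-Critic baseline
a2c_model = A2C("MlpPolicy", env, verbose=1)
a2c_model.learn(total_timesteps=200_000)  # timestep budget is illustrative

# Proximal Policy Optimisation: clipped policy updates typically give a
# smoother learning curve with lower variance than plain A2C
ppo_model = PPO("MlpPolicy", env, verbose=1)
ppo_model.learn(total_timesteps=200_000)
```

In such a setup the learning curves of the two agents can be compared directly, for example by logging episode rewards during training and plotting them side by side.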