Deep reinforcement learning has been touted as a promising research avenue for solving sequential decision-making problems, especially when little is known about the optimal policy structure. We apply the proximal policy optimization algorithm to the intractable joint replenishment problem. We demonstrate how the algorithm approaches the optimal policy structure and outperforms two other heuristics. Its deployment in supply chain control towers can orchestrate and facilitate collaborative shipping in the Physical Internet.
Problem definition: Is deep reinforcement learning (DRL) effective at solving inventory problems? Academic/practical relevance: Given that DRL has been applied successfully in computer games and robotics, supply chain researchers and companies are interested in its potential in inventory management. We provide a rigorous performance evaluation of DRL on three classic and intractable inventory problems: lost sales, dual sourcing, and multi-echelon inventory management. Methodology: We model each inventory problem as a Markov decision process, then apply and tune the Asynchronous Advantage Actor-Critic (A3C) DRL algorithm for a variety of parameter settings. Results: We demonstrate that the A3C algorithm can match the performance of state-of-the-art heuristics and other approximate dynamic programming methods. Although the initial tuning was computationally demanding and time-consuming, only small changes to the tuning parameters were needed for the other problems studied. Managerial implications: Our study provides evidence that DRL can effectively solve stationary inventory problems. This is especially promising when problem-dependent heuristics are lacking. Yet, generating structural policy insight or designing specialized policies that are (ideally provably) near optimal remains desirable.
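To make the Markov decision process formulation concrete, the following is a minimal sketch of one common way to write the lost-sales problem as an MDP. It is not the model from the study: the class name `LostSalesInventory`, the cost parameters, the lead time, and the order of events within a period are all assumptions chosen for illustration. The state is the on-hand stock plus the outstanding orders, the action is the order quantity, and the per-period cost combines holding and lost-sales penalties.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class LostSalesInventory:
    """Hypothetical single-item lost-sales inventory MDP (illustrative only)."""

    lead_time: int = 2          # assumed: periods between ordering and receipt
    holding_cost: float = 1.0   # assumed: cost per unit left on hand
    penalty_cost: float = 4.0   # assumed: cost per unit of unmet demand
    on_hand: int = 0
    pipeline: deque = field(default_factory=deque)

    def __post_init__(self):
        # Start with an empty pipeline: one slot per lead-time period.
        if not self.pipeline:
            self.pipeline = deque([0] * self.lead_time)

    def state(self):
        """State vector: on-hand stock followed by outstanding orders."""
        return (self.on_hand, *self.pipeline)

    def step(self, order_qty, demand):
        """Apply one period: order, receive, serve demand, incur cost."""
        # The new order joins the back of the pipeline ...
        self.pipeline.append(order_qty)
        # ... and the oldest outstanding order arrives.
        self.on_hand += self.pipeline.popleft()

        # Demand beyond the available stock is lost, not backordered.
        sales = min(self.on_hand, demand)
        lost = demand - sales
        self.on_hand -= sales

        cost = self.holding_cost * self.on_hand + self.penalty_cost * lost
        return self.state(), cost
```

A DRL agent such as A3C would observe `state()`, choose `order_qty`, and receive the negative of `cost` as its reward, with `demand` sampled from whatever demand distribution the experiment assumes.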
Synchromodality, also referred to as "synchronized intermodality", employs multiple transport modes in a flexible, dynamic way in order to induce a modal shift towards more environmentally friendly transport modes such as rail or inland waterways, without compromising on responsiveness and quality of service. It is characterized by the synchronized parallel usage of different transport modes and/or the ability to switch freely between transport modes at particular times while a consignment is in transit. We present a decision rule that can integrate both the parallel usage and the real-time switching of transport modes, either in combination or separately. It takes into account real-time stock levels and the service requirements of the shipper. The policy first determines at the source which volumes will be shipped using each mode of transport, and subsequently decides whether to switch modes at an intermediate terminal. Using a simulation study, we demonstrate how our synchromodal transport policy can induce a modal shift towards low-carbon transport modes.