Deep reinforcement learning has been coined as a promising research avenue to solve sequential decisionmaking problems, especially if few is known about the optimal policy structure. We apply the proximal policy optimization algorithm to the intractable joint replenishment problem. We demonstrate how the algorithm approaches the optimal policy structure and outperforms two other heuristics. Its deployment in supply chain control towers can orchestrate and facilitate collaborative shipping in the Physical Internet.
Deep reinforcement learning (DRL) has shown great potential for sequential decision-making, including early developments in inventory control. Yet, the abundance of choices that come with designing a DRL algorithm, combined with the intense computational effort to tune and evaluate each choice, may hamper their application in practice. This paper describes the key design choices of DRL algorithms to facilitate their implementation in inventory control. We also shed light on possible future research avenues that may elevate the current state-of-the-art of DRL applications for inventory control and broaden their scope by leveraging and improving on the structural policy insights within inventory research. Our discussion and roadmap may also spur future research in other domains within operations management.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.