2020
DOI: 10.1016/j.compind.2020.103239
Use of Proximal Policy Optimization for the Joint Replenishment Problem

Abstract: Deep reinforcement learning has been coined as a promising research avenue to solve sequential decision-making problems, especially if little is known about the optimal policy structure. We apply the proximal policy optimization algorithm to the intractable joint replenishment problem. We demonstrate how the algorithm approaches the optimal policy structure and outperforms two other heuristics. Its deployment in supply chain control towers can orchestrate and facilitate collaborative shipping in the Physical Inter…

Cited by 79 publications (43 citation statements)
References 35 publications
“…[22], for example, use a single agent with complete information to dispatch production in a cloud manufacturing environment. [23] propose a way to solve the joint distribution problem: which items to select for which types of deliveries. In the logistics area, [24] show how to assign vehicles to platoons of trucks and where to send the platoons.…”
Section: Reinforcement Learning For Supply Chain Decision-makingmentioning
confidence: 99%
“…More empirical approaches were proposed following the aforementioned works, such as clipping the distance between the old and new policy in the proximal policy optimization (PPO) algorithm of Schulman, Wolski, Dhariwal, Radford, and Klimov (2017). The PPO algorithm was successfully applied to the joint replenishment inventory problem by Vanvuchelen, Gijsbrechts, and Boute (2020). The computational requirements could be further improved.…”
Section: Actor-critics and Other Hybrid Techniquesmentioning
confidence: 99%
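The clipping mechanism mentioned in the statement above can be illustrated with a minimal per-sample sketch of the PPO surrogate objective (this is an illustration of the general technique from Schulman et al. (2017), not the implementation used in the cited paper; the function name and epsilon value are assumptions):

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Per-sample PPO clipped surrogate objective.

    ratio:     pi_new(a|s) / pi_old(a|s), the probability ratio
               between the new and old policies.
    advantage: estimated advantage of the sampled action.
    eps:       clip range; bounds how far the policy may move
               from the old one in a single update.
    """
    unclipped = ratio * advantage
    # Clip the ratio to [1 - eps, 1 + eps] before weighting the advantage.
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Taking the minimum yields a pessimistic (lower-bound) objective,
    # removing the incentive for overly large policy updates.
    return np.minimum(unclipped, clipped)
```

With a positive advantage, the objective stops growing once the ratio exceeds 1 + eps; with a negative advantage, the ratio is prevented from shrinking the penalty below the clipped value, which is what "clipping the distance between the old and new policy" refers to.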
“…Understand and improve performance of heuristic policies: The OR/OM community has a strong focus on knowledge building and benchmarking these new approaches against existing policies. Accordingly, most early studies that apply DRL to inventory management benchmark DRL performance against the state-of-the-art problem-specific heuristic policies (see e.g., Gijsbrechts et al., 2019; Vanvuchelen et al., 2020). As DRL matures, we may reverse this process and use DRL to benchmark the performance of existing (or new) heuristic policies, and understand why, or when, these heuristics perform better or worse.…”
Section: Blending Numerical and Analytical Approaches To Optimize Inventory Policiesmentioning
confidence: 99%
“…A variant of the PPO algorithm, called memory proximal policy optimization, is presented to solve quantum control tasks [23]. In [24], a PPO-based machine learning algorithm is implemented to decide on the replenishments of a group of collaborating companies. However, to the best of our knowledge, PPO has never been employed for the AOR task.…”
Section: Related Workmentioning
confidence: 99%