Reinforcement Learning for Combinatorial Optimization: A Survey

Mazyavkina, Nina; Sviridov, S. I.; Ivanov, S. V.; Burnaev, Evgeny

doi:10.48550/arxiv.2003.03600

Cited by 20 publications

(18 citation statements)

References 4 publications

“…Related to our approach, Cappart et al (2020) propose to combine reinforcement learning, constraint programming and dynamic programming and experiment with the TSP with time windows. For surveys of machine learning for routing problems and combinatorial optimization in general, we refer to Mazyavkina et al (2020); Vesselinova et al (2020).…”

Section: Machine Learning For Vehicle Routing Problems (mentioning)

confidence: 99%

Deep Policy Dynamic Programming for Vehicle Routing Problems

Kool, Hoof, Gromicho et al. 2021

Preprint

Routing problems are a class of combinatorial problems with many practical applications. Recently, end-to-end deep learning methods have been proposed to learn approximate solution heuristics for such problems. In contrast, classical dynamic programming (DP) algorithms can find optimal solutions, but scale badly with the problem size. We propose Deep Policy Dynamic Programming (DPDP), which aims to combine the strengths of learned neural heuristics with those of DP algorithms. DPDP prioritizes and restricts the DP state space using a policy derived from a deep neural network, which is trained to predict edges from example solutions. We evaluate our framework on the travelling salesman problem (TSP) and the vehicle routing problem (VRP) and show that the neural policy improves the performance of (restricted) DP algorithms, making them competitive to strong alternatives such as LKH, while also outperforming other 'neural approaches' for solving TSPs and VRPs with 100 nodes.

“…Solving single-agent routing (scheduling) problems with RL. According to [26], the RL approaches to solving agent routing problems can be categorized into: (1) improvement heuristics learns to rewrite the complete solution iteratively to obtain a better solution [43,5,4,24]; (2) construction approach learns to construct a solution by sequentially assigning idle agents to unvisited cities until the full routing schedule (sequence) is constructed [3,28,20,19], and (3) hybrid approaches blending both approaches [17,7,21,1]. Typically, learning-based improvement or hybrid approaches have shown good performance since these can iteratively update the best solution until reaching the best one.…”

Section: Related Work (mentioning)

confidence: 99%

ScheduleNet: Learn to solve multi-agent scheduling problems with reinforcement learning

Park, Bakhtiyar, Park 2021

Preprint

We propose ScheduleNet, an RL-based real-time scheduler, that can solve various types of multi-agent scheduling problems. We formulate these problems as a semi-MDP with episodic reward (makespan) and learn ScheduleNet, a decentralized decision-making policy that can effectively coordinate multiple agents to complete tasks. The decision making procedure of ScheduleNet includes: (1) representing the state of a scheduling problem with the agent-task graph, (2) extracting node embeddings for agent and tasks nodes, the important relational information among agents and tasks, by employing the type-aware graph attention (TGA), and (3) computing the assignment probability with the computed node embeddings. We validate the effectiveness of ScheduleNet as a general learning-based scheduler for solving various types of multi-agent scheduling tasks, including multiple salesman traveling problem (mTSP) and job shop scheduling problem (JSP).

“…For example, Kong et al (2019) applies RL to knapsack and secretary problems, and Khalil et al (2017) uses RL to solve graph problems. Mazyavkina et al (2020) and Bengio et al (2020) provide an extensive survey on applications of ML and RL in combinatorial optimization. For the specific problem of shipping optimization, van Andel (2018) uses ML to consolidate shipments from nearby suppliers.…”

Section: Related Work (mentioning)

confidence: 99%

Learning Algorithms for Regenerative Stopping Problems with Applications to Shipping Consolidation in Logistics

Jothimurugan, Andrews, Lee et al. 2021

Preprint

We study regenerative stopping problems in which the system starts anew whenever the controller decides to stop and the long-term average cost is to be minimized. Traditional model-based solutions involve estimating the underlying process from data and computing strategies for the estimated model. In this paper, we compare such solutions to deep reinforcement learning and imitation learning, which involve learning a neural network policy from simulations. We evaluate the different approaches on a real-world problem of shipping consolidation in logistics and demonstrate that deep learning can be effectively used to solve such problems.
