2018
DOI: 10.48550/arxiv.1807.01672
Preprint

Ranked Reward: Enabling Self-Play Reinforcement Learning for Combinatorial Optimization

Cited by 20 publications (39 citation statements) | References 0 publications

“…As for BPP, Zhao et al. (2021), Young-Dae et al. (2020), and Tanaka et al. (2020) focus on online BPP and learn how to place each given box, while Hu et al. (2017) and Duan et al. (2019) learn to generate a sequence order for boxes and place them using a fixed heuristic; these works therefore do not target the full offline-BPP problem. Ranked Reward (RR) (Laterre et al., 2019) learns on the full problem by conducting self-play with rewards evaluated by ranking. Although RR outperforms MCTS (Browne et al., 2012) and the GUROBI solver (Gurobi Optimization, 2018), especially at large problem sizes, it learns directly on the full combinatorial action space, which can pose major challenges for efficient learning.…”
Section: Learning-based Algorithms
Citation type: mentioning, confidence: 99%
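
Since this excerpt hinges on how RR converts raw packing scores into rewards, a minimal sketch of such a ranking-based reward may help: each episode's raw score is compared against a percentile threshold computed over a rolling buffer of recent scores and mapped to +1 or -1. The buffer size, the 75th-percentile threshold, and the random tie-breaking below are illustrative assumptions, not settings confirmed by the excerpt.

```python
import math
import random
from collections import deque

class RankedReward:
    """Minimal sketch of a ranking-based reward in the spirit of
    Laterre et al. (2019): a raw episode score is mapped to a binary
    reward by ranking it against recent scores. Buffer size and
    percentile here are illustrative assumptions."""

    def __init__(self, buffer_size=250, percentile=75):
        self.scores = deque(maxlen=buffer_size)  # rolling window of raw scores
        self.percentile = percentile

    def _threshold(self):
        # Nearest-rank percentile of the buffered scores.
        ordered = sorted(self.scores)
        k = max(0, math.ceil(len(ordered) * self.percentile / 100) - 1)
        return ordered[k]

    def __call__(self, score):
        """Map a raw episode score (e.g. packing efficiency) to +1 / -1."""
        if not self.scores:              # no history yet: count as a win
            self.scores.append(score)
            return 1
        t = self._threshold()
        self.scores.append(score)
        if score > t:
            return 1
        if score < t:
            return -1
        return random.choice([1, -1])    # break ties randomly

# Example: rewards adapt as the score distribution shifts.
rr = RankedReward()
for s in [0.42, 0.55, 0.48, 0.55, 0.61]:
    print(f"score={s:.2f} -> reward={rr(s):+d}")
```

Thresholding against the agent's own recent performance is what makes "self-play" meaningful on a single-player optimization problem: the reward stays binary and adaptive rather than tied to an absolute score scale.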
“…We first conduct an ablation study to evaluate the effectiveness of the action-space decomposition, as well as the effect of utilizing PO. For this experiment, we use the data setup of Laterre et al. (2019), where boxes are generated by cutting a bin of size 10 × 10 × 10, and the resulting box edges can take any integer in the range [1 . .…”
Section: Ablation Study on Action Space
Citation type: mentioning, confidence: 99%
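
The cut-based instance generation this excerpt references can be sketched as recursive axis-aligned splits of the bin, a plausible reading of the setup. Because the excerpt truncates the edge range at "[1 . .", the `max_edge` upper bound below is a hypothetical parameter, not the value used in the experiment.

```python
import random

def cut_bin(size=(10, 10, 10), max_edge=4):
    """Generate a packing instance by recursively cutting a bin into boxes
    with axis-aligned splits. `max_edge` (the largest allowed box edge) is
    a hypothetical parameter: the quoted excerpt truncates the edge range,
    so the true upper bound is not reproduced here.
    Returns a list of (w, h, d) box sizes whose volumes sum to the bin's."""
    boxes, stack = [], [tuple(size)]
    while stack:
        box = stack.pop()
        # Axes along which the box is still too large to be a final item.
        long_axes = [i for i in range(3) if box[i] > max_edge]
        if not long_axes:
            boxes.append(box)
            continue
        axis = random.choice(long_axes)
        # Split at an integer offset so both parts keep an edge >= 1.
        cut = random.randint(1, box[axis] - 1)
        left, right = list(box), list(box)
        left[axis], right[axis] = cut, box[axis] - cut
        stack += [tuple(left), tuple(right)]
    return boxes

random.seed(0)
items = cut_bin()
assert sum(w * h * d for (w, h, d) in items) == 1000  # cuts preserve volume
print(len(items), "boxes:", items)
```

A useful property of this scheme is that every generated instance admits a perfect packing by construction, so the optimal bin utilization is known to be 100% and learned policies can be evaluated against it directly.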