2018
DOI: 10.48550/arxiv.1807.01672
Preprint

Ranked Reward: Enabling Self-Play Reinforcement Learning for Combinatorial Optimization

Cited by 20 publications (39 citation statements) | References 0 publications

“…As for BPP, Zhao et al. (2021), Young-Dae et al. (2020), and Tanaka et al. (2020) focus on online BPP and learn how to place each given box, while Hu et al. (2017) and Duan et al. (2019) learn to generate a sequence order for boxes and place them using a fixed heuristic; these works therefore do not target the full offline-BPP problem. Ranked Reward (RR) (Laterre et al., 2019) learns on the full problem by conducting self-play with rewards evaluated by ranking. Although RR outperforms MCTS (Browne et al., 2012) and the GUROBI solver (Gurobi Optimization, 2018), especially at large problem sizes, it learns directly on the full combinatorial action space, which can pose major challenges for efficient learning.…”
Section: Learning-based Algorithms
Citation type: mentioning, confidence: 99%
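
Since this excerpt hinges on how RR converts raw packing scores into rewards, a minimal sketch of such a ranking-based reward may help: each episode's raw score is compared against a percentile threshold computed over a rolling buffer of recent scores and mapped to +1 or -1. The buffer size, the 75th-percentile threshold, and the random tie-breaking below are illustrative assumptions, not settings confirmed by the excerpt.

```python
import math
import random
from collections import deque

class RankedReward:
    """Minimal sketch of a ranking-based reward in the spirit of
    Laterre et al. (2019): a raw episode score is mapped to a binary
    reward by ranking it against recent scores. Buffer size and
    percentile here are illustrative assumptions."""

    def __init__(self, buffer_size=250, percentile=75):
        self.scores = deque(maxlen=buffer_size)  # rolling window of raw scores
        self.percentile = percentile

    def _threshold(self):
        # Nearest-rank percentile of the buffered scores.
        ordered = sorted(self.scores)
        k = max(0, math.ceil(len(ordered) * self.percentile / 100) - 1)
        return ordered[k]

    def __call__(self, score):
        """Map a raw episode score (e.g. packing efficiency) to +1 / -1."""
        if not self.scores:              # no history yet: count as a win
            self.scores.append(score)
            return 1
        t = self._threshold()
        self.scores.append(score)
        if score > t:
            return 1
        if score < t:
            return -1
        return random.choice([1, -1])    # break ties randomly

# Example: rewards adapt as the score distribution shifts.
rr = RankedReward()
for s in [0.42, 0.55, 0.48, 0.55, 0.61]:
    print(f"score={s:.2f} -> reward={rr(s):+d}")
```

Thresholding against the agent's own recent performance is what makes "self-play" meaningful on a single-player optimization problem: the reward stays binary and adaptive rather than tied to an absolute score scale.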
“…We first conduct an ablation study to evaluate the effectiveness of the action-space decomposition, as well as the effect of utilizing PO. For this experiment, we use the data setup of Laterre et al. (2019), where boxes are generated by cutting a bin of size 10 × 10 × 10, and the resulting box edges can take any integer in the range [1 . .…”
Section: Ablation Study on Action Space
Citation type: mentioning, confidence: 99%
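
The cut-based instance generation this excerpt references can be sketched as recursive axis-aligned splits of the bin, a plausible reading of the setup. Because the excerpt truncates the edge range at "[1 . .", the `max_edge` upper bound below is a hypothetical parameter, not the value used in the experiment.

```python
import random

def cut_bin(size=(10, 10, 10), max_edge=4):
    """Generate a packing instance by recursively cutting a bin into boxes
    with axis-aligned splits. `max_edge` (the largest allowed box edge) is
    a hypothetical parameter: the quoted excerpt truncates the edge range,
    so the true upper bound is not reproduced here.
    Returns a list of (w, h, d) box sizes whose volumes sum to the bin's."""
    boxes, stack = [], [tuple(size)]
    while stack:
        box = stack.pop()
        # Axes along which the box is still too large to be a final item.
        long_axes = [i for i in range(3) if box[i] > max_edge]
        if not long_axes:
            boxes.append(box)
            continue
        axis = random.choice(long_axes)
        # Split at an integer offset so both parts keep an edge >= 1.
        cut = random.randint(1, box[axis] - 1)
        left, right = list(box), list(box)
        left[axis], right[axis] = cut, box[axis] - cut
        stack += [tuple(left), tuple(right)]
    return boxes

random.seed(0)
items = cut_bin()
assert sum(w * h * d for (w, h, d) in items) == 1000  # cuts preserve volume
print(len(items), "boxes:", items)
```

A useful property of this scheme is that every generated instance admits a perfect packing by construction, so the optimal bin utilization is known to be 100% and learned policies can be evaluated against it directly.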