Fully-Pipelined Architecture for Simulated Annealing-based QUBO Solver on the FPGA

Kagawa, Hiroshi; Ito, Yasuaki; Nakano, Koji; Yasudo, Ryota; Kawamata, Yuya; Katsuki, Ryota; Tabata, Yusuke; Yazane, Takashi; Hamano, Kenichiro

doi:10.1109/candar51075.2020.00013

Cited by 6 publications

(4 citation statements)

References 15 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Triggered by the rise of quantum annealing [1], the quadratic unconstrained binary optimization (QUBO) has been gathering attention because it is mathematically equivalent to finding the ground state of an Ising model. In particular, QUBO solvers with FPGAs [2]- [4], GPUs [5], [6], and ASICs [7], [8] are developed and reported in the literature as a potential universal solver of a wide range of combinatorial optimization problems. QUBO is an NP-hard problem in which the variables are restricted to 0 and 1, and a solution is represented by a bit vector.…”

Section: Introductionmentioning

confidence: 99%

Bandit-based Variable Fixing for Binary Optimization on GPU Parallel Computing

Yasudo

2023

2023 31st Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)

Self Cite

View full text Add to dashboard Cite

This paper explores whether reinforcement learning is capable of enhancing metaheuristics for the quadratic unconstrained binary optimization (QUBO), which have recently attracted attention as a solver for a wide range of combinatorial optimization problems. In particular, we introduce a novel approach called the bandit-based variable fixing (BVF). The key idea behind BVF is to regard an execution of an arbitrary metaheuristic with a variable fixed as a play of a slot machine. Thus, BVF explores variables to fix with the maximum expected reward, and executes a metaheuristic at the same time. The bandit-based approach is then extended to fix multiple variables. To accelerate solving multi-armed bandit problem, we implement a parallel algorithm for BVF on a GPU. Our results suggest that our proposed BVF enhances original metaheuristics.

show abstract

Section: Introductionmentioning

confidence: 99%

Bandit-based Variable Fixing for Binary Optimization on GPU Parallel Computing

Yasudo

2023

2023 31st Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)

Self Cite

View full text Add to dashboard Cite

show abstract

“…Matsubara et al proposed an implementation of the simulated annealing ( SA ) on the FPGA 10 . The FPGA implementation can solve a 1024‐bit QUBO problem with a 20.4G search rate, where the search rate is a metric of the throughput 1,7,10 . It shows the number of solutions searched by a system per second.…”

Section: Introductionmentioning

confidence: 99%

“…10 The FPGA implementation can solve a 1024-bit QUBO problem with a 20.4G search rate, where the search rate is a metric of the throughput. 1,7,10 It shows the number of solutions searched by a system per second. The circuit design has some parts in common with that of our proposed hardware implementation.…”

mentioning

confidence: 99%

See 1 more Smart Citation

High‐throughput FPGA implementation for quadratic unconstrained binary optimization

Kagawa

Ito

Nakano

et al. 2021

Concurrency and Computation

Self Cite

View full text Add to dashboard Cite

Quadratic unconstrained binary optimization (QUBO) is a combinatorial optimization problem. Since various NP‐hard problems such as the traveling salesman problem can be formulated as a QUBO instance, QUBO is used with a wide range of applications. The main contribution of this article is to propose high‐throughput FPGA implementations for the QUBO solver. We perform the local search using different bit‐selection strategies based on the simulated annealing in the proposed implementation. The hardware is a pipeline structure with no pipeline hazards using multiple instances, where the bit‐flip operation is always performed every clock cycle. We implemented the proposed circuit on Xilinx UltraScale+ FPGA VU9P. The implementation result shows that the circuit can search 4.30×1011 solutions per second. Besides, by sharing the block RAM that stores a weight matrix, we implemented a dual annealer architecture that has two QUBO solvers into the FPGA. As a result, the dual annealer architecture can search 6.76×1011 solutions per second.

show abstract