Large-scale deep learning benefits from an emerging class of AI accelerators. Some of these accelerators' designs are general enough for compute-intensive applications beyond AI, and Cloud TPU is one such example. In this paper, we demonstrate a novel approach using TensorFlow on Cloud TPU to simulate the two-dimensional Ising model. The TensorFlow and Cloud TPU framework enables simple, readable code to express the complicated distributed algorithm without compromising performance. Our implementation fits into a small Jupyter notebook and fully utilizes Cloud TPU's efficient matrix operations and dedicated high-speed inter-chip connections. The performance is highly competitive: it outperforms the best published benchmarks known to us by 60% in single-core and 250% in multi-core settings, with good linear scaling. Compared to a Tesla V100 GPU, the single-core performance maintains a ~10% gain. We also demonstrate that using low-precision arithmetic (bfloat16) does not compromise the correctness of the simulation results.

… or heterogeneous nodes commonly seen in private or commercial clouds. Benefiting from the explosion of machine learning, especially deep learning, commercial clouds provide not only CPUs and GPUs but also specialized chips such as FPGAs and other in-house processors. The Tensor Processing Unit ("Cloud TPU" or "TPU" for short), an AI application-specific integrated circuit (ASIC) developed by Google for neural network machine learning, has received much attention in the machine learning community [17, 18]. Its latest release, Cloud TPU v3, offers 420 × 10^12 floating-point operations per second (FLOPS) and 128 GB of high-bandwidth memory (HBM). Multiple units are connected to form a "POD" (Cloud TPU v3 Pod) through a dedicated high-speed 2-D toroidal mesh network, allowing 100+ peta-FLOPS and 32 TB of HBM to be accessed by the application with very low latency and in lockstep.
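To make the kind of vectorized simulation discussed here concrete, the following is a minimal, framework-agnostic sketch of a checkerboard (two-sublattice) Metropolis update for the 2D Ising model, written in plain NumPy rather than TensorFlow so it runs anywhere. It is our own illustration under stated assumptions, not the paper's TPU implementation; the function name `checkerboard_sweep` and all parameters are hypothetical.

```python
import numpy as np

def checkerboard_sweep(spins, beta, rng, parity):
    """One Metropolis half-sweep over sites of a given checkerboard parity.

    spins  : (n, n) array of +1/-1 with periodic boundary conditions
    beta   : inverse temperature
    parity : 0 or 1, selecting the black or white sub-lattice
    """
    # Sum of the four nearest neighbours, periodic boundaries via roll.
    nn = (np.roll(spins, 1, 0) + np.roll(spins, -1, 0)
          + np.roll(spins, 1, 1) + np.roll(spins, -1, 1))
    # Energy change if a spin flips: dE = 2 * s_i * sum(neighbours of i).
    dE = 2.0 * spins * nn
    # Metropolis acceptance: flip with probability min(1, exp(-beta * dE)).
    accept = rng.random(spins.shape) < np.exp(-beta * dE)
    # Only update one sub-lattice at a time, so no two simultaneously
    # updated spins are neighbours and the update stays valid.
    i, j = np.indices(spins.shape)
    mask = (i + j) % 2 == parity
    return np.where(mask & accept, -spins, spins)

rng = np.random.default_rng(0)
n, beta = 64, 0.5  # beta above the critical value ~0.4407: ordered phase
spins = rng.choice([-1, 1], size=(n, n))
for _ in range(200):
    spins = checkerboard_sweep(spins, beta, rng, parity=0)
    spins = checkerboard_sweep(spins, beta, rng, parity=1)
m = abs(spins.mean())  # magnetisation per spin
```

The checkerboard decomposition is what makes the update expressible as whole-array operations (shifts, element-wise arithmetic, masked selection), which is precisely the kind of workload that maps onto matrix-oriented accelerators; in TensorFlow the same structure can be written with `tf.roll` and `tf.where`.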
TPU is programmable via software frontends such as TensorFlow [1] or PyTorch [21], and can be deployed both for training huge deep neural networks and for performing low-latency online prediction; [14] reports impressive acceleration of both. Given the tremendous computational resources that TPU offers, it is compelling to also consider the opportunities TPU brings to applications beyond machine learning. The programming frontends used for TPU, such as TensorFlow, offer a rich set of functionalities that are highly relevant for scientific computation. The TensorFlow TPU programming stack also provides the additional benefit of allowing distributed algorithms to be expressed with simple, easy-to-understand code without sacrificing performance. In addition, the ability to program conventional scientific simulations in the TensorFlow framework makes it easier to explore hybrid approaches that employ both conventional scientific computation methods and modern machine learning techniques within the same framework.

Motivated by these observations, we developed a Single Instruc...