Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis 2019
DOI: 10.1145/3295500.3356149

High performance Monte Carlo simulation of Ising model on TPU clusters

Abstract: Large-scale deep learning benefits from an emerging class of AI accelerators. Some of these accelerators' designs are general enough for compute-intensive applications beyond AI, and Cloud TPU is one such example. In this paper, we demonstrate a novel approach using TensorFlow on Cloud TPU to simulate the two-dimensional Ising model. The TensorFlow and Cloud TPU framework enables simple, readable code to express a complicated distributed algorithm without compromising performance. Our code implementation…

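The abstract names the technique (Metropolis Monte Carlo for the 2-D Ising model on TPU via TensorFlow) without showing it. The paper's distributed, multi-TPU implementation is not reproduced here; the sketch below is a minimal single-device illustration of the standard checkerboard Metropolis update, written against the public TensorFlow 2 API. The lattice size, the inverse temperature beta, and the helper name checkerboard_sweep are illustrative choices, not taken from the paper.

```python
import tensorflow as tf

def checkerboard_sweep(spins, beta, parity):
    """One Metropolis half-sweep over sites of one checkerboard parity.

    spins: [n, n] float32 tensor of +/-1 spins, periodic boundaries.
    """
    n = tf.shape(spins)[0]
    # Sum of the four nearest neighbours via circular shifts (periodic BCs).
    nn = (tf.roll(spins, 1, axis=0) + tf.roll(spins, -1, axis=0) +
          tf.roll(spins, 1, axis=1) + tf.roll(spins, -1, axis=1))
    # Energy change if each spin were flipped: dE = 2 * J * s_i * sum_nn (J = 1).
    dE = 2.0 * spins * nn
    # Metropolis rule: accept a flip with probability min(1, exp(-beta * dE)).
    accept = tf.random.uniform(tf.shape(spins)) < tf.exp(-beta * dE)
    # Restrict updates to one sublattice so no two flipped spins are neighbours.
    idx = tf.range(n)
    mask = (idx[:, None] + idx[None, :]) % 2 == parity
    return tf.where(tf.logical_and(accept, mask), -spins, spins)

# Hot start on a 64 x 64 lattice near the critical coupling beta_c ~ 0.4407.
spins = tf.where(tf.random.uniform([64, 64]) < 0.5, 1.0, -1.0)
for _ in range(100):
    spins = checkerboard_sweep(spins, beta=0.44, parity=0)
    spins = checkerboard_sweep(spins, beta=0.44, parity=1)
print(float(tf.reduce_mean(spins)))  # magnetization per site
```

The checkerboard split is what makes the update vectorizable on accelerator hardware: each half-sweep touches only spins that do not interact with one another, so the whole sublattice can be updated in one batched operation.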
Cited by 37 publications (39 citation statements)
References 16 publications
“…The ability to match DNS with a 10× coarser grid makes the learned interpolation solver much faster. We benchmark our solver on a single core of Google’s Cloud TPU v4, a hardware accelerator designed for accelerating ML models that is also suitable for many scientific computing use cases (45–47). The TPU is designed for high-throughput vectorized operations, with extremely high throughput matrix–matrix multiplication in low precision (bfloat16).…”
Section: Results (mentioning)
confidence: 99%
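To make the excerpt's bfloat16 point concrete, here is a minimal TensorFlow sketch (my own illustration, not code from the cited paper; the matrix sizes are arbitrary) comparing a bfloat16 matrix product against float32. On a TPU the bfloat16 path feeds the matrix units at full rate; run on a CPU it merely demonstrates the precision trade-off.

```python
import tensorflow as tf

a = tf.random.normal([1024, 1024])
b = tf.random.normal([1024, 1024])

# Same product in bfloat16 (8 mantissa bits, float32 exponent range).
c_bf16 = tf.matmul(tf.cast(a, tf.bfloat16), tf.cast(b, tf.bfloat16))
c_f32 = tf.matmul(a, b)

# Expect a relative error around 1e-2 rather than float32's ~1e-7.
rel_err = tf.norm(tf.cast(c_bf16, tf.float32) - c_f32) / tf.norm(c_f32)
print(float(rel_err))
```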
“…As an example of recent work where the two-dimensional Ising model has been simulated on a GPU, see ref. [36].…”
Section: Discussion (mentioning)
confidence: 99%
“…It is plausible for the following four reasons. (1) TPU is an ML application-specific integrated circuit (ASIC) devised for neural networks (NNs). NNs require massive amounts of multiplications and additions between the data and parameters, and TPU can handle these computations as matrix multiplications in a very efficient manner [29]; similarly, the DFT can also be formulated as matrix multiplications between the input data and the Vandermonde matrix. (2) TPU chips are connected directly to each other with dedicated, high-speed, low-latency interconnects, bypassing the host CPU and any networking resources; therefore, a large-scale DFT computation can be distributed among multiple TPUs with minimal communication time and hence very high parallel efficiency. (3) The large capacity of the in-package memory of TPU makes it possible to handle large-scale DFTs efficiently. (4) TPU is programmable with software front ends such as TensorFlow [30] and PyTorch [31], both of which make it straightforward to implement parallel DFT algorithms on TPUs. In fact, all four of these reasons have been verified in the high-performance Monte Carlo simulations on TPUs [32], [33].…”
Section: Introduction (mentioning)
confidence: 64%
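The excerpt's claim that the DFT "can be formulated as matrix multiplications between the input data and the Vandermonde matrix" is easy to check in a few lines. The following is a minimal TensorFlow sketch (my own illustration, not code from [32], [33]); dft_vandermonde is a hypothetical helper name and the transform size 8 is arbitrary.

```python
import numpy as np
import tensorflow as tf

def dft_vandermonde(n):
    # W[j, k] = exp(-2*pi*i*j*k / n): the Vandermonde matrix built from the
    # n-th roots of unity; multiplying a signal by W is exactly the DFT.
    k = np.arange(n)
    w = np.exp(-2j * np.pi * np.outer(k, k) / n)
    return tf.constant(w, dtype=tf.complex64)

x = tf.complex(tf.random.normal([8]), tf.zeros([8]))
x_dft = tf.linalg.matvec(dft_vandermonde(8), x)  # DFT as a dense matmul
x_fft = tf.signal.fft(x)                         # library FFT for reference
print(float(tf.reduce_max(tf.abs(x_dft - x_fft))))  # agreement to ~1e-5
```

The dense O(n²) form trades extra FLOPs for exactly the kind of FLOPs the TPU's matrix units are fastest at, which, together with the fast inter-chip interconnects, is the excerpt's argument for why the formulation can still win at scale.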