Large-scale deep learning benefits from an emerging class of AI accelerators. Some of these accelerators' designs are general enough for compute-intensive applications beyond AI, and Cloud TPU is one such example. In this paper, we demonstrate a novel approach using TensorFlow on Cloud TPU to simulate the two-dimensional Ising model. The TensorFlow and Cloud TPU framework enables simple, readable code to express the complicated distributed algorithm without compromising performance. Our implementation fits into a small Jupyter notebook and fully utilizes Cloud TPU's efficient matrix operations and dedicated high-speed inter-chip connections. The performance is highly competitive: it outperforms the best published benchmarks known to us by 60% in single-core and 250% in multi-core settings, with good linear scaling. Compared to a Tesla V100 GPU, the single-core performance maintains a ~10% gain. We also demonstrate that using low-precision arithmetic (bfloat16) does not compromise the correctness of the simulation results.

… or heterogeneous nodes commonly seen in private or commercial clouds. Benefiting from the explosion of machine learning, especially deep learning, commercial clouds provide not only CPUs and GPUs but also specialized chips such as FPGAs and other in-house processors. The Tensor Processing Unit ("Cloud TPU" or "TPU" for short), an AI application-specific integrated circuit (ASIC) developed by Google for neural network machine learning, has received much attention in the machine learning community [17, 18]. Its latest release, Cloud TPU v3, offers 420 × 10^12 floating-point operations per second (FLOPS) and 128 GB of high-bandwidth memory (HBM). Multiple units are connected to form a "POD" (Cloud TPU v3 Pod) through a dedicated high-speed 2-D toroidal mesh network, allowing 100+ peta-FLOPS and 32 TB of HBM to be accessed by the application with very low latency and in lockstep.
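To make the kind of vectorized simulation discussed here concrete, the following is a minimal, framework-agnostic sketch of a checkerboard (two-sublattice) Metropolis update for the 2D Ising model, written in plain NumPy rather than TensorFlow so it runs anywhere. It is our own illustration under stated assumptions, not the paper's TPU implementation; the function name `checkerboard_sweep` and all parameters are hypothetical.

```python
import numpy as np

def checkerboard_sweep(spins, beta, rng, parity):
    """One Metropolis half-sweep over sites of a given checkerboard parity.

    spins  : (n, n) array of +1/-1 with periodic boundary conditions
    beta   : inverse temperature
    parity : 0 or 1, selecting the black or white sub-lattice
    """
    # Sum of the four nearest neighbours, periodic boundaries via roll.
    nn = (np.roll(spins, 1, 0) + np.roll(spins, -1, 0)
          + np.roll(spins, 1, 1) + np.roll(spins, -1, 1))
    # Energy change if a spin flips: dE = 2 * s_i * sum(neighbours of i).
    dE = 2.0 * spins * nn
    # Metropolis acceptance: flip with probability min(1, exp(-beta * dE)).
    accept = rng.random(spins.shape) < np.exp(-beta * dE)
    # Only update one sub-lattice at a time, so no two simultaneously
    # updated spins are neighbours and the update stays valid.
    i, j = np.indices(spins.shape)
    mask = (i + j) % 2 == parity
    return np.where(mask & accept, -spins, spins)

rng = np.random.default_rng(0)
n, beta = 64, 0.5  # beta above the critical value ~0.4407: ordered phase
spins = rng.choice([-1, 1], size=(n, n))
for _ in range(200):
    spins = checkerboard_sweep(spins, beta, rng, parity=0)
    spins = checkerboard_sweep(spins, beta, rng, parity=1)
m = abs(spins.mean())  # magnetisation per spin
```

The checkerboard decomposition is what makes the update expressible as whole-array operations (shifts, element-wise arithmetic, masked selection), which is precisely the kind of workload that maps onto matrix-oriented accelerators; in TensorFlow the same structure can be written with `tf.roll` and `tf.where`.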
TPU is programmable via software frontends such as TensorFlow [1] or PyTorch [21], and can be deployed both for training huge deep neural networks and for performing low-latency online prediction; [14] reports impressive acceleration of both. Given the tremendous computational resources that TPU offers, it is compelling to also consider the opportunities TPU brings to applications beyond machine learning. The programming frontends used for TPU, such as TensorFlow, offer a rich set of functionalities that are highly relevant for scientific computation. The TensorFlow TPU programming stack also provides the additional benefit of allowing distributed algorithms to be expressed with simple, easy-to-understand code without sacrificing performance. In addition, the ability to program conventional scientific simulations in the TensorFlow framework makes it easier to explore hybrid approaches that employ both conventional scientific computation methods and modern machine learning techniques within the same framework.

Motivated by these observations, we developed a Single Instruc...