Time Series Forecasting (TSF) is essential to key domains, and transformer neural networks have advanced the state of the art on global, multi-horizon TSF benchmarks. The quadratic time and memory complexity of the vanilla transformer (VT) hinders its application to big-data environments; consequently, multiple efficient VT variants that lower complexity via sparse self-attention have been proposed. However, lower asymptotic complexity does not directly translate into faster execution, and machine learning models for big data are typically trained on accelerators designed for dense-matrix computation, which perform poorly on sparse matrices. To better compare the accuracy-speed trade-off of the VT and its variants, it is essential to benchmark them on such accelerators. To address this task, we implemented a cloud-based VT on Tensor Processing Units. Experiments on large-scale datasets show that our transformer outperforms two reference models in accuracy while reducing training times from hours to under two minutes.
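To make the quadratic cost concrete, the following is a minimal NumPy sketch of vanilla scaled dot-product self-attention, not the paper's implementation: for a sequence of length n, the intermediate score matrix has shape (n, n), which is the source of the O(n^2) time and memory complexity mentioned above. All names and dimensions here are illustrative assumptions.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Vanilla self-attention (illustrative sketch).

    The (n, n) score matrix built below is what makes the vanilla
    transformer quadratic in sequence length n, in both time and memory.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)  # shape (n, n): the O(n^2) bottleneck
    # Numerically stable row-wise softmax over the scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v  # shape (n, d)

# Hypothetical sizes: sequence length n and model dimension d
n, d = 512, 64
x = np.random.default_rng(0).normal(size=(n, d))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (512, 64); the intermediate score matrix was (512, 512)
```

Sparse-attention variants avoid materializing the full (n, n) matrix, but on accelerators optimized for dense matrix multiplication that sparsity does not automatically yield a speedup, which motivates the dense TPU benchmark above.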