To effectively mine information from historical data and improve the accuracy of short-term load forecasting, this paper addresses the time-series and nonlinear characteristics of power load. Deep learning has received considerable attention in recent years and has become a popular approach to electricity load forecasting. Long short-term memory (LSTM) and gated recurrent unit (GRU) networks are specifically designed for time-series data; however, as recurrent neural networks (RNNs), they still suffer from vanishing and exploding gradients and struggle to capture long-term dependencies. The Transformer, a self-attention-based sequence model, has produced impressive results in a variety of generative tasks that demand long-range coherence, which suggests that self-attention could be useful for modeling power load forecasting. In this paper, to model large-scale load forecasting effectively and efficiently, we design a Transformer encoder with relative positional encoding, which consists of four main components: a single-layer neural network, a relative positional encoding module, an encoder module, and a feed-forward network. Experimental results on real-world datasets demonstrate that our method outperforms GRU, LSTM, and the original Transformer encoder.
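The following is a minimal sketch, not the authors' released implementation, of how the four components named above could fit together: a single-layer neural network that embeds the scalar load, self-attention with a learnable relative-position bias (one simple form of relative positional encoding), and a feed-forward network inside one encoder block. The hyperparameters (`d_model`, `d_ff`, `max_len`) and the hourly-load input shape are assumptions for illustration.

```python
import torch
import torch.nn as nn


class RelPosSelfAttention(nn.Module):
    """Single-head self-attention with a learnable relative-position bias."""

    def __init__(self, d_model, max_len):
        super().__init__()
        self.d_model = d_model
        self.qkv = nn.Linear(d_model, 3 * d_model)
        # One learnable bias per relative offset in [-(max_len-1), max_len-1].
        self.rel_bias = nn.Parameter(torch.zeros(2 * max_len - 1))

    def forward(self, x):                       # x: (batch, seq_len, d_model)
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = q @ k.transpose(-2, -1) / d ** 0.5           # (b, t, t)
        # Look up the bias for each pairwise relative distance (i - j).
        idx = torch.arange(t)
        rel = idx[:, None] - idx[None, :] + t - 1             # values in [0, 2t-2]
        scores = scores + self.rel_bias[rel]
        return torch.softmax(scores, dim=-1) @ v


class LoadForecastEncoder(nn.Module):
    """One encoder block: embedding, relative-position attention, FFN, forecast head."""

    def __init__(self, d_model=64, d_ff=256, max_len=168):
        super().__init__()
        self.embed = nn.Linear(1, d_model)      # single-layer NN: scalar load -> d_model
        self.attn = RelPosSelfAttention(d_model, max_len)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                 nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, 1)       # predict the next load value

    def forward(self, load):                    # load: (batch, seq_len, 1)
        h = self.embed(load)
        h = self.norm1(h + self.attn(h))        # residual + attention
        h = self.norm2(h + self.ffn(h))         # residual + feed-forward
        return self.head(h[:, -1])              # (batch, 1)


model = LoadForecastEncoder()
history = torch.randn(8, 168, 1)                # e.g. one week of hourly load, batch of 8
print(model(history).shape)                     # torch.Size([8, 1])
```

The relative-position bias here depends only on the distance between time steps, not their absolute positions, which is one common way to let attention generalize across window offsets; the paper's exact encoding scheme may differ.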