Accurate load forecasting is essential for the safe, stable, and economical operation of the energy internet. Since their introduction to electrical load forecasting, temporal convolutional networks (TCNs) have outperformed recurrent neural network models. However, existing TCN-based models struggle to obtain a large receptive field and strong long-term feature extraction capability because each 1D convolution uses a single fixed kernel size. This paper proposes a temporal inception convolutional network based on multi-head attention (TICN-Att) for ultra-short-term load forecasting. By introducing an inception structure into the TCN, the proposed model extracts multi-scale information from the input features through multiple convolutional kernels of different sizes, without stacking layers depth-wise. In addition, a multi-head attention mechanism gives TICN-Att the ability to capture long-term dependencies, similar to long short-term memory (LSTM) network models. The generalization and validity of the model are tested on the Global Energy Forecasting Competition 2014 (GEFCOM2014) dataset, electrical load data from a city in Jiangsu (China), and the PJM power system dataset. The experimental results demonstrate that the proposed model achieves the best prediction accuracy among the compared state-of-the-art models.
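The two ingredients named in the abstract can be illustrated with a small sketch. For a stack of dilated causal convolutions with kernel size $k$ and dilations $d_i$ (one convolution per level), the receptive field is $1 + \sum_i (k-1)\,d_i$. Running branches with different kernel sizes in parallel, inception-style, therefore widens the receptive field without adding depth. The abstract does not give kernel sizes, dilations, head counts, activations, or how the branches feed the attention layer. The NumPy code below is a minimal sketch under assumed choices: kernel sizes 2, 3 and 5, ReLU, a single causal convolution per branch, and standard scaled dot-product multi-head self-attention over the concatenated branch outputs. All function names (`causal_conv1d`, `inception_block`, `receptive_field`, `multi_head_self_attention`) are hypothetical, and this is not the authors' TICN-Att implementation.

```python
import numpy as np

def causal_conv1d(x, w, dilation=1):
    """Causal 1D convolution (hypothetical helper).

    x: (T, C_in) sequence; w: (k, C_in, C_out) kernel.
    y[t] depends only on x[t], x[t-d], ..., x[t-(k-1)d] (zero left-padding).
    """
    k = w.shape[0]
    pad = (k - 1) * dilation
    xp = np.vstack([np.zeros((pad, x.shape[1])), x])  # left-pad so no future leaks in
    y = np.zeros((x.shape[0], w.shape[2]))
    for t in range(x.shape[0]):
        for j in range(k):
            # tap j reads x[t - (k-1-j)*dilation]
            y[t] += xp[t + j * dilation] @ w[j]
    return y

def inception_block(x, kernels, dilation=1):
    """Assumed inception-style block: parallel causal convs of different
    kernel sizes, each followed by ReLU, concatenated along channels."""
    branches = [np.maximum(causal_conv1d(x, w, dilation), 0.0) for w in kernels]
    return np.concatenate(branches, axis=1)

def receptive_field(kernel_size, dilations):
    """Receptive field of stacked dilated causal convs: 1 + sum((k-1)*d)."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

def multi_head_self_attention(x, Wq, Wk, Wv, Wo, n_heads):
    """Standard scaled dot-product multi-head self-attention over time steps."""
    T, d_model = x.shape
    d_head = d_model // n_heads

    def split(a):  # (T, d_model) -> (heads, T, d_head)
        return a.reshape(T, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ Wq), split(x @ Wk), split(x @ Wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, T, T)
    scores -= scores.max(axis=-1, keepdims=True)          # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)              # softmax over time steps
    out = (attn @ v).transpose(1, 0, 2).reshape(T, d_model)
    return out @ Wo, attn

# Usage with random weights and a hypothetical 24-step input window
rng = np.random.default_rng(0)
T, C_in = 24, 1
x = rng.standard_normal((T, C_in))
kernels = [rng.standard_normal((k, C_in, 4)) * 0.1 for k in (2, 3, 5)]
h = inception_block(x, kernels)                 # 3 branches x 4 channels
d_model = h.shape[1]
Ws = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(4)]
out, attn = multi_head_self_attention(h, *Ws, n_heads=3)
print(h.shape, out.shape, attn.shape)           # (24, 12) (24, 12) (3, 24, 24)
print([receptive_field(k, [1, 2, 4, 8]) for k in (2, 3, 5)])  # [16, 31, 61]
```

With dilations 1, 2, 4 and 8, the kernel-5 branch sees 61 steps while the kernel-2 branch sees 16. This is the sense in which mixing kernel sizes broadens temporal coverage at a fixed depth. The attention layer then lets every time step in the window weight every other step directly.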