Digitalization of China’s electric power sector is a key driver of its economic and social benefits, and named entity recognition (NER) is the most fundamental task in the information extraction that underpins this process. We therefore propose PE-MNER, a multimodal named entity recognition model for power equipment based on deep neural networks. Compared with text-only input, combining text and images allows image information to supplement information missing from the text, enabling more accurate entity extraction. The model first obtains a BERT network through incremental training and uses it to extract Chinese character features. A hierarchical visual prefix fusion network then fuses the image information. Comparative experiments show that the proposed model achieves the best performance, improving the F1 score by 4.08% to 7.20% over the benchmark model.
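
As an illustration of the fusion step, the following is a minimal sketch of how a visual prefix can be combined with text features inside an attention layer. It is not the PE-MNER implementation: the abstract does not specify the fusion details, so the sketch assumes that image feature vectors are prepended to the keys and values that the character features attend over. It shows a single layer with no learned projections and no hierarchy, and the names `prefix_attention`, `text_states`, and `visual_prefix` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def prefix_attention(text_states, visual_prefix):
    """Single-head attention in which each text vector attends over
    [visual prefix; text]. Assumption: keys and values are the raw
    vectors (no learned projections), to keep the sketch short."""
    dim = len(text_states[0])
    keys = visual_prefix + text_states  # image vectors prepended as a prefix
    values = keys
    outputs, weights = [], []
    for query in text_states:
        w = softmax([dot(query, k) / math.sqrt(dim) for k in keys])
        weights.append(w)
        outputs.append(
            [sum(w_j * v[d] for w_j, v in zip(w, values)) for d in range(dim)]
        )
    return outputs, weights


# Toy inputs: three character vectors and two image-region vectors (made up).
text_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
visual_prefix = [[0.5, 0.5], [2.0, 0.0]]
fused, attn = prefix_attention(text_states, visual_prefix)
```

In this sketch the output keeps one vector per character, so a downstream tagging layer would be unaffected by the number of image vectors, while each character's representation can draw on the image through the prefix positions.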