Long short-term memory (LSTM) networks have recently been widely adopted for fault diagnosis in power systems. However, their hyperparameters are usually set from prior knowledge and experience, which makes them ill-suited to unexpected faults in volatile environments. In this paper, we propose an improved grey wolf optimization (IGWO) algorithm to optimize the hyperparameters of LSTM networks. This avoids the drawbacks of empirically chosen values and improves fault diagnosis accuracy for on-load tap changers (OLTCs). Composite multiscale weighted permutation entropy and energy entropy features, extracted with the grasshopper optimization algorithm and variational mode decomposition (GOA-VMD) method, serve as the inputs of the LSTM network. The IGWO algorithm is applied iteratively to optimize the relevant LSTM hyperparameters, and the resulting IGWO-LSTM model is used to classify different OLTC faults. Experimental results show that the proposed method outperforms several widely used benchmark methods in diagnostic performance.
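As a rough illustration of the hyperparameter search loop, the sketch below implements the *standard* grey wolf optimizer, not the improved IGWO variant, whose modifications are not described here. It minimizes a hypothetical surrogate function that stands in for LSTM validation error over two assumed hyperparameters (number of hidden units and log10 learning rate). In a real pipeline, each objective evaluation would instead train and validate an LSTM on the entropy features. The bounds and the surrogate are illustrative assumptions only.

```python
import random
from typing import Callable, List, Sequence, Tuple

Bounds = Sequence[Tuple[float, float]]


def gwo(objective: Callable[[List[float]], float], bounds: Bounds,
        n_wolves: int = 20, n_iter: int = 100, seed: int = 0):
    """Standard grey wolf optimizer (minimization); needs n_wolves >= 3."""
    rng = random.Random(seed)
    dim = len(bounds)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]

    def clip(x: List[float]) -> List[float]:
        # Keep every coordinate inside its search bounds.
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    scores = [objective(w) for w in wolves]
    best_pos, best_score = None, float("inf")
    for t in range(n_iter):
        # Rank the pack; the three best wolves act as alpha, beta and delta.
        ranked = sorted(zip(scores, wolves), key=lambda p: p[0])
        leaders = [list(w) for _, w in ranked[:3]]
        if ranked[0][0] < best_score:
            best_score, best_pos = ranked[0][0], list(ranked[0][1])
        # Coefficient a decreases linearly from 2 towards 0 (exploration -> exploitation).
        a = 2.0 - 2.0 * t / n_iter
        new_wolves = []
        for w in wolves:
            new = []
            for d in range(dim):
                est = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    D = abs(C * leader[d] - w[d])
                    est += leader[d] - A * D
                # New position is the mean of the three leader-guided estimates.
                new.append(est / 3.0)
            new_wolves.append(clip(new))
        wolves = new_wolves
        scores = [objective(w) for w in wolves]
    # Account for the final generation as well.
    for s, w in zip(scores, wolves):
        if s < best_score:
            best_score, best_pos = s, list(w)
    return best_pos, best_score


# Hypothetical search space: (hidden units, log10 learning rate).
BOUNDS = [(8.0, 256.0), (-5.0, -1.0)]


def surrogate_val_error(x: List[float]) -> float:
    # Placeholder for "train an LSTM and return validation error";
    # this toy bowl has its minimum at 64 hidden units and lr = 1e-3.
    hidden = round(x[0])
    return ((hidden - 64) / 64.0) ** 2 + (x[1] + 3.0) ** 2


if __name__ == "__main__":
    pos, score = gwo(surrogate_val_error, BOUNDS, n_wolves=30, n_iter=200, seed=1)
    print(f"hidden units ~ {round(pos[0])}, lr ~ {10 ** pos[1]:.2e}, score = {score:.4f}")
```

Rounding the hidden-unit count inside the objective is one simple way to handle integer hyperparameters with a continuous optimizer. An improved variant would typically change how `a`, the leader weighting, or the initialization are chosen, while keeping this overall loop.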