Optical encryption based on single-pixel imaging (SPI) has made great advances with the introduction of deep learning. However, the use of deep neural networks usually requires a long training time, and the networks need to be retrained once the target scene changes. With this in mind, we propose an SPI encryption scheme based on an attention-inserted physics-driven neural network. Here, an attention module is used to encrypt the single-pixel measurement value sequences of two images, together with a sequence of cryptographic keys, into a one-dimensional ciphertext signal to complete image encryption. Then, the encrypted signal is fed into a physics-driven neural network for high-fidelity decoding (i.e., decryption). This scheme eliminates the need for pre-training the network and gives more freedom to spatial modulation. Both simulation and experimental results have demonstrated the feasibility and eavesdropping resistance of this scheme. Thus, it will lead SPI-based optical encryption closer to intelligent deep encryption.