The Vision Transformer (ViT) offers a new approach to polarimetric synthetic aperture radar (PolSAR) image classification because of its strength in learning global-spatial information. However, its practical application is limited by the lack of local-spatial information within samples, the lack of correlation information among samples, and the complexity of the network structure. In addition, dual-frequency PolSAR data provides rich information, yet it has received far less attention than single-frequency classification. In this paper, we adopt ViT as the basic framework and propose a novel model based on mixed patch interaction for adaptive fusion classification of dual-frequency PolSAR images (PolSAR-MPIformer). First, a mixed patch interaction (MPI) module is designed for feature extraction, which replaces the high-complexity self-attention in ViT with intra- and inter-sample patch interaction. In addition to the global-spatial information that ViT learns within samples, the MPI module learns local-spatial information within samples and correlation information among samples, thereby obtaining more discriminative features with a low-complexity network. Subsequently, a dual-frequency adaptive fusion (DAF) module is constructed as the classifier of PolSAR-MPIformer. On the one hand, DAF uses an attention mechanism to reduce the impact of speckle noise while preserving details. On the other hand, DAF evaluates the classification confidence of each band and assigns weights accordingly, which exploits the complementarity between the two frequency bands and improves classification accuracy. Experiments on four real dual-frequency PolSAR datasets demonstrate the superiority of the proposed PolSAR-MPIformer over other state-of-the-art algorithms.