Mid-infrared photoacoustic microscopy can capture biochemical information without staining. However, the long mid-infrared optical wavelengths make the spatial resolution of photoacoustic microscopy significantly poorer than that of conventional confocal fluorescence microscopy. Here, we demonstrate an explainable deep learning-based unsupervised inter-domain transformation of low-resolution unlabeled mid-infrared photoacoustic microscopy images into confocal-like virtually fluorescence-stained high-resolution images. The explainable deep learning-based framework is proposed for this transformation, wherein an unsupervised generative adversarial network is primarily employed and then a saliency constraint is added for better explainability. We validate the performance of explainable deep learning-based mid-infrared photoacoustic microscopy by identifying cell nuclei and filamentous actins in cultured human cardiac fibroblasts and matching them with the corresponding CFM images. The XDL ensures similar saliency between the two domains, making the transformation process more stable and more reliable than existing networks. Our XDL-MIR-PAM enables label-free high-resolution duplexed cellular imaging, which can significantly benefit many research avenues in cell biology.