Abstract. Learning visual representations of medical images is a critical task in computer-aided diagnosis. In this paper, we propose Unsupervised Multimodal Graph Mining (UMGM) to learn discriminative features for probe-based confocal laser endomicroscopy (pCLE) mosaics of breast tissue. We build a multiscale multimodal graph from both pCLE mosaics and histology images. Positive pairs are mined via cycle consistency, and negative pairs are extracted based on geodesic distance. Given the positive and negative pairs, the latent feature space is discovered by reconstructing the similarity between pCLE and histology images. Experiments on a database of 700 pCLE mosaics demonstrate that the proposed method outperforms previous work on pCLE feature learning. Specifically, the top-1 accuracy in an eight-class retrieval task is 0.659, a 10% improvement over the state-of-the-art method.
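
To make the pair-mining idea concrete, the following is a minimal sketch and not the UMGM implementation. It assumes a single symmetric distance matrix over all nodes, takes the simplest form of cycle consistency (mutual k-nearest neighbours, i.e. a cycle of length two), and treats pairs whose shortest-path distance on the k-NN graph exceeds a threshold as negatives. The function names, `k`, and `threshold` are hypothetical choices made for illustration only; the multiscale and multimodal aspects of the graph are omitted.

```python
import numpy as np


def knn_indices(dist, k):
    """Indices of the k nearest neighbours of each node, excluding itself."""
    d = dist.astype(float).copy()
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]


def mine_positive_pairs(dist, k=1):
    """Assumed cycle-consistency rule: keep (i, j) if j is among the
    k nearest neighbours of i and i is among those of j."""
    nn = knn_indices(dist, k)
    pairs = set()
    for i in range(dist.shape[0]):
        for j in nn[i]:
            j = int(j)
            if i in nn[j]:
                pairs.add((min(i, j), max(i, j)))
    return sorted(pairs)


def geodesic_distances(dist, k=1):
    """Shortest-path lengths on the symmetrised k-NN graph (Floyd-Warshall).
    Assumes dist is symmetric; disconnected nodes get distance inf."""
    n = dist.shape[0]
    geo = np.full((n, n), np.inf)
    np.fill_diagonal(geo, 0.0)
    nn = knn_indices(dist, k)
    for i in range(n):
        for j in nn[i]:
            geo[i, j] = geo[j, i] = dist[i, j]
    for m in range(n):
        # Relax every path i -> j through intermediate node m.
        geo = np.minimum(geo, geo[:, m:m + 1] + geo[m:m + 1, :])
    return geo


def mine_negative_pairs(geo, threshold):
    """Pairs whose geodesic distance exceeds the (hypothetical) threshold.
    Disconnected pairs (distance inf) are counted as negatives."""
    n = geo.shape[0]
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if geo[i, j] > threshold]
```

In practice, `dist` would be computed from feature vectors of pCLE and histology nodes, and the mined pairs would supply the similarity targets for learning the latent space; how those targets are reconstructed is not covered by this sketch.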