Graph-based saliency detection has attracted considerable attention due to its ability to extract objects of interest from natural scenes. However, most existing methods rely on graph construction followed by initial background/foreground seed selection, which may oversimplify the relationships among multi-view data and lead to inaccurate results in complicated scenes. To our knowledge, the success of graph-based saliency detection methods depends mainly on the quality of the graph. In this paper, we propose adaptive joint similarity matrix learning (AJSML) for multi-view data based on a graph diffusion process, which is dedicated to facilitating the RGB-T saliency detection task. Our assumption is that salient objects in complicated scenes tend to exhibit similar appearance and compact spatial distribution. Specifically, we first design a generalized framework to simultaneously learn a high-quality graph and a similarity matrix for multi-view data. In this way, the similarity relationships and correlation information of the multi-view data can be effectively diffused on the high-quality graph, which is conducive to generating a faithful similarity matrix. Furthermore, a post-processing technique, adaptive weighted semi-supervised learning (AWSL), which integrates saliency information and cross-modality graphs, is developed to further improve the accuracy of the saliency results. Finally, extensive experimental results on well-known benchmark RGB-T, RGB, and RGB-D datasets demonstrate the superiority of the proposed method in comparison with several state-of-the-art methods.
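
As a rough illustration only (the precise AJSML objective is defined in the body of the paper; the symbols $W^{(v)}$, $S^{(t)}$, and $\alpha$ below are assumptions introduced for this sketch), a multi-view graph diffusion of the kind described above is commonly written as an iterative update in which each view's affinity graph propagates the current similarity estimate:
\[
S^{(t+1)} \;=\; \frac{\alpha}{V} \sum_{v=1}^{V} W^{(v)} S^{(t)} \big(W^{(v)}\big)^{\!\top} \;+\; (1-\alpha)\, S^{(0)},
\]
where $W^{(v)}$ denotes the (row-normalized) affinity graph of view $v$, $S^{(0)}$ is an initial similarity matrix, $V$ is the number of views, and $\alpha \in (0,1)$ balances the diffused similarities against the initial estimate.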