Given the growth of uncertainty in the real world, analysing probabilistic graphs is crucial. Clustering is one of the most fundamental methods of mining probabilistic graphs to discover the hidden patterns in them. This survey examines an extensive and organized analysis of the clustering techniques of large probabilistic graphs proposed in the literature. First, the definition of probabilistic graphs and modelling them are introduced. Second, the clustering of such graphs and their challenges, such as uncertainty of edges, high dimensions, and the impossibility of applying certain graph clustering techniques directly, are expressed. Then, a taxonomy of clustering approaches is discussed in two main categories: threshold-based and possible worlds-based methods. The techniques presented in each category are explained and examined. Here, these methods are evaluated on real datasets, and their performance is compared with each other. Finally, the survey is summarized by describing some of the applications of probabilistic graph clustering and future research directions.clustering, possible worlds-based methods, probabilistic graph, threshold-based methods
| INTRODUCTIONGraphs are powerful tools for modelling large interconnected data networks that are highly developed in the community and the surrounding natural world. Social networks, traffic networks, genome databases, and knowledge graphs are samples of common graphs in which entities are considered nodes, and the connections between them are considered the edges. However, the uncertainty in graph data is an undeniable problem created for various reasons such as noise in measurement methods (Aggarwal & Yu, 2009), vague and inaccurate information sources (Cheng et al., 2015), lack of the required accurate information (Zhu et al., 2015), inference and prediction models (Adar & R'e, 2007), and protecting private information (Boldi et al., 2012). In these cases, the data is represented as an uncertain or probabilistic graph in which uncertainty is assigned to each graph component as a probabilistic value. It means that the probabilistic graph can be due to probabilistic nodes, edges, attributes, or a combination of them (Khan et al., 2018).However, most of the existing probabilistic graphs were formed due to probabilistic edges. Each probabilistic edge indicates the possibility of a connection between two corresponding nodes. For example, in protein interactive networks, the interaction between the two proteins is inferred experimentally and is a probabilistic value. Alternatively, in social networks, complex relationships between users, such as trust or influence, are not directly available and are indirectly inferred from user activity (Li et al., 2021). Therefore, most recent studies in probabilistic graphs have focused on these types of graphs. Accordingly, each edge is labelled with the probability of the presence of that connection.Considering the increasing number of applications with uncertain data, mining probabilistic graphs as uncertain structured data wi...