Network information mining is the study of network topology, which can answer many application-driven questions about the structural evolution and function of a real system. Such questions may concern how the real system evolves or how individuals interact with each other in social networks. Although the evolution of a real system may appear regular, capturing patterns across the whole process of evolution is not trivial. Link prediction is one of the most important techniques in network information mining and can help us understand the evolution mechanisms of real-life networks. Link prediction aims to uncover missing links or to quantify the likelihood that currently nonexistent links will emerge, based on the known network structure. Most existing link prediction methods focus on short-path networks, which usually contain a myriad of closed triangular structures. However, these algorithms perform poorly on highly sparse or long-path networks. Here, we propose a new index based on the principles of Structural Equivalence and Shortest Path Length (SESPL) to estimate the likelihood of link existence in long-path networks. Through tests on 548 real networks, we find that SESPL is more effective and efficient than other similarity-based predictors in long-path networks. We also compare the performance of the SESPL predictor with that of embedding-based approaches using machine learning techniques. The results show that SESPL achieves a gain of 44.09% over GraphWave and 7.93% over Node2vec. Finally, according to the matrix of Maximal Information Coefficient (MIC) values between all the similarity-based predictors, SESPL is a new independent feature in the space of traditional similarity features.
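The claim that triangle-based predictors struggle on long-path networks can be illustrated with a small sketch. The code below is not the SESPL index, whose formula is not given in this abstract. It only contrasts the standard common-neighbors score with a simple inverse shortest-path-length score on a toy path graph. The function names and the toy graph are illustrative assumptions.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Return hop distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def common_neighbors_score(adj, u, v):
    """Classic triangle-closing score: number of shared neighbors."""
    return len(adj[u] & adj[v])


def inverse_distance_score(adj, u, v):
    """Illustrative path-based score: 1 / shortest path length (0 if unreachable)."""
    d = bfs_distances(adj, u).get(v)
    return 0.0 if d is None else 1.0 / d


# Toy long-path network (an assumption for illustration): 0-1-2-3-4-5
adj = {i: set() for i in range(6)}
for a, b in zip(range(5), range(1, 6)):
    adj[a].add(b)
    adj[b].add(a)

# Score every pair of nodes that is not already linked.
non_edges = [(u, v) for u, v in combinations(adj, 2) if v not in adj[u]]
cn = {pair: common_neighbors_score(adj, *pair) for pair in non_edges}
sp = {pair: inverse_distance_score(adj, *pair) for pair in non_edges}

# Pairs that common neighbors cannot distinguish (score 0).
cn_blind = [pair for pair in non_edges if cn[pair] == 0]
```

On this path graph, common neighbors assigns zero to every pair more than two hops apart, so it cannot rank them, whereas the path-based score still orders them by distance.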
Many real-life spreading processes can be regarded as complex contagion, and the linear threshold (LT) model is often used as a representative model of this mechanism. Despite its widespread use, the LT model has several limitations in describing the time evolution of spreading. First, the discrete time step that captures the speed of the spreading is vaguely defined. Second, the synchronous updating rule infects nodes in batches and therefore cannot take individual differences into account. Finally, the LT model is incompatible with existing models of simple contagion. Here, we consider a generalized linear threshold (GLT) model for the continuous-time stochastic complex contagion process, which can be efficiently implemented with the Gillespie algorithm. Time in this model has a clear mathematical definition, and the updating order is rigorously defined. We find that the traditional LT model systematically underestimates both the spreading speed and the randomness in the order of the spreading sequence. We also show that the GLT model works seamlessly with the susceptible-infected or susceptible-infected-recovered model. The two can easily be combined to model a hybrid spreading process in which simple contagion accumulates the critical mass for the complex contagion that leads to global cascades. Overall, the proposed GLT model can be a useful tool for studying complex contagion, especially the time evolution of the spreading.
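A minimal sketch of how a continuous-time threshold contagion could be simulated with the Gillespie algorithm follows. It is not the GLT model itself, whose rate function is not specified in this abstract. As a loudly stated assumption, every susceptible node whose fraction of infected neighbors meets its threshold becomes infected at a constant rate `rate`. The function name `gillespie_threshold` and the ring network are hypothetical.

```python
import random


def gillespie_threshold(adj, thresholds, seeds, rate=1.0, rng=None):
    """Simulate a threshold contagion in continuous time (illustrative only).

    Assumption: a susceptible node is eligible once the fraction of its
    infected neighbors reaches its threshold, and each eligible node is
    infected at the same constant rate. Returns a list of (time, node).
    """
    rng = rng or random.Random()
    infected = set(seeds)
    t = 0.0
    history = [(t, s) for s in sorted(infected)]

    def eligible_nodes():
        out = []
        for node, nbrs in adj.items():
            if node in infected or not nbrs:
                continue
            if len(nbrs & infected) / len(nbrs) >= thresholds[node]:
                out.append(node)
        return out

    while True:
        candidates = eligible_nodes()
        if not candidates:
            break  # no further infection is possible
        total_rate = rate * len(candidates)
        # Gillespie step: exponential waiting time with the total rate,
        # then pick the event; rates are equal, so the choice is uniform.
        t += rng.expovariate(total_rate)
        node = rng.choice(candidates)
        infected.add(node)
        history.append((t, node))
    return history


# Hypothetical example: ring of 6 nodes, threshold 0.5, single seed.
n = 6
ring = {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}
theta = {i: 0.5 for i in range(n)}
events = gillespie_threshold(ring, theta, seeds=[0], rng=random.Random(42))
```

Because one node is infected per event, the infection order is a random sequence rather than a set of synchronous batches, which is the contrast with the discrete-time LT model drawn above.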
Most real-world systems evolve over time as entities and the interactions between them are added and removed: new entities or relationships appear and old ones vanish. While most network evolution models provide an iterative process for reproducing global properties, they cannot capture the evolutionary mechanisms of real systems. Link prediction has therefore been proposed to predict future links, which can also help us understand the laws governing the evolution of real systems. The aim of link prediction is to uncover missing links from the known parts of the network or to quantify the likelihood that future links will emerge from the current network structure. However, almost all existing studies ignore the fact that, in real networks and especially in social networks, old nodes tend to disappear and new nodes appear over time. This makes link prediction more challenging, since new nodes have no pre-existing structural information. To solve the temporal link prediction problem with new nodes, we take into account nodal Attribute Similarity and the Shortest Path Length, namely $ASSPL$, to predict future links involving new nodes. Tests on a scholar social network and academic funding networks show that $ASSPL$ is highly effective and applicable in time-evolving funding networks. We also use a parameter to explore how network structure and nodal attributes affect the performance of temporal link prediction. Finally, we find that nodal attributes and network structure complement each other well for predicting future links with new nodes in funding networks.
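One way to picture how a single parameter can trade off nodal attributes against network structure is sketched below. This is not the $ASSPL$ formula, which is not given in this abstract. The convex combination, the Jaccard attribute similarity, the inverse-path-length term, and the names `jaccard`, `mixed_score`, and `alpha` are all assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity between two attribute sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mixed_score(attr_u, attr_v, path_length, alpha):
    """Hypothetical score mixing attribute similarity and path length.

    alpha = 1 uses attributes only; alpha = 0 uses structure only.
    `path_length` is None when no path exists, for example when one
    endpoint is a new node with no links yet.
    """
    structural = 0.0 if path_length is None else 1.0 / path_length
    return alpha * jaccard(attr_u, attr_v) + (1.0 - alpha) * structural


# Hypothetical attribute sets for two nodes.
attrs_u = {"physics", "networks"}
attrs_v = {"networks", "biology"}

# A new node has no path to anyone, so only its attributes contribute.
new_node_score = mixed_score(attrs_u, attrs_v, path_length=None, alpha=0.5)

# Two existing nodes at distance 2 also receive a structural contribution.
old_node_score = mixed_score(attrs_u, attrs_v, path_length=2, alpha=0.5)
```

In this sketch a new node can still receive a nonzero score from its attributes alone, which reflects the motivation stated above: structural information is unavailable for nodes that have just appeared.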
Co-occurrence associations are widely observed in empirical data. Mining the information in co-occurrence data is essential for advancing our understanding of systems such as social networks, ecosystems, and brain networks. Measuring the similarity of entities is one of the important tasks, and it can usually be achieved with a network-based approach. Here we show that traditional methods based on the aggregated network can introduce unwanted indirect relationships. To cope with this issue, we propose a similarity measure based on the ego network of each entity, which effectively considers the change of an entity's centrality from one ego network to another. The proposed index is easy to calculate and has a clear physical meaning. Using two different data sets, we compare the new index with existing ones. We find that the new index outperforms the traditional network-based similarity measures and can sometimes surpass the embedding method. Meanwhile, the values given by the new index are only weakly correlated with those given by other methods, so the index provides a different dimension for quantifying similarities in co-occurrence data. Altogether, our work extends network-based similarity measures and can potentially be applied to several related tasks.
Co-occurrence data are data in which multiple entities occur simultaneously in a single instance, such as the co-tags in a folksonomy, the co-authors of a scientific paper, the co-activation of brain regions under a stimulus, and more. Measuring similarity between entities is fundamental to analyzing co-occurrence data, allowing us to further explore social, brain, or scientific systems. Using the ego network composed of the co-occurrence relationships as the backbone, we propose a network-based similarity measure. The new approach outperforms traditional ones and can sometimes surpass the machine-learning-based embedding method, providing a good tool for tasks such as community detection, link prediction, and recommendation.
I. INTRODUCTION
Many tasks in computer science, such as knowledge management [1,2], community detection [3,4], natural language processing [5,6], and link prediction [7,8], require a measure of similarity between two entities. This can be achieved with different methods depending on the nature of the problem analyzed. The similarity is most straightforward to calculate if the features of the two entities are already mapped into a high-dimensional space. Nevertheless, the embedding itself is usually a hard problem and in many cases lacks a clear physical explanation. Hence, other methods that do not directly use feature vectors are also widely used because of their simplicity and interpretability. For example, if two entities can be expressed by a string, their similarity can be quantified by the minimum number of operations required to