Given a stream of graph edges from a dynamic graph, how can we assign anomaly scores to edges in an online manner, for the purpose of detecting unusual behavior, using constant time and memory? Existing approaches aim to detect individually surprising edges. In this work, we propose Midas, which focuses on detecting microcluster anomalies, or suddenly arriving groups of suspiciously similar edges, such as lockstep behavior, including denial of service attacks in network traffic data. Midas has the following properties: (a) it detects microcluster anomalies while providing theoretical guarantees about its false positive probability; (b) it is online, thus processing each edge in constant time and constant memory, and also processes the data 108–505 times faster than state-of-the-art approaches; (c) it provides 46%-52% higher accuracy (in terms of AUC) than state-of-the-art approaches.
Given a dynamic graph stream, how can we detect the sudden appearance of anomalous patterns, such as link spam, follower boosting, or denial of service attacks? Additionally, can we categorize the types of anomalies that occur in practice, and theoretically analyze the anomalous signs arising from each type? In this work, we propose AnomRank, an online algorithm for anomaly detection in dynamic graphs. AnomRank uses a twopronged approach defining two novel metrics for anomalousness. Each metric tracks the derivatives of its own version of a 'node score' (or node importance) function. This allows us to detect sudden changes in the importance of any node. We show theoretically and experimentally that the two-pronged approach successfully detects two common types of anomalies: sudden weight changes along an edge, and sudden structural changes to the graph. AnomRank is (a) Fast and Accurate: up to 49.5× faster or 35% more accurate than state-of-the-art methods, (b) Scalable: linear in the number of edges in the input graph, processing millions of edges within 2 seconds on a stock laptop/desktop, and (c) Theoretically Sound: providing theoretical guarantees of the two-pronged approach.
Given a time-evolving graph, how can we track similarity between nodes in a fast and accurate way, with theoretical guarantees on the convergence and the error? Random Walk with Restart (RWR) is a popular measure to estimate the similarity between nodes and has been exploited in numerous applications. Many real-world graphs are dynamic with frequent insertion/deletion of edges; thus, tracking RWR scores on dynamic graphs in an efficient way has aroused much interest among data mining researchers. Recently, dynamic RWR models based on the propagation of scores across a given graph have been proposed, and have succeeded in outperforming previous other approaches to compute RWR dynamically. However, those models fail to guarantee exactness and convergence time for updating RWR in a generalized form.In this paper, we propose OSP, a fast and accurate algorithm for computing dynamic RWR with insertion/deletion of nodes/edges in a directed/undirected graph. When the graph is updated, OSP first calculates offset scores around the modified edges, propagates the offset scores across the updated graph, and then merges them with the current RWR scores to get updated RWR scores. We prove the exactness of OSP and introduce OSP-T, a version of OSP which regulates a trade-off between accuracy and computation time by using error tolerance ϵ. Given restart probability c, OSP-T guarantees to return RWR scores with O(ϵ/c) error in O(log (1−c) ( ϵ 2 )) iterations. Through extensive experiments, we show that OSP tracks RWR exactly up to 4605× faster than existing static RWR method on dynamic graphs, and OSP-T requires up to 15× less time with 730× lower L1 norm error and 3.3× lower rank error than other state-of-the-art dynamic RWR methods.
Given a large graph, how can we determine similarity between nodes in a fast and accurate way? Random walk with restart (RWR) is a popular measure for this purpose and has been exploited in numerous data mining applications including ranking, anomaly detection, link prediction, and community detection. However, previous methods for computing exact RWR require prohibitive storage sizes and computational costs, and alternative methods which avoid such costs by computing approximate RWR have limited accuracy.In this paper, we propose TPA, a fast, scalable, and highly accurate method for computing approximate RWR on large graphs. TPA exploits two important properties in RWR: 1) nodes close to a seed node are likely to be revisited in following steps due to block-wise structure of many real-world graphs, and 2) RWR scores of nodes which reside far from the seed node are proportional to their PageRank scores. Based on these two properties, TPA divides approximate RWR problem into two subproblems called neighbor approximation and stranger approximation. In the neighbor approximation, TPA estimates RWR scores of nodes close to the seed based on scores of few early steps from the seed. In the stranger approximation, TPA estimates RWR scores for nodes far from the seed using their PageRank. The stranger and neighbor approximations are conducted in the preprocessing phase and the online phase, respectively. Through extensive experiments, we show that TPA requires up to 3.5× less time with up to 40× less memory space than other state-of-the-art methods for the preprocessing phase. In the online phase, TPA computes approximate RWR up to 30× faster than existing methods while maintaining high accuracy.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2025 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.