We first explore methods for approximating the commute time and Katz score between a pair of nodes. These methods are based on the approach of matrices, moments, and quadrature developed in the numerical linear algebra community. They rely on the Lanczos process and provide upper and lower bounds on an estimate of the pair-wise scores. We also explore methods to approximate the commute times and Katz scores from a node to all other nodes in the graph. Here, our approach for the commute times is based on a variation of the conjugate gradient algorithm, and it provides an estimate of all the diagonal entries of the inverse of a matrix. Our technique for the Katz scores is based on exploiting an empirical localization property of the Katz matrix. We adapt algorithms used for personalized PageRank computing to these Katz scores and show theoretically that this approach converges. We evaluate these methods on 17 real-world graphs ranging in size from 1,000 to 1,000,000 nodes. Our results show that our pair-wise commute time method and column-wise Katz algorithm both have attractive theoretical properties and empirical performance.
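For orientation, the two measures can be computed exactly on a small graph by direct linear solves. The abstract does not spell out the definitions, so this sketch assumes the standard ones: the Katz matrix K = (I - alpha*A)^{-1} - I for an adjacency matrix A and a damping factor alpha below the reciprocal of the spectral radius, and the commute time C(i, j) = vol(G) * R_eff(i, j), where R_eff is the effective resistance. The function names and the dense Gaussian elimination are illustrative only; they are a brute-force baseline, not the Lanczos or conjugate gradient methods described above.

```python
def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(M, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def katz_pair(A, alpha, i, j):
    """Katz score K[i][j], assuming K = (I - alpha*A)^{-1} - I.

    Assumes alpha < 1 / spectral_radius(A) so the underlying series converges.
    """
    n = len(A)
    M = [[(1.0 if r == c else 0.0) - alpha * A[r][c] for c in range(n)]
         for r in range(n)]
    e_j = [1.0 if r == j else 0.0 for r in range(n)]
    x = solve(M, e_j)  # x is column j of (I - alpha*A)^{-1}
    return x[i] - (1.0 if i == j else 0.0)


def commute_time(A, i, j):
    """Commute time vol(G) * R_eff(i, j) for a connected undirected graph."""
    if i == j:
        return 0.0
    n = len(A)
    deg = [sum(row) for row in A]
    keep = [v for v in range(n) if v != j]  # ground node j
    L = [[(deg[r] if r == c else 0.0) - A[r][c] for c in keep] for r in keep]
    b = [1.0 if r == i else 0.0 for r in keep]
    x = solve(L, b)  # potentials when one unit of current flows from i to j
    return sum(deg) * x[keep.index(i)]


# Path graph 0 - 1 - 2.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
print(katz_pair(A, 0.1, 0, 2))
print(commute_time(A, 0, 2))
```

Each call costs a full dense solve, which is exactly the expense that bounds-based pairwise estimates are meant to avoid on large graphs.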
Abstract. Motivated by social network data mining problems such as link prediction and collaborative filtering, significant research effort has been devoted to computing topological measures including the Katz score and the commute time. Existing approaches typically approximate all pairwise relationships simultaneously. In this paper, we are interested in computing two quantities: the score for a single pair of nodes, and the top-k nodes with the best scores from a given source node. For the pairwise problem, we apply an iterative algorithm that computes upper and lower bounds for the measures we seek. This algorithm exploits a relationship between the Lanczos process and a quadrature rule. For the top-k problem, we propose an algorithm that accesses only a small portion of the graph and is related to techniques used in personalized PageRank computing. To test the scalability and accuracy of our algorithms, we experiment with three real-world networks and find that these algorithms run in milliseconds to seconds without any preprocessing.
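The idea of a top-k computation that touches only part of the graph can be illustrated with a "push" iteration of the kind used for personalized PageRank. The sketch below applies a Gauss-Southwell style update to the linear system (I - alpha*A) x = e_source. Whether this matches the paper's exact procedure is an assumption, and the function name, the adjacency-dictionary input, and the tolerance are hypothetical. It also assumes alpha < 1 / (maximum degree), under which the residual stays nonnegative and shrinks at every step.

```python
def katz_topk_push(adj, source, alpha, k, tol=1e-8, max_steps=100000):
    """Approximate the k largest Katz scores from `source` by residual pushing.

    adj maps each node to a list of its neighbors (undirected graph).
    Only nodes reached by a push are ever stored in x or r.
    """
    x = {}             # approximate solution of (I - alpha*A) x = e_source
    r = {source: 1.0}  # residual e_source - (I - alpha*A) x
    for _ in range(max_steps):
        # A linear scan keeps the sketch short; a heap would scale better.
        j = max(r, key=r.get)
        rj = r[j]
        if rj < tol:
            break
        x[j] = x.get(j, 0.0) + rj   # move the residual mass into the solution
        r[j] = 0.0
        for u in adj[j]:            # spread alpha * rj to the neighbors
            r[u] = r.get(u, 0.0) + alpha * rj
    # The Katz matrix subtracts the identity, which only affects the source
    # entry; the source itself is not a candidate, so it is dropped.
    scores = {v: val for v, val in x.items() if v != source}
    return sorted(scores.items(), key=lambda kv: -kv[1])[:k]


# Path graph 0 - 1 - 2 - 3, maximum degree 2, so alpha must be below 0.5.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(katz_topk_push(adj, source=0, alpha=0.2, k=2))
```

Because a node enters `r` only when one of its neighbors is pushed, the work is driven by how quickly the scores decay away from the source rather than by the size of the graph.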
Evolutionary algorithms are widely used in the development of rule-based classifier systems in data mining, where the rule base is usually a set of If-Then rules and an evolutionary process develops and optimizes these rules. The genetic algorithm is usually a favored solution for such tasks because it searches globally for good rule sets without any prior bias or greedy force, but it is usually slow. In addition, designing a good genetic algorithm for rule base evolution requires a recombination operator that merges two rule bases without disrupting the functionality of either. To overcome the speed problem and the need to design a recombination operator, this paper presents a novel algorithm for rule base evolution based on the natural process of symbiogenesis. The algorithm uses a symbiotic combination operator instead of the traditional sexual recombination operator of genetic algorithms. This operator takes two chromosomes with different numbers of genes (rules here) and merges them by combining all the information content of both chromosomes. Using this operator yields two major advantages. First, it removes the need to design a recombination operator and is therefore easier to use. Second, it outperforms the traditional genetic algorithm in both emergence speed and classification rate, as tested and presented on several widely used benchmarks.
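A minimal sketch of what such a symbiotic combination could look like is given below. The abstract states only that the operator merges two chromosomes of different lengths while keeping all of their information content. The `Rule` encoding, the function name, and the choice to drop exact duplicate rules are assumptions made for illustration.

```python
from typing import List, NamedTuple, Tuple


class Rule(NamedTuple):
    # Hypothetical encoding: IF all (attribute, value) pairs hold THEN label.
    conditions: Tuple[Tuple[str, str], ...]
    label: str


def symbiotic_combine(a: List[Rule], b: List[Rule]) -> List[Rule]:
    """Merge two rule bases of possibly different sizes into one chromosome.

    Every rule of both parents is kept; a rule of `b` that is identical to
    one already present is skipped (an assumption, not from the paper).
    """
    merged = list(a)
    seen = set(a)
    for rule in b:
        if rule not in seen:
            merged.append(rule)
            seen.add(rule)
    return merged


parent_a = [Rule((("outlook", "sunny"),), "no")]
parent_b = [
    Rule((("outlook", "sunny"),), "no"),
    Rule((("outlook", "rain"), ("wind", "weak")), "yes"),
]
child = symbiotic_combine(parent_a, parent_b)
print(len(child))
```

Unlike a crossover that cuts and swaps segments, a merge of this kind cannot break a parent's rule set apart, which is consistent with the stated goal of not disrupting either rule base.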