Abstract-This work presents a new perspective on characterizing the similarity between elements of a database or, more generally, nodes of a weighted and undirected graph. It is based on a Markov-chain model of random walk through the database. More precisely, we compute quantities (the average commute time, the pseudoinverse of the Laplacian matrix of the graph, etc.) that provide similarities between any pair of nodes, having the nice property of increasing when the number of paths connecting those elements increases and when the "length" of paths decreases. It turns out that the square root of the average commute time is a Euclidean distance and that the pseudoinverse of the Laplacian matrix is a kernel matrix (its elements are inner products closely related to commute times). A principal component analysis (PCA) of the graph is introduced for computing the subspace projection of the node vectors in a manner that preserves as much variance as possible in terms of the Euclidean commute-time distance. This graph PCA provides a nice interpretation to the "Fiedler vector," widely used for graph partitioning. The model is evaluated on a collaborativerecommendation task where suggestions are made about which movies people should watch based upon what they watched in the past. Experimental results on the MovieLens database show that the Laplacian-based similarities perform well in comparison with other methods. The model, which nicely fits into the so-called "statistical relational learning" framework, could also be used to compute document or word similarities, and, more generally, it could be applied to machine-learning and pattern-recognition tasks involving a relational database.
This paper discusses the trade-off between accuracy, reliability and computing time in global optimization. Particular compromises provided by traditional methods (Quasi-Newton and Nelder-Mead's simplex methods) and genetic algorithms are addressed and illustrated by a particular application in the field of nonlinear system identification. Subsequently, new hybrid methods are designed, combining principles from genetic algorithms and "hill-climbing" methods in order to find a better compromise to the trade-off. Inspired by biology and especially by the manner in which living beings adapt themselves to their environment, these hybrid methods involve two interwoven levels of optimization, namely evolution (genetic algorithms) and individual learning (Quasi-Newton), which cooperate in a global process of optimization. One of these hybrid methods appears to join the group of state-of-the-art global optimization methods: it combines the reliability properties of the genetic algorithms with the accuracy of Quasi-Newton method, while requiring a computation time only slightly higher than the latter.
We introduce a novel family of contextual measures of similarity between distributions: the similarity between two distributions q and p is measured in the context of a third distribution u. In our framework any traditional measure of similarity / dissimilarity has its contextual counterpart. We show that for two important families of divergences (Bregman and Csiszár), the contextual similarity computation consists in solving a convex optimization problem. We focus on the case of multinomials and explain how to compute in practice the similarity for several well-known measures.These contextual measures are then applied to the image retrieval problem. In such a case, the context u is estimated from the neighbors of a query q. One of the main benefits of our approach lies in the fact that using different contexts, and especially contexts at multiple scales (i.e. broad and narrow contexts), provides different views on the same problem. Combining the different views can improve retrieval accuracy. We will show on two very different datasets (one of photographs, the other of document images) that the proposed measures have a relatively small positive impact on macro Average Precision (which measures purely ranking) and a large positive impact on micro Average Precision (which measures both ranking and consistency of the scores across multiple queries). * Yan Liu is a Ph.D. student in the Laboratoire d'Informatique en Image et Systmes d'Information (LIRIS) at the Ecole Centrale de Lyon (ECL).
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.