The problem of max-kernel search arises everywhere: given a query point p_q, a set of reference objects S_r, and a kernel K, find arg max_{p_r ∈ S_r} K(p_q, p_r). Thanks to the wide applicability of kernels, max-kernel search appears in countless domains, including image matching, information retrieval, bioinformatics, similarity search, and collaborative filtering. However, there are no generalized techniques for solving it efficiently. This paper presents a single-tree algorithm, single-tree FastMKS, which returns the max-kernel solution for a single query point in provably O(log N) time (where N is the number of reference objects), and a dual-tree algorithm, dual-tree FastMKS, for max-kernel search with many query points. If the set of query points is of size O(N), the dual-tree algorithm returns a solution in provably O(N) time, significantly better than the O(N^2) linear scan; both bounds depend on the expansion constant of the data. These algorithms work for abstract objects, as they do not require an explicit representation of the points in kernel space. Empirical results on a variety of datasets show speedups of up to five orders of magnitude. In addition, we present approximate extensions of the FastMKS algorithms that achieve further speedups.
Max-kernel search

One particularly ubiquitous problem in computer science is that of max-kernel search: for a given set S_r of N objects (the reference set), a similarity function K(·, ·), and a query object p_q, find the object

p_r* = arg max_{p_r ∈ S_r} K(p_q, p_r).

Often, max-kernel search is performed for a large set of query objects S_q. The simplest approach to this general problem is a linear scan over all the objects in S_r. However, the computational cost of this approach scales linearly with the size of the reference set for a single query, making it prohibitive for large datasets. If |S_q| = |S_r| = O(N), the approach scales as O(N^2) and thus quickly becomes infeasible for large N.

In our setting we restrict the similarity function K(·, ·) to be a Mercer kernel; as we will see, this is not very restrictive. A Mercer kernel is a positive-semidefinite kernel function; such kernels can be expressed as an inner product in some Hilbert space H:
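To make the baseline concrete, the O(N)-per-query linear scan described above can be sketched as follows. This is a minimal illustration, not the FastMKS algorithm itself; the Gaussian kernel and the helper names are assumptions chosen for the example (the Gaussian kernel is a standard Mercer kernel).

```python
import math

def gaussian_kernel(x, y, bandwidth=1.0):
    """A Mercer kernel: K(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def linear_scan_max_kernel(query, references, kernel):
    """Baseline: evaluate K(query, r) for every reference object.

    Cost is O(|references|) kernel evaluations per query, which is the
    scaling that tree-based methods like FastMKS aim to beat.
    """
    return max(references, key=lambda r: kernel(query, r))

# Toy data: a few 2-d reference points and queries.
references = [(0.1, 0.2), (2.0, 2.0), (0.9, 1.1)]
queries = [(0.0, 0.0), (1.0, 1.0)]

# One O(N) scan per query; over O(N) queries this is the O(N^2) total cost.
results = [linear_scan_max_kernel(q, references, gaussian_kernel) for q in queries]
```

For the Gaussian kernel, the max-kernel result coincides with the nearest neighbor in Euclidean distance, but the same scan works unchanged for any Mercer kernel, including ones over abstract objects such as strings or trees.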