The problem of max-kernel search arises everywhere: given a query point p_q, a set of reference objects S_r, and a kernel K, find arg max_{p_r ∈ S_r} K(p_q, p_r). Thanks to the wide applicability of kernels, max-kernel search appears in countless domains, including image matching, information retrieval, bioinformatics, similarity search, and collaborative filtering. However, there are no generalized techniques for solving it efficiently. This paper presents a single-tree algorithm, single-tree FastMKS, which returns the max-kernel solution for a single query point in provably O(log N) time (where N is the number of reference objects), and a dual-tree algorithm, dual-tree FastMKS, for max-kernel search with many query points. If the set of query points is of size O(N), the dual-tree algorithm returns a solution in provably O(N) time, significantly better than the O(N^2) linear scan; both bounds depend on the expansion constant of the data. These algorithms work for abstract objects, as they do not require an explicit representation of the points in kernel space. Empirical results on a variety of datasets show speedups of up to five orders of magnitude. In addition, we present approximate extensions of the FastMKS algorithms that achieve further speedups.
Max-kernel search

One particularly ubiquitous problem in computer science is that of max-kernel search: for a given set S_r of N objects (the reference set), a similarity function K(·, ·), and a query object p_q, find the object

p_r* = arg max_{p_r ∈ S_r} K(p_q, p_r).

Often, max-kernel search is performed for a large set of query objects S_q. The simplest approach to this general problem is a linear scan over all the objects in S_r. However, the computational cost of this approach scales linearly with the size of the reference set for a single query, making it prohibitive for large datasets. If |S_q| = |S_r| = O(N), the approach scales as O(N^2) and thus quickly becomes infeasible for large N.

In our setting we restrict the similarity function K(·, ·) to be a Mercer kernel; as we will see, this is not very restrictive. A Mercer kernel is a positive-semidefinite kernel function; such kernels can be expressed as an inner product in some Hilbert space H:
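To make the baseline concrete, the O(N)-per-query linear scan described above can be sketched as follows. This is a minimal illustration, not the FastMKS algorithm itself; the Gaussian kernel and the helper names are assumptions chosen for the example (the Gaussian kernel is a standard Mercer kernel).

```python
import math

def gaussian_kernel(x, y, bandwidth=1.0):
    """A Mercer kernel: K(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def linear_scan_max_kernel(query, references, kernel):
    """Baseline: evaluate K(query, r) for every reference object.

    Cost is O(|references|) kernel evaluations per query, which is the
    scaling that tree-based methods like FastMKS aim to beat.
    """
    return max(references, key=lambda r: kernel(query, r))

# Toy data: a few 2-d reference points and queries.
references = [(0.1, 0.2), (2.0, 2.0), (0.9, 1.1)]
queries = [(0.0, 0.0), (1.0, 1.0)]

# One O(N) scan per query; over O(N) queries this is the O(N^2) total cost.
results = [linear_scan_max_kernel(q, references, gaussian_kernel) for q in queries]
```

For the Gaussian kernel, the max-kernel result coincides with the nearest neighbor in Euclidean distance, but the same scan works unchanged for any Mercer kernel, including ones over abstract objects such as strings or trees.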