Approximating Spectral Clustering via Sampling: A Review

Tremblay, Nicolas; Loukas, Andreas

doi:10.1007/978-3-030-29349-9_5

Cited by 38 publications

(28 citation statements)

References 128 publications

“…The cluster assignments are disregarded in this case because k typically exceeds the actual number of classes or unique labels. Solving this optimization problem is NP-hard and the Lloyd-Max algorithm is a popular iterative technique for approximating the optimal solution [32]. However, this algorithm requires the entire data set X to be stored in the main memory and incurs a high computational cost, taking O(npm) time per iteration.…”

Section: Preliminaries and Landmark Selection Techniques (mentioning)

confidence: 99%

Kernel Matrix Approximation on Class-Imbalanced Data With an Application to Scientific Simulation

2021

Generating low-rank approximations of kernel matrices that arise in nonlinear machine learning techniques holds the potential to significantly alleviate the memory and computational burdens. A compelling approach centers on finding a concise set of exemplars or landmarks to reduce the number of similarity measure evaluations from quadratic to linear in the data size. However, a key challenge is to regulate tradeoffs between the quality of landmarks and resource consumption. Despite the volume of research in this area, current understanding is limited regarding the performance of landmark selection techniques in the presence of class-imbalanced data sets that are becoming increasingly prevalent in many applications. Hence, this paper provides a comprehensive empirical investigation using several real-world imbalanced data sets, including scientific data, by evaluating the quality of approximate low-rank decompositions and examining their influence on the accuracy of downstream tasks. Furthermore, we present a new landmark selection technique called Distance-based Importance Sampling and Clustering (DISC), in which the relative importance scores are computed for improving accuracy-efficiency tradeoffs compared to existing works that range from probabilistic sampling to clustering methods. The proposed landmark selection method follows a coarse-to-fine strategy to capture the intrinsic structure of complex data sets, allowing us to substantially reduce the computational complexity and memory footprint with minimal loss in accuracy.
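
The citation statement for this entry names the Lloyd-Max algorithm and its O(npm) cost per iteration. The following is a minimal sketch of that iteration, not the citing paper's code. It assumes n is the number of points, p the number of features, and m the number of centroids (the excerpt does not define these symbols), and the function name `lloyd_kmeans` and the random initialisation are illustrative choices.

```python
import numpy as np

def lloyd_kmeans(X, m, n_iter=20, seed=0):
    """Plain Lloyd iterations on X (n x p) with m centroids (assumed notation)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Assumed initialisation: m distinct data points chosen uniformly at random.
    centroids = X[rng.choice(n, size=m, replace=False)].copy()
    for _ in range(n_iter):
        # Assignment step: n x m squared distances, which is the O(n p m) part.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(m):
            members = X[labels == j]
            if len(members) > 0:  # leave an empty cluster's centroid unchanged
                centroids[j] = members.mean(axis=0)
    return centroids, labels

# Usage on a toy data set: two groups of five identical points each.
X = np.vstack([np.zeros((5, 2)), np.full((5, 2), 10.0)])
centroids, labels = lloyd_kmeans(X, m=2, n_iter=5)
```

The whole array X is held in memory and every iteration touches all n points, which is the storage and cost concern the statement raises. Lloyd iterations only reach a local optimum, so the result can depend on the initialisation.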

“…Various methods have been proposed to accelerate spectral clustering by computing an approximate spectral embedding of the original data. Recent work [23] presented an excellent review of the literature on this topic for interested readers. In this paper, we divide the related work into two main categories: (1) methods that circumvent the computation of the full kernel matrix, and (2) techniques that consider the similarity graph as one of the inputs to spectral clustering and, thus, ignore the cost associated with step 2 of Alg.…”

Section: A Related Work On Accelerating Spectral Clustering (mentioning)

confidence: 99%

Scalable Spectral Clustering With Nyström Approximation: Practical and Theoretical Aspects

Pourkamali-Anaraki

2020

IEEE Open J. Signal Process.

Spectral clustering techniques are valuable tools in signal processing and machine learning for partitioning complex data sets. The effectiveness of spectral clustering stems from constructing a non-linear embedding based on creating a similarity graph and computing the spectral decomposition of the Laplacian matrix. However, spectral clustering methods fail to scale to large data sets because of high computational cost and memory usage. A popular approach for addressing these problems utilizes the Nyström method, an efficient sampling-based algorithm for computing low-rank approximations to large positive semi-definite matrices. This paper demonstrates how the previously popular approach of Nyström-based spectral clustering has severe limitations. Existing time-efficient methods ignore critical information by prematurely reducing the rank of the similarity matrix associated with sampled points. Also, current understanding is limited regarding how utilizing the Nyström approximation will affect the quality of spectral embedding approximations. To address the limitations, this work presents a principled spectral clustering algorithm that exploits spectral properties of the similarity matrix associated with sampled points to regulate accuracy-efficiency trade-offs. We provide theoretical results to reduce the current gap and present numerical experiments with real and synthetic data. Empirical results demonstrate the efficacy and efficiency of the proposed method compared to existing spectral clustering techniques based on the Nyström method and other efficient methods. The overarching goal of this work is to provide an improved baseline for future research directions to accelerate spectral clustering.
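
For orientation, the standard Nyström low-rank form that this abstract refers to can be sketched as follows. This is the textbook approximation K ≈ C W⁺ Cᵀ, not the algorithm proposed in the paper. The Gaussian similarity, the parameter `gamma`, uniform landmark sampling, and the function names `rbf_kernel` and `nystrom_factors` are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian similarity exp(-gamma * ||a - b||^2) between rows of A and B."""
    sq_dists = (
        (A ** 2).sum(axis=1)[:, None]
        + (B ** 2).sum(axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def nystrom_factors(X, landmark_idx, gamma=1.0):
    """Return C (n x m) and pinv(W) (m x m) such that K is approximated by C @ pinv(W) @ C.T."""
    landmarks = X[landmark_idx]
    C = rbf_kernel(X, landmarks, gamma)  # similarities to the m landmarks only
    W = C[landmark_idx]                  # landmark-to-landmark block of K
    return C, np.linalg.pinv(W)

# Usage: m = 3 landmarks sampled uniformly (an assumed scheme) from n = 6 points.
rng = np.random.default_rng(0)
X = np.arange(6, dtype=float)[:, None]
idx = rng.choice(6, size=3, replace=False)
C, W_pinv = nystrom_factors(X, idx, gamma=2.0)
K_approx = C @ W_pinv @ C.T  # formed only for inspection; for large n keep the factors
```

Only n × m similarities are evaluated instead of n × n, which is where the memory and time savings come from. The approximation reproduces the landmark block of K exactly and becomes exact everywhere when every point is a landmark.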

“…Beyond SC, many other methods have been proposed, such as Maximum Likelihood or variational approaches, which are consistent for the SBM and DSBM [6,30,29], Bayesian approaches [49], learning-based approaches [2], or neural networks [5]. Many variants of the SC itself exist, often to accelerate computation [41]. We focus here on the traditional SC.…”

Section: Introduction (mentioning)

confidence: 99%

Sparse and smooth: Improved guarantees for spectral clustering in the dynamic stochastic block model

Keriven

Vaiter

2022

Electron. J. Statist.

In this paper, we analyze classical variants of the Spectral Clustering (SC) algorithm in the Dynamic Stochastic Block Model (DSBM). Existing results show that, in the relatively sparse case where the expected degree grows logarithmically with the number of nodes, guarantees in the static case can be extended to the dynamic case and yield improved error bounds when the DSBM is sufficiently smooth in time, that is, the communities do not change too much between two time steps. We improve over these results by drawing a new link between the sparsity and the smoothness of the DSBM: the smoother the DSBM is, the more sparse it can be, while still guaranteeing consistent recovery. In particular, a mild condition on the smoothness allows to treat the sparse case with bounded degree. These guarantees are valid for the SC applied to the adjacency matrix or the normalized Laplacian. As a by-product of our analysis, we obtain to our knowledge the best spectral concentration bound available for the normalized Laplacian of matrices with independent Bernoulli entries.
