Paolo Pellizzoni scite author profile

Metric k-center clustering is a fundamental unsupervised learning primitive. Although widely used, this primitive is heavily affected by noise in the data, so a more sensible variant seeks the best solution that disregards a given number z of points of the dataset, which are called outliers. We provide efficient algorithms for this important variant in the streaming model under the sliding window setting, where, at each time step, the dataset to be clustered is the window W of the most recent data items. For general metric spaces, our algorithms achieve O(1) approximation and, remarkably, require a working memory linear in k+z and only logarithmic in |W|. For spaces of bounded doubling dimension, the approximation can be made arbitrarily close to 3. For these latter spaces, we show, as a by-product, how to estimate the effective diameter of the window W, which is a measure of the spread of the window points, disregarding a given fraction of noisy distances. We also provide experimental evidence of the practical viability of the improved clustering and diameter estimation algorithms.
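To make the objective in this abstract concrete, the following is a minimal sketch of how the cost of a k-center solution with z outliers could be evaluated on a static point set. It is not the authors' streaming algorithm: the function name `kcenter_cost_with_outliers`, the Euclidean metric, and the toy data are assumptions chosen for illustration only.

```python
import math

def kcenter_cost_with_outliers(points, centers, z):
    """Smallest radius r such that all but at most z points lie within
    distance r of their closest center (Euclidean metric assumed)."""
    # Distance from every point to its nearest center, in increasing order.
    dists = sorted(min(math.dist(p, c) for c in centers) for p in points)
    # Disregard the z largest distances: those points are the outliers.
    kept = dists[:max(0, len(dists) - z)]
    return kept[-1] if kept else 0.0

# Hypothetical toy data: three nearby points and one noisy point.
points = [(0, 0), (1, 0), (0, 1), (100, 100)]
centers = [(0, 0)]
cost_all = kcenter_cost_with_outliers(points, centers, z=0)
cost_robust = kcenter_cost_with_outliers(points, centers, z=1)
```

With z=0 the single noisy point dominates the radius, whereas allowing one outlier brings the cost back to the spread of the three nearby points, which is the sensitivity to noise the abstract refers to.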


Adaptive k-center and diameter estimation in sliding windows

Pellizzoni

Pietracaprina

Pucci

2022

Int J Data Sci Anal


In this paper we present novel streaming algorithms for the k-center and the diameter estimation problems for general metric spaces under the sliding window model. The key idea behind our algorithms is to maintain a small coreset from which, at any time, a solution to the problem under consideration can be computed for the current window. The quality of this solution can be made arbitrarily close to that of the best solution attainable by running a polynomial-time sequential algorithm on the entire window. Remarkably, the size of our coresets is independent of the window length and can be upper bounded by a function of the target number of centers (for the k-center problem), of the desired accuracy, and of the characteristics of the current window, namely its doubling dimension and aspect ratio. One of the major strengths of our algorithms is that they adapt obliviously to these two latter characteristics. We also provide experimental evidence of the practical viability of the algorithms and their superiority over the current state of the art.
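The abstract does not name the sequential algorithm that is run on the coreset. As an illustration only, the sketch below assumes the classic greedy farthest-first traversal (Gonzalez's 2-approximation for k-center) and applies it to a small list standing in for a coreset. How the coreset itself is maintained over the sliding window is not shown; the function name `farthest_first`, the Euclidean metric, and the sample points are hypothetical.

```python
import math

def farthest_first(points, k):
    """Greedy farthest-first traversal: returns (centers, radius).
    Classic 2-approximation for k-center on a static point set."""
    if not points or k <= 0:
        return [], 0.0
    centers = [points[0]]  # arbitrary first center
    # d[j] = distance from points[j] to its closest chosen center.
    d = [math.dist(p, centers[0]) for p in points]
    while len(centers) < min(k, len(points)):
        i = max(range(len(points)), key=d.__getitem__)  # farthest point
        if d[i] == 0.0:
            break  # every point already coincides with a center
        centers.append(points[i])
        d = [min(d[j], math.dist(points[j], points[i]))
             for j in range(len(points))]
    return centers, max(d)

# Hypothetical stand-in for a coreset of the current window.
coreset = [(0, 0), (1, 0), (10, 0), (11, 0)]
centers, radius = farthest_first(coreset, k=2)
```

Because the traversal touches only the points it is given, its running time depends on the coreset size rather than on the window length, which is the practical benefit the abstract attributes to maintaining a small coreset.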


Dimensionality reduction on complex vector spaces for dynamic weighted Euclidean distance

Pellizzoni

Silvestri

2022

Preprint


Fully Dynamic Clustering and Diversity Maximization in Doubling Metrics

Pellizzoni

Pietracaprina

Pucci

2023


