Approximation Algorithms for Probabilistic k-Center Clustering

Alipour, Sharareh

doi:10.1109/icdm50108.2020.00009

Cited by 6 publications

(3 citation statements)

References 27 publications


“…If and only if p is a node, we do nothing, i.e., p is not deleted from the index. When we compute approximate local density, we check whether p ∈ P_active, thus this non-deletion is not an issue.…”

Section: B Index Update (mentioning)

confidence: 99%

“…Clustering is a primitive operator for data science, discovers patterns and events hidden in datasets, and supports data analysts in understanding the features of datasets. Therefore, clustering techniques in metric spaces have been studied in a wide range of fields, e.g., information retrieval [1], databases [2], data mining [3], artificial intelligence [4], and machine learning [5]. This paper considers density-peaks clustering (DPC) [6], one of the density-based clustering algorithms.…”

Section: Introduction (mentioning)

confidence: 99%


Scalable and Accurate Density-Peaks Clustering on Fully Dynamic Data

Amagata

2022

2022 IEEE International Conference on Big Data (Big Data)


Clustering is a primitive and important operator that analyzes a given dataset to discover its hidden patterns and features. Because datasets are usually updated dynamically (i.e., they accept continuous insertions and arbitrary deletions), analyzing such dynamic data is also an important topic; dynamic clustering supports it effectively but is a challenging problem. In this paper, we consider the problem of density-peaks clustering (DPC) on dynamic data. DPC is a density-based clustering algorithm that attracts attention in many applications because of its effectiveness. We investigate the hardness of this problem theoretically to measure the efficiency of dynamic DPC algorithms. We prove that any exact solution is costly, and propose an approximation algorithm to enable faster updates. We conduct experiments on real datasets, and the results confirm that our algorithm is much faster and more accurate than the state of the art.


“…This section describes related work carried out in the area of unsupervised clustering, joint representation learning and clustering, and learning representation of imbalanced data. a) Clustering: Clustering has been broadly studied in machine learning in different aspects such as density-based clustering [5], distribution-based clustering [6], [7], grid-based clustering, distance-based clustering [8]-[11], grouping methods [12], [13]. One of the most popular clustering methods is K-means [14], which aims to partition the observation space into k clusters so that each observation belongs to the cluster with the nearest centroid.…”

Section: Related Work (mentioning)

confidence: 99%

Joint Debiased Representation Learning and Imbalanced Data Clustering

Rezaei, Dorigatti, Ruegamer et al.

2021

Preprint


One of the most promising approaches for unsupervised learning is combining deep representation learning and deep clustering. Some recent works propose to simultaneously learn representation using deep neural networks and perform clustering by defining a clustering loss on top of embedded features. However, these approaches are sensitive to imbalanced data and out-of-distribution samples. As a consequence, these methods optimize clustering by pushing data close to randomly initialized cluster centers. This is problematic when the number of instances varies largely in different classes or a cluster with few samples has less chance to be assigned a good centroid. To overcome these limitations, we introduce StatDEC, a new unsupervised framework for joint statistical representation learning and clustering. In our framework, we simultaneously train two deep learning models, a deep statistics network that captures the data distribution, and a deep clustering network that learns embedded features and performs clustering by explicitly defining a clustering loss. Specifically, the clustering network and learning representation network both take advantage of our proposed statistics pooling layer that represents mean, variance, and cardinality to handle the out-of-distribution samples and class imbalance. Our experiments show that using these representations, one can considerably improve results on imbalanced image clustering across a variety of image datasets. Moreover, the learned representations generalize well when transferred to the out-of-distribution dataset.


Improvements on approximation algorithms for clustering probabilistic data

Alipour

2021

Knowl Inf Syst

