Constant approximation for k-median and k-means with outliers via iterative rounding

Krishnaswamy, Ravishankar; Li, Shi; Sandeep, Sai

doi:10.1145/3188745.3188882

Cited by 72 publications

(116 citation statements)

References 52 publications

Mentioning: 115


“…Since the power of parameterized algorithms for uncapacitated clustering is well understood, it is natural to ask the "capacitated vs. uncapacitated" question in the FPT setting. Since clustering is a universal task, many variants besides the capacitated versions have been studied, including k-MEDIAN/k-MEANS WITH OUTLIERS [193] and MATROID/KNAPSACK MEDIAN [194]. While no variant is proved to be harder than the basic versions, it would be interesting to see whether they all have the same parameterized approximability as the basic versions.…”

Section: Capacitated Clustering and Other Variants

mentioning

confidence: 99%

A Survey on Approximation in Parameterized Complexity: Hardness and Algorithms

et al. 2020


“…With this classification model (algorithm), the data objects in the same cluster become more similar compared to the data objects in the other clusters. Meanwhile, the individual centroid of each cluster and the sum of squares of distances between data objects are used to create a cost function for the minimization task that will be repeated to classify and assign every data object to a certain cluster [5, 14, 15, 16, 17, 46, 47, 48, 49, 50, 51, 52]. The K-means algorithm is a clustering technique to classify input data into K clusters based on unsupervised learning.…”

Section: Related Research

mentioning

confidence: 99%

A Novel Model on Reinforce K-Means Using Location Division Model and Outlier of Initial Value for Lowering Data Cost

Jung and Lee 2020

Entropy


Today, most of the data collected and analyzed for use in various systems are semi-structured or unstructured. Such data are densely distributed in space and usually contain outliers and noise. Clustering algorithms for classifying such data have been studied continuously, and K-means is among the most investigated. Researchers have pointed out several problems with it: the number of clusters, K, is chosen arbitrarily by the analyst; connecting nodes in dense data produces biased classification results; and, depending on how the initial centroids are selected, implementation costs rise and accuracy falls. Most K-means researchers have also noted that, when K is too large or too small, outliers are assigned to external or otherwise incorrect clusters. The present study therefore analyzed the problems with initial centroid selection in the existing K-means algorithm and investigated a new K-means algorithm for selecting initial centroids. It proposed reducing clustering computation costs with an initial-center-point approach based on space division and outliers, which lowers the dependence of objects on the initial cluster centers. Since outliers can lead to inappropriate results when they influence the choice of a cluster's center point, the study proposed an algorithm that minimizes outlier error rates, based on an improved algorithm for space division and distance measurement. In performance experiments, the proposed algorithm lowered execution costs by about 13–14% relative to previous studies as the volume of clustering data or the number of clusters increased. It also recorded a lower frequency of outliers, a lower effectiveness index (which assesses performance deterioration caused by outliers), and a reduction in outliers of about 60%.
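The cost function described in the citation statement above (the sum of squared distances from data objects to their cluster centroids, reduced by repeatedly reassigning objects) can be illustrated with a minimal sketch. This is plain textbook K-means (one Lloyd iteration), not the initialization method proposed by Jung and Lee; the function names and the toy data are assumptions made for illustration.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
        for p in points
    ]


def kmeans_cost(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def lloyd_step(points, centroids):
    """One iteration: assign points, then move each centroid to its cluster mean."""
    labels = assign(points, centroids)
    updated = []
    for j, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if not members:
            # Keep an empty cluster's centroid where it was.
            updated.append(old)
            continue
        dim = len(old)
        updated.append(
            tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
        )
    return updated


# Hypothetical toy data: two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centroids = [(0.0, 0.0), (10.0, 0.0)]

cost_before = kmeans_cost(points, centroids)
centroids_after = lloyd_step(points, centroids)
cost_after = kmeans_cost(points, centroids_after)
```

On this toy input a single step moves each centroid to the midpoint of its pair, so the cost drops. The result still depends on the starting centroids, which is the sensitivity the abstract discusses.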


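“…Gupta et al. [88] gave a bicriteria O(1)-approximation algorithm based on local search, under the condition that the limit on the number of outliers may be violated. Friggstad et al. [89] used local search to obtain a bicriteria PTAS: it opens k(1 + ϵ) centers and achieves an approximation ratio of 1 + ϵ in Euclidean and doubling metric spaces, and 25 + ϵ in general metric spaces. Krishnaswamy et al. [90] gave a (53.002 + ϵ)-approximation algorithm based on iterative LP rounding, the first constant-factor approximation algorithm for this problem. The idea behind the algorithm of Krishnaswamy et al. [90] is as follows: since the natural LP relaxation of the k-means with outliers problem has an unbounded integrality gap, they first round the solution of the LP relaxation to an almost-integral solution with only a small loss in cost, in which at most two centers are fractionally open; it follows that the LP integrality gap comes from the gap between almost-integral solutions and fully integral solutions.…”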

Section: Robust K-Means Problem

unclassified
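For readers unfamiliar with the problem discussed in the statement above, the k-means with outliers objective can be sketched as follows: given a set of centers and a budget of z outliers, the z points farthest from their nearest center are discarded and the squared distances of the remaining points are summed. This sketch only evaluates the objective for fixed centers; it is not the iterative rounding algorithm of Krishnaswamy et al., and the function names and data are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def cost_with_outliers(points, centers, z):
    """k-means cost of `centers` after discarding the z most expensive points."""
    dists = sorted(min(sq_dist(p, c) for c in centers) for p in points)
    keep = max(len(dists) - z, 0)
    return sum(dists[:keep])


# Hypothetical data: two tight pairs plus one far-away point.
data = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0), (100.0, 0.0)]
centers = [(0.5, 0.0), (10.5, 0.0)]

cost_no_outliers = cost_with_outliers(data, centers, z=0)
cost_one_outlier = cost_with_outliers(data, centers, z=1)
```

With z = 0 the far-away point dominates the cost. Allowing a single outlier removes it, which shows how much the objective depends on the outlier budget, the constraint that the bicriteria algorithms mentioned above are allowed to violate.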