Proceedings of the Twenty-Fifth Annual Symposium on Computational Geometry 2009
DOI: 10.1145/1542362.1542419

k-means requires exponentially many iterations even in the plane

Abstract: The k-means algorithm is a well-known method for partitioning n points that lie in d-dimensional space into k clusters. Its main features are simplicity and speed in practice. Theoretically, however, the best known upper bound on its running time (i.e., O(n^{kd})) can be exponential in the number of points. Recently, Arthur and Vassilvitskii [2] showed a superpolynomial worst-case analysis, improving the best known lower bound from Ω(n) to 2^{Ω(√n)} with a construction in d = Ω(√n) dimensions. In [2] they also conjectured…
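The quantity the paper bounds is the number of passes through the standard Lloyd-style k-means iteration. For reference, a minimal NumPy sketch of that iteration is given below; it is an illustrative implementation under the usual conventions (random initialization from the input points, termination when assignments stabilize), not code from the paper, and the function name is invented for the example.

import numpy as np

def lloyd_kmeans(points, k, max_iter=1000, seed=0):
    # Plain Lloyd-style k-means: alternate nearest-centroid assignment and
    # centroid updates until the assignment stops changing (or max_iter is hit).
    rng = np.random.default_rng(seed)
    # Initialize centroids by picking k distinct input points at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    labels = None
    for it in range(max_iter):
        # Assignment step: each point goes to its nearest current centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            return centroids, labels, it   # assignments unchanged: converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its assigned points.
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return centroids, labels, max_iter

The third return value is the iteration count, which is the quantity whose worst-case growth the paper's lower bound concerns.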

Cited by 69 publications (16 citation statements)
References 11 publications
“…The second step identifies different classes inside the extreme outlier set. This is done by a k-means algorithm [35, 36]. The algorithm permits classifying all the elements of the outlier set into one of the k classes.…”
Section: Models and Methods
confidence: 99%
“…k-means is a popular objective function used for clustering problems in modern data science applications, such as computer vision, machine learning, and computational geometry (Drineas et al 2004; Little and Jones 2011; Vattani 2011). It was originally proposed by Forgy (1965) and MacQueen (1967) and is often known as Lloyd's algorithm (Lloyd 1982).…”
Section: Stacking Velocity Estimation By Weighted Clustering
confidence: 99%
“…When there truly are K clusters, and enough effort is expended, then, in some cases, K-means will converge quickly to the right solution [13, 19]. On the other hand, it is known that for ill-fated configurations K-means can take a long time to converge [18]. As such, K-means must be significantly tailored and tested for use in practical applications.…”
Section: Existing Work
confidence: 99%
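A rough way to see the convergence behavior the statement above refers to is to run k-means repeatedly on the same data from different random starts and record how many Lloyd iterations each run takes; the sketch below does this with scikit-learn, whose fitted KMeans object exposes the iteration count as n_iter_. The data set and parameter values are arbitrary placeholders, not configurations from any of the cited works.

import numpy as np
from sklearn.cluster import KMeans

# Run k-means from several single random initializations on the same data and
# record the number of Lloyd iterations each run needed before converging.
data = np.random.default_rng(0).normal(size=(500, 2))
iteration_counts = []
for seed in range(20):
    model = KMeans(n_clusters=5, n_init=1, max_iter=500, random_state=seed).fit(data)
    iteration_counts.append(model.n_iter_)
print("iterations per run:", iteration_counts)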
“…This considerably decreases the flop count of algorithms that try to minimize the above expression, as there are many fewer terms involved if K ≪ M. There are many algorithms that directly or indirectly try to minimize the above expression over the K columns Y_j. However, it is difficult to find the global minimum, and the quality of the local minimum may not be good, though there does not necessarily seem to be agreement over this in the literature, as the precise local minimum at which the algorithm stops depends on the starting point [13, 14, 18].…”
Section: Introduction
confidence: 99%
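Since, as the statement above notes, the local minimum reached depends on the starting point, a common practical remedy is to restart from several random initializations and keep the run with the lowest objective value. Below is a minimal sketch of that restart strategy; scikit-learn reports the final k-means objective as inertia_, and the data and parameter choices are placeholders for illustration only.

import numpy as np
from sklearn.cluster import KMeans

# Restart k-means from several random starting points and keep the run whose
# final objective (sum of squared distances to assigned centroids) is lowest.
data = np.random.default_rng(0).normal(size=(500, 2))
objectives = []
for seed in range(10):
    model = KMeans(n_clusters=6, n_init=1, random_state=seed).fit(data)
    objectives.append(model.inertia_)
print("best objective over restarts:", min(objectives))
print("spread across restarts:", max(objectives) - min(objectives))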