In recent years, deep learning has drawn much attention for its outstanding performance in computer vision, [1] speech recognition, [2] and natural language processing. [3] These tasks rely heavily on deep neural networks (DNNs) to model the prior knowledge distribution in massive amounts of data. In most cases, acquiring sufficient training data is even more critical than the design of the DNN model itself. [4] Nevertheless, learning from a limited number of training samples is more in line with how the human brain works: a child can easily distinguish a dog from a cat once taught what a dog looks like with a single example. This setting, however, poses a great challenge to mainstream neural networks, because DNNs fail to learn the features of a dog from only one picture. Few-shot learning (FSL), a branch of meta-learning, [5] has accordingly been proposed to build reliable machine learning models from a few labeled examples, just as humans do. Among the popular FSL algorithms, the combination of a dynamic external memory with a neural network controller, [6] known as the memory-augmented neural network (MANN), [7] stands out for its performance on FSL tasks. The external memory, usually DRAM, [7,8] retains the features of the scarce samples over a long time and consistently helps improve FSL performance. However, the time delay caused by data communication between the memory and the running neural network is costly and degrades the performance of data search in the external memory. Emerging non-volatile memory has become a viable solution to this problem, as it greatly improves the search speed and shows potential for lifelong learning tasks.

Various non-volatile devices, such as flash, [9] FeFETs, [10,11] resistive random-access memory (RRAM), [12-14] and phase-change memory (PCM), [15] have been used as external memory to accelerate the similarity computation in MANNs. Ni et al. [11] proposed a two-FeFET scheme as a ternary content-addressable memory (ternary CAM) cell that calculates the Hamming distances of binary vectors for data indexing. However, this framework suffers from two problems: reduced accuracy compared with digital computers and the long bit-width needed to encode the binarized vectors. Karunaratne et al. [15] then designed a new attention mechanism that improves the accuracy of similarity-based FSL by computing the cosine distance of binarized vectors on PCM. The embedding vectors in refs. [11,15] are 512 bits long for the Omniglot dataset, leading to a large area and power overhead in the external memory. Kazemi et al. [10] developed an analog CAM with FeFETs based on the structure in ref. [11] and tried to reduce the encoding
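To make the similarity-search step concrete, the following is a minimal Python sketch, not taken from the cited works, of how a MANN-style few-shot classifier might index binarized support vectors by Hamming distance (the quantity computed by the ternary CAM of ref. [11]) or by cosine distance (as in the attention mechanism of ref. [15]). The class name KeyMemory, the 512-bit vector length, and the 5-way 1-shot episode are all illustrative assumptions.

import numpy as np

class KeyMemory:
    """Hypothetical external key memory holding binarized support embeddings."""

    def __init__(self, dim: int):
        self.keys = np.empty((0, dim), dtype=np.uint8)  # one row per support sample
        self.labels = []

    def write(self, key: np.ndarray, label: int) -> None:
        # Store one binarized support embedding together with its class label.
        self.keys = np.vstack([self.keys, key.astype(np.uint8)])
        self.labels.append(label)

    def read_hamming(self, query: np.ndarray) -> int:
        # Hamming distance: count of differing bits between the query and each key.
        dists = np.count_nonzero(self.keys != query.astype(np.uint8), axis=1)
        return self.labels[int(np.argmin(dists))]

    def read_cosine(self, query: np.ndarray) -> int:
        # Cosine similarity over {0,1} vectors; the largest value wins.
        q = query.astype(np.float64)
        k = self.keys.astype(np.float64)
        sims = (k @ q) / (np.linalg.norm(k, axis=1) * np.linalg.norm(q) + 1e-12)
        return self.labels[int(np.argmax(sims))]

# Usage: a 5-way 1-shot episode with 512-bit keys, mirroring the vector
# length reported for refs. [11,15] on the Omniglot dataset.
rng = np.random.default_rng(0)
mem = KeyMemory(dim=512)
for c in range(5):
    mem.write(rng.integers(0, 2, 512), label=c)
query = rng.integers(0, 2, 512)
print(mem.read_hamming(query), mem.read_cosine(query))

In hardware, the two read functions correspond to the in-memory distance computations that the CAM and PCM arrays perform in place, which is what removes the memory-to-processor communication delay discussed above.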