Combining K-Means and a genetic algorithm through a novel arrangement of genetic operators for high quality clustering

Islam, Zahidul; Estivill‐Castro, Vladimir; Rahman, Anisur; Bossomaier, Terry

doi:10.1016/j.eswa.2017.09.005

Cited by 73 publications

(55 citation statements)

References 26 publications


“…In particular, to analyze and compare the effectiveness and the performance of the initial population of SeedClust, two other initial population methods are used for the experiments: the initial population method of GenClust, and the density-based and K-means++ (distance-based) initialization population method (named as SeedClust (Distance), its chromosome representation and operations are the same as SeedClust, and K-means++ [14,15,55] is used for the initial seeds). In our experiment, so far, we used the improved K-means++ (density-based) for the seed selection of the initial population, and we could see that SeedClust clearly outperformed the other two existing techniques that were used in this study, as shown in Figure 2.…”

Section: Experiments Results (mentioning)

confidence: 99%

Genetic Algorithm with an Improved Initial Population Technique for Automatic Clustering of Low-Dimensional Data

Zhou

Miao

2018

Information


K-means clustering is an important and popular technique in data mining. Unfortunately, for any given dataset (not knowledge-base), it is very difficult for a user to estimate the proper number of clusters in advance, and K-means tends to become trapped in a local optimum when the initial seeds are chosen randomly. Genetic algorithms (GAs) are commonly used to determine the number of clusters automatically and to find a good solution that serves either as the initial seeds of K-means or as the clustering result itself. However, they typically choose the genes of chromosomes randomly, which results in poor clustering results, whereas a carefully selected initial population can improve the final clustering results. Hence, some GA-based techniques carefully select a high-quality initial population, but at a high computational complexity. This paper proposes an adaptive GA (AGA) with an improved initial population for K-means clustering (SeedClust). In SeedClust, an improved density estimation method and an improved K-means++ are presented to capture higher-quality initial seeds and generate the initial population with low complexity. An adaptive crossover and mutation probability is designed to prevent premature convergence and to maintain population diversity, respectively, so that SeedClust can automatically determine the proper number of clusters and capture an improved initial solution. Finally, the best chromosomes (centers) are obtained and fed into K-means as initial seeds to generate even higher-quality clustering results by allowing the initial seeds to readjust as needed. Experimental results on low-dimensional taxi GPS (Global Positioning System) data sets demonstrate that SeedClust achieves higher performance and effectiveness.
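The abstract above relies on K-means++ for seed selection. As a rough illustration only, the following is a minimal sketch of the standard K-means++ seeding step (distance-squared sampling), not the improved density-based variant that SeedClust proposes. The function names `squared_distance` and `kmeans_pp_seeds` are hypothetical and chosen for this example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_seeds(points, k, rng=None):
    """Pick k initial seeds with standard K-means++ (D^2 sampling).

    Assumption: this is the textbook procedure, not SeedClust's
    improved density-based variant.
    """
    rng = rng or random.Random()
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")

    # The first seed is drawn uniformly at random.
    seeds = [rng.choice(points)]
    # d2[i] holds the squared distance from points[i] to its nearest seed.
    d2 = [squared_distance(p, seeds[0]) for p in points]

    while len(seeds) < k:
        if sum(d2) == 0:
            # Every point coincides with a seed; fall back to uniform choice.
            seeds.append(rng.choice(points))
        else:
            # Sample the next seed with probability proportional to d2.
            idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
            seeds.append(points[idx])
        # Update nearest-seed distances using only the newest seed.
        d2 = [min(old, squared_distance(p, seeds[-1]))
              for old, p in zip(d2, points)]
    return seeds
```

In a GA setting such as the one described, a routine like this could plausibly be called once per chromosome to build an initial population, but how SeedClust actually combines it with density estimation is not stated in the abstract.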


“…where $P_m > 0$ is the mutation probability, $P(L_j(t)) > 0$ is the selection probability defined by Equation (18), and $\sum_{j=1}^{N} P(L_j(t)) = 1$.…”

Section: Convergence (mentioning)

confidence: 99%
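The Convergence statement above requires strictly positive selection probabilities that sum to 1 over the N individuals. Equation (18) itself is not reproduced in this excerpt, so the sketch below assumes ordinary fitness-proportional (roulette-wheel) selection purely as an illustration of that constraint. The names `selection_probabilities` and `roulette_select` are hypothetical.

```python
import random


def selection_probabilities(fitness):
    """Normalize positive fitness values into probabilities summing to 1.

    Assumption: fitness-proportional selection stands in for the
    citing paper's Equation (18), which is not shown here.
    """
    if not fitness or any(f <= 0 for f in fitness):
        raise ValueError("fitness values must all be positive")
    total = sum(fitness)
    return [f / total for f in fitness]


def roulette_select(population, fitness, rng=None):
    """Draw one individual with probability proportional to its fitness."""
    rng = rng or random.Random()
    probs = selection_probabilities(fitness)
    return rng.choices(population, weights=probs, k=1)[0]
```

Requiring every fitness value to be positive keeps each selection probability above zero, which matches the condition quoted in the statement.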

“…A disadvantage of K-Means is that it is easy to fall into local optima. As a remedy, a popular trend is to integrate the genetic algorithm [7,8] with K-means to obtain genetic K-means algorithms [9–23]. K-Means is also combined with fuzzy mechanism to obtain fuzzy C-Means [24,25].…”

Section: Introduction (mentioning)

confidence: 99%

A Genetic XK-Means Algorithm with Empty Cluster Reassignment

Hua

Zhang

et al. 2019

Symmetry


K-Means is a well-known and widely used classical clustering algorithm. It easily falls into local optima and is sensitive to the initial choice of cluster centers. XK-Means (eXploratory K-Means) was introduced in the literature to address this by adding an exploratory disturbance to the vector of cluster centers, so as to escape local optima and reduce the sensitivity to the initial centers. However, empty clusters may appear during the iterations of XK-Means, reducing the efficiency of the algorithm. The aim of this paper is to introduce an empty-cluster-reassignment technique and use it to modify XK-Means, resulting in the EXK-Means clustering algorithm. Furthermore, we combine EXK-Means with a genetic mechanism to form a genetic XK-Means algorithm with empty-cluster reassignment, referred to as the GEXK-Means clustering algorithm. The convergence of GEXK-Means to the global optimum is theoretically proved. Numerical experiments on a few real-world clustering problems are carried out, showing the advantage of EXK-Means over XK-Means, and the advantage of GEXK-Means over EXK-Means, XK-Means, K-Means and GXK-Means (genetic XK-Means).
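The abstract above does not spell out how empty clusters are reassigned. The sketch below shows one common heuristic as an assumption, not the paper's method: each empty cluster takes over the point that lies farthest from its currently assigned center. The names `assign` and `reassign_empty_clusters` are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)),
                key=lambda j: squared_distance(p, centers[j]))
            for p in points]


def reassign_empty_clusters(points, centers, labels):
    """Give every empty cluster one point (an assumed heuristic).

    The chosen point is the one farthest from its own center among
    clusters that have more than one member, so no cluster is emptied.
    """
    centers, labels = list(centers), list(labels)
    for j in range(len(centers)):
        if j in labels:
            continue  # cluster j already has members
        counts = {}
        for lab in labels:
            counts[lab] = counts.get(lab, 0) + 1
        donors = [i for i, lab in enumerate(labels) if counts[lab] > 1]
        if not donors:
            break  # nothing can be moved without emptying another cluster
        far = max(donors,
                  key=lambda i: squared_distance(points[i], centers[labels[i]]))
        centers[j] = points[far]
        labels[far] = j
    return centers, labels
```

A call such as `reassign_empty_clusters(points, centers, assign(points, centers))` would sit between the assignment and center-update steps of a K-Means iteration.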


“…In [38], an algorithm of artificial bee colony was executed to mimic an intelligent foraging conduct of honey bee swarms. In [40], a k-means was optimized through a GA, which esteems the impact of isolated points. Several studies also suggested a number of approaches for NN optimization by GA [41–44].…”

Section: Literature Review On Multi-modal Emotion Recognition (mentioning)

confidence: 99%

Multi-Modal Emotion Aware System Based on Fusion of Speech and Brain Information

2019


In multi-modal emotion aware frameworks, it is essential to estimate the emotional features and then fuse them to different degrees. This basically follows either a feature-level or a decision-level strategy. While features from several modalities may enhance the classification performance, they might exhibit high dimensionality and make the learning process complex for the most commonly used machine learning algorithms. To overcome issues of feature extraction and multi-modal fusion, hybrid fuzzy-evolutionary computation methodologies are employed for their strong capability in feature learning and dimensionality reduction. This paper proposes a novel multi-modal emotion aware system that fuses speech with EEG modalities. Firstly, a mixed feature set of speaker-dependent and speaker-independent characteristics is estimated from the speech signal. Further, EEG is utilized as an inner channel complementing speech for more authoritative recognition, by extracting multiple features belonging to the time, frequency, and time–frequency domains. For classifying unimodal data of either speech or EEG, a hybrid fuzzy c-means-genetic algorithm-neural network model is proposed, whose fitness function finds the optimal number of fuzzy clusters that reduces the classification error. To fuse speech with EEG information, a separate classifier is used for each modality, and the output is then computed by integrating their posterior probabilities. Results show the superiority of the proposed model, whose average accuracy is 98.06%, 97.28%, and 98.53% for EEG, speech, and multi-modal recognition, respectively. The proposed model is also applied to two public databases for speech and EEG, namely SAVEE and MAHNOB, on which it achieves accuracies of 98.21% and 98.26%, respectively.

