Improvement of K-means clustering algorithm with better initial centroids based on weighted average

Mahmud, Md. Sohrab; Rahman, Md. Mostafizer; Akhtar, Md. Nasim

doi:10.1109/icece.2012.6471633

Cited by 44 publications

(20 citation statements)

References 5 publications


“…The set of elements between classified clusters is disjointed, and the number of elements in each cluster C i is denoted by n i . The k-means algorithm consists of two steps [29]. First, the initial centroids for each cluster are chosen randomly, then each point in the dataset is assigned to its nearest centroid by Euclidean distance [30].…”

Section: Data Mining Techniques For PPRE Analysis (mentioning)

confidence: 99%
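The two steps quoted above can be sketched in a few lines of NumPy. This is a minimal illustration, not the cited authors' code: the function name `kmeans`, the `n_iter` and `seed` parameters, and the centroid-update and stopping rules (standard in k-means but cut off in the quote) are assumptions added here.

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    """Plain k-means: random initial centroids, then assign and update until stable."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 1: choose k distinct data points at random as the initial centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        # Step 2: assign each point to its nearest centroid by Euclidean distance.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Assumed standard update: move each centroid to the mean of its points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break
    return labels, centroids
```

Because the starting centroids are drawn at random, different seeds can give different clusterings on less clearly separated data, which is the sensitivity that improved initialization schemes try to reduce.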

Vision-Based Potential Pedestrian Risk Analysis on Unsignalized Crosswalk Using Data Mining Techniques

Noh, Lee, et al. 2020

Applied Sciences


Though the technological advancement of smart city infrastructure has significantly improved urban pedestrians’ health and safety, there remains a large number of road traffic accident victims, making it a pressing current transportation concern. In particular, unsignalized crosswalks present a major threat to pedestrians, but we lack dense behavioral data to understand the risks they face. In this study, we propose a new model for potential pedestrian risky event (PPRE) analysis, using video footage gathered by road security cameras already installed at such crossings. Our system automatically detects vehicles and pedestrians, calculates trajectories, and extracts frame-level behavioral features. We use k-means clustering and decision tree algorithms to classify these events into six clusters, then visualize and interpret these clusters to show how they may or may not contribute to pedestrian risk at these crosswalks. We confirmed the feasibility of the model by applying it to video footage from unsignalized crosswalks in Osan city, South Korea.


“…The sorted list of data points is then divided into k subsets. The nearest possible value of mean from each dataset becomes the initial centroids of the cluster to be constructed [13]. The pseudocode of load based initial centroid k-means algorithm is as follows: Input: D = d1, d2, ..., dn // set of n data items L // set of load for data points.…”

Section: Load Based Initial Centroid K-means Algorithm (mentioning)

confidence: 99%
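One possible reading of the two sentences quoted above is sketched below. The quoted pseudocode is truncated, so this does not reproduce it. The function name is hypothetical, and two points are assumptions: that the "sorted list" is ordered by each point's load value, and that "nearest possible value of mean" means the actual data point closest to the subset mean.

```python
import numpy as np

def centroids_nearest_to_subset_means(data, loads, k):
    """Pick one initial centroid from each of k subsets of the load-sorted data."""
    data = np.asarray(data, dtype=float)
    if k > len(data):
        raise ValueError("k must not exceed the number of data points")
    # Assumption: points are sorted by their load value (ascending).
    order = np.argsort(loads, kind="stable")
    centroids = []
    # Split the sorted points into k nearly equal consecutive subsets.
    for subset in np.array_split(data[order], k):
        mean = subset.mean(axis=0)
        # Assumption: the centroid is the data point closest to the subset mean.
        nearest = np.linalg.norm(subset - mean, axis=1).argmin()
        centroids.append(subset[nearest])
    return np.array(centroids)
```

The returned array could then replace the random starting centroids of an ordinary k-means run.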

KSOMKM: An Efficient Approach for High Dimensional Dataset Clustering

Begum, Akthar 2013

IJOEE


Clustering is the process of grouping elements that are similar or lie close together. Cluster analysis is now one of the major data analysis techniques, and many important problems involve clustering large datasets. KSOM and k-means are among the most widely used partitioning clustering algorithms. The original k-means algorithm is computationally expensive, and the number of clusters K must be specified before the algorithm is applied. It is also quite sensitive to the initial centroids. As more dimensions are added, k-means fails to give an optimal result; this is the "curse of high dimensionality" problem. Here we propose using the Kohonen Self Organizing Map (KSOM) to define the number of clusters and then using the load-based initial centroid k-means algorithm (KSOMKM) to find a more accurate number of clusters for high-dimensional datasets. Finally, KSOM with load-based k-means (KSOMKM) is tested on different datasets: the IRIS, Diabetes, Thyroid, and Blood pressure datasets. Its performance is compared with other clustering algorithms in terms of number of iterations, quantization errors, and topographic errors. Index Terms: curse of dimensionality, data mining, high-dimensional datasets, Kohonen Self Organizing Map (KSOM).


“…Also it provides improvement on the classical k-means algorithm to produce more accurate clusters. The three initialization methods explored here are K-means with weighted average method [4], Principal component analysis [5][6] and a heuristic method [7]. The novelty in the presented work comes from the involvement of distributed implementation of initialization methods using MapReduce paradigm on a totally diverse collection of data sets.…”

Section: Related Work (mentioning)

confidence: 99%

“…In [4] Mahmud M S et al employed a uniform method to find rank score by averaging the attribute of each data point, which generated initial centroids that follow the data distribution of the given set. A sorting algorithm is applied to the computed score and divided into "k" subsets, where k is the number of desired clusters.…”

Section: Related Work (mentioning)

confidence: 99%
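The scoring and partitioning described in the statement above might look roughly like the following. This is a sketch based only on the quoted summary, not on the original paper. The function name and the optional `weights` parameter are hypothetical; the default uniform weights give the plain attribute average the quote describes.

```python
import numpy as np

def rank_and_partition(data, k, weights=None):
    """Score each point by its attribute average, sort, and split into k subsets."""
    data = np.asarray(data, dtype=float)
    # With weights=None this is the plain mean over each point's attributes;
    # per-attribute weights are a hypothetical extension.
    scores = np.average(data, axis=1, weights=weights)
    # Sort point indices by score (stable, so ties keep their input order).
    order = np.argsort(scores, kind="stable")
    # Divide the sorted indices into k nearly equal consecutive subsets.
    return np.array_split(order, k)
```

Each returned array holds the row indices of one subset, from which an initial centroid can be derived.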

“…The three initialization methods explored are K-means with weighted average method [4], Principal component analysis [5][6][7] and a heuristic method [8] based on sorting and partitioning of the input data for finding better initial centroids. Experimental results show that the proposed algorithms produce better clusters in less computational time by parallelizing the tasks using Hadoop cluster setup.…”

Section: Introduction (mentioning)

confidence: 99%


A Comprehensive Survey on Centroid Selection Strategies for Distributed K-means Clustering Algorithm

Ghuli, Prabhakar, Shettar 2015

IJCA


Extremely large data sets often known as "Big Data" are analyzed for interesting patterns, trends, and associations, especially those relating to human behavior and interactions. Extraction of meaningful and useful information needs to be done in parallel using advanced clustering algorithms. In this paper, effort has been made to tweak in changes to the existing K-means algorithm so as to work in parallel using MapReduce paradigm. K-means due to its gradient descent nature is highly sensitive to the initial placement of the cluster centers. This random initialization of cluster centers results in empty clusters and slower convergence. In this paper, an overview of existing methods with emphasis on computational efficiency is presented. Comparison of three well known linear time complexity initialization methods has been presented here. These methods are analyzed on two different data sets. The experimental results are recorded and presented with insights on different initialization methods for practitioners.
