Abstract-Clustering is a critically important step in the data mining process. It is a multivariate procedure well suited to segmentation applications in market forecasting and planning research. This paper reports the use of the k-means clustering technique and the SPSS tool to develop a real-time, online system for a particular supermarket to predict sales across annual seasonal cycles. The model was an intelligent tool that received inputs directly from sales data records and automatically updated segmentation statistics at the end of each day's business. The model was successfully implemented and tested over a period of three months. A total of n = 2138 customers were observed and divided into k = 4 similar groups, with each customer assigned to the cluster with the nearest mean. An ANOVA analysis was also carried out to test the stability of the clusters. The actual day-to-day sales statistics were compared with the statistics predicted by the model. The results were encouraging and showed high accuracy.

Index Terms-Cluster analysis, data mining, customer segmentation, ANOVA analysis.
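The nearest-mean grouping described in the abstract can be illustrated with a minimal sketch. It is written in plain Python rather than SPSS, it uses made-up synthetic data rather than the supermarket records, and the two features (visits and spend) as well as the function names `nearest_mean` and `k_means` are assumptions chosen only for illustration. It should not be read as the system the authors built.

```python
import random


def nearest_mean(point, means):
    """Return the index of the mean closest to `point` (squared Euclidean distance)."""
    distances = [sum((p - m) ** 2 for p, m in zip(point, mean)) for mean in means]
    return distances.index(min(distances))


def k_means(points, k, iterations=100, seed=0):
    """Plain k-means: assign each point to its nearest mean, then recompute the means."""
    rng = random.Random(seed)
    # Assumption: initial means are k distinct points drawn at random.
    means = [list(p) for p in rng.sample(points, k)]
    labels = []
    for _ in range(iterations):
        new_labels = [nearest_mean(p, means) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the means are stable
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old mean if a cluster becomes empty
                means[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, means


# Synthetic, made-up data: four loose groups of hypothetical (visits, spend) pairs.
rng = random.Random(42)
centres = [(2, 20), (10, 40), (4, 150), (15, 300)]
customers = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 3.0))
             for cx, cy in centres for _ in range(25)]

labels, means = k_means(customers, k=4)
```

Here `labels[i]` is the cluster index assigned to `customers[i]`, and `means` holds the four cluster centres. Because the result depends on the random initial means, a practical run would typically repeat the procedure with several seeds and keep the best partition.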
I. INTRODUCTION

Clustering is a statistical technique closely related to classification. It sorts raw data into meaningful clusters, or groups, of relatively homogeneous observations. The objects in a particular cluster have similar characteristics and properties but differ from those in other clusters. The grouping is accomplished by finding similarities among the data according to characteristics found in the raw data [1]. The main objective was to find the optimum number of clusters. There are two basic types of clustering methods: hierarchical and non-hierarchical. Clustering is not a one-time task but a continuous, iterative process of knowledge discovery from huge quantities of raw and unorganized data [2]. For a particular classification problem, an appropriate clustering algorithm and parameters must be selected to obtain optimum results [3]. Clustering is a type of exploratory data mining used in many application-oriented areas such as machine learning, classification, and pattern recognition [4]. In recent times, data mining has been gaining momentum in knowledge-based services such as distributed and grid computing. Cloud computing is yet another example of a frontier research topic in computer science and engineering.

Manuscript received December 25, 2012; revised February 28, 2013. Kishana R. Kashwan is with the Department of Electronics and Communication Engineering-PG, Sona College of Technology (An Autonomous Institution Affiliated to Anna University), TPT Road, Tamil Nadu (e-mail: drkrkashwan@sonatech.ac.in). C. M. Velu is with the Department of CSE, Dattakala Group of Institutions, Swami Chincholi, Daund, Pune-413130, India (e-mail: cmvelu41@gmail.com).

For a clustering method, the most important property is that a tuple in a particular cluster is more likely to be similar to the other tuples within the same cluster than to the tuples of other clusters. For classific...