Due to the lack of objective measures, evaluating and prioritizing clustering methods is inherently challenging. Because the evaluation generally involves numerous criteria, it can be framed as a multiple criteria decision making (MCDM) problem; when multiple data sets are used, it can be formulated as a group MCDM problem. In this paper, an MCDM-based framework is proposed to evaluate and rank a number of clustering methods. The proposed approach employs three group MCDM algorithms and a Borda count method, which together yield a comprehensive, robust framework capable of evaluating and ranking multiple clustering models on multiple data sets (cases). Moreover, we introduce a hybrid data clustering algorithm that combines a particle swarm optimization (PSO) algorithm with a K-means clustering algorithm. Finally, a clustering comparison with regard to both external and internal evaluation indicators is another contribution of this paper. Six clustering methods are compared based on five evaluation measures. The results of comparative experiments on ten data sets indicate the effectiveness of the proposed hybrid clustering method. More importantly, the experimental results demonstrate the effectiveness of the group MCDM-based evaluation for clustering model selection.
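The Borda count step mentioned in the abstract can be illustrated with a minimal sketch. It assumes each MCDM algorithm returns a full ranking of the same methods, best first. The method names and the three rankings below are hypothetical placeholders, not results from the paper, and the abstract does not specify how the authors break ties.

```python
from collections import defaultdict


def borda_count(rankings):
    """Merge several best-first rankings into one consensus ranking.

    Each ranking awards n-1 points to its first item, n-2 to the second,
    and so on down to 0 for the last item.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            scores[item] += n - 1 - position
    # Highest total first; ties broken alphabetically (an assumption).
    consensus = sorted(scores, key=lambda item: (-scores[item], item))
    return consensus, dict(scores)


# Hypothetical rankings from three MCDM algorithms (illustrative only).
rankings = [
    ["PSO-KMeans", "KMeans", "Hierarchical"],
    ["KMeans", "PSO-KMeans", "Hierarchical"],
    ["PSO-KMeans", "Hierarchical", "KMeans"],
]
consensus, scores = borda_count(rankings)
print(consensus, scores)
```

Summing positional points in this way keeps a single outlying ranking from dominating the final order, which is one reason a Borda-style aggregation is a natural fit when several MCDM algorithms disagree.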
Mohamad Mohsen Sedighi is a researcher at the Young Researchers Club, South Tehran Branch, Islamic Azad University.
Taha Mokfi is a data miner and data mining trainer who conducts numerous research projects in Iranian companies.
Seyedehfatemeh Golrizgashti is a lecturer at South Tehran Branch, Islamic Azad University.
ABSTRACT Customer Relationship Management (CRM) plays a prominent role in enabling businesses to meet their customers' needs, and it therefore acts as a catalyst in creating and delivering value to them. Because CRM concerns managing customer knowledge, it can be considered a subset of Knowledge Management (KM). This study therefore proposes a Customer Knowledge Management (CKM) process model to address the lack of studies integrating CRM and KM with the aim of augmenting customer value. In this CKM model, all forms of CRM are employed to support all the phases of CKM. Finally, a home appliances case is studied to illustrate the proposed CKM model.
While articles assessing the accuracy of traditional statistical packages are fairly commonplace, data mining software has escaped this important scrutiny. We apply the National Institute of Standards and Technology Statistical Reference Datasets tests for the numerical accuracy of statistical packages to 7 data mining packages: IBM Modeler, KNIME, Orange, Python, RapidMiner, Weka, and XLMiner. We find that one package has an unstable algorithm for the calculation of the sample variance and only two have reliable linear regression routines. Of these two packages that offer analysis of variance, one has a bad algorithm. The accuracy of statistical calculations in data mining packages cannot be taken for granted.
This article is categorized under:
Algorithmic Development > Statistics
Application Areas > Data Mining Software Tools
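To make the sample-variance finding above concrete, here is a small sketch of how a variance routine can become numerically unstable. The abstract does not say which algorithm the affected package uses. The one-pass "sum of squares" formula below is simply a well-known unstable choice, and Welford's update is one standard stable alternative. The data values are made up for illustration: a large offset with small spread, whose exact sample variance is 30.

```python
def naive_variance(xs):
    """One-pass textbook formula; prone to catastrophic cancellation."""
    n = len(xs)
    total = sum(xs)
    total_sq = sum(x * x for x in xs)
    return (total_sq - total * total / n) / (n - 1)


def welford_variance(xs):
    """Welford's one-pass update; numerically stable."""
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    for k, x in enumerate(xs, start=1):
        delta = x - mean
        mean += delta / k
        m2 += delta * (x - mean)
    return m2 / (len(xs) - 1)


# Illustrative data: large mean, small spread (exact variance is 30).
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
naive = naive_variance(data)
stable = welford_variance(data)
print("naive:  ", naive)
print("welford:", stable)
```

The naive formula subtracts two numbers near 4e18, where adjacent double-precision values are hundreds apart, so the true difference of 90 cannot survive the subtraction. Welford's update only ever works with deviations from the running mean, which stay small.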