Data clustering is an important data exploration technique with many applications in data mining. K-means is one of the best-known data mining methods for partitioning a dataset into groups of patterns, and many methods have been proposed to improve its performance. Standardization is a central preprocessing step in data mining: it rescales the values of features or attributes from different dynamic ranges into a specific range. In this paper, we analyzed the performance of three standardization methods on the conventional K-means algorithm. Comparing the results on infectious disease datasets, we found that z-score standardization is more effective and efficient than min-max and decimal scaling standardization.
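The three standardization methods compared above can be sketched with their common textbook definitions: z-score maps each value to (v - mean) / standard deviation, min-max maps the observed range linearly onto a target range such as [0, 1], and decimal scaling divides by the smallest power of ten that brings every absolute value below 1. This is a minimal plain-Python illustration applied feature by feature, not the authors' code; the function names are hypothetical, and the use of the population standard deviation in `z_score` is an assumption, since the abstract does not say which variant was used.

```python
from statistics import mean, pstdev


def z_score(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant feature: no spread to scale
    return [(v - mu) / sigma for v in values]


def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly map the observed [min, max] onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


def decimal_scaling(values):
    """Divide by 10**j, where j is the smallest integer giving max |v'| < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / (10 ** j) >= 1:
        j += 1
    return [v / (10 ** j) for v in values]


def standardize_columns(rows, method):
    """Apply a standardization method to each feature (column) independently."""
    scaled_cols = [method(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*scaled_cols)]


if __name__ == "__main__":
    # Hypothetical two-feature data with very different dynamic ranges.
    data = [[2.0, 100.0], [4.0, 300.0], [6.0, 500.0]]
    print(standardize_columns(data, min_max))  # [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
    print(standardize_columns(data, z_score))
    print(standardize_columns(data, decimal_scaling))
```

Standardizing each column separately keeps a feature with a large numeric range (the second column here) from dominating the Euclidean distances that K-means relies on.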
Hypertension is a worldwide public health challenge. This study investigated the time taken to attain optimal control of hypertension and the major factors that influence control at Specialist Hospital, Sokoto. A retrospective cohort study was conducted involving 300 patient records. The population consisted of all hypertensive patients on follow-up at Specialist Hospital, Sokoto, from 1 February 2015 to 1 February 2021. Statistical Package for the Social Sciences (SPSS) version 20 and R software were used for descriptive analysis, Kaplan-Meier estimation, Cox proportional hazards (CPH) regression and Weibull regression. Hypertensive patients attained optimal control after a median survival time of 40.43 months (95% CI: 33.67-47.19), or 3.37 years, and a mean survival time of 44.18 months (CI: 37.24-51.12), or 3.68 years. The CPH analysis revealed that the factors influencing optimal control of hypertension were body mass index (BMI) (P < 0.001), number of anti-hypertensive drugs (P < 0.001) and place of residence (P = 0.030). Similarly, the Weibull model revealed that the factors affecting optimal control of hypertension were BMI (P < 0.01), number of anti-hypertensive drugs (P < 0.001), place of residence (P = 0.042) and educational status (P = 0.036). In conclusion, BMI, number of anti-hypertensive drugs, place of residence and educational status should be monitored during the management of hypertensive patients. These findings also call for extending this study with a prospective design to measure the effect of other factors on achieving optimal control of hypertension.
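The Kaplan-Meier estimator behind a reported median survival time multiplies, at each distinct event time t_i, the conditional probability of not yet having had the event: S(t) = product over t_i <= t of (1 - d_i / n_i), where d_i is the number of events at t_i and n_i the number of patients still at risk. The median is the first time at which S(t) falls to 0.5 or below. Below is a minimal plain-Python sketch, not the SPSS or R procedure used in the study; here an "event" is taken to mean attaining optimal control of hypertension, and the follow-up data in the usage example are hypothetical.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier curve as a list of (t, S(t)) at each distinct event time.

    times: follow-up time per patient (e.g. months)
    events: 1 if the event (attaining optimal control) was observed, 0 if censored
    """
    events_at = Counter(t for t, e in zip(times, events) if e == 1)
    leaving_at = Counter(times)  # events plus censorings at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving_at):
        d = events_at.get(t, 0)
        if d:
            # Patients censored at time t still count as at risk at t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving_at[t]
    return curve


def median_survival(curve):
    """First time at which S(t) drops to 0.5 or below; None if it never does."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


if __name__ == "__main__":
    # Hypothetical follow-up data (months), not taken from the study.
    times = [6, 12, 12, 18, 24, 30, 36, 42]
    events = [1, 1, 0, 1, 0, 1, 1, 0]
    curve = kaplan_meier(times, events)
    print(median_survival(curve))  # 30
```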
This paper analyzed the performance of the basic K-Means clustering algorithm combined with two major data preprocessing techniques, the best-performing similarity measure, and automatic initialization of seed values. Further experiments were conducted on simulated datasets to verify the accuracy of the new method. The new method gave good, promising performance across the different types of datasets. Its sum of squared clustering errors was significantly lower than that of the basic K-Means method, while inter-cluster distances were kept as large as possible for better cluster identification.
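The two evaluation criteria mentioned above, the sum of squared clustering errors and the separation between clusters, can be computed as in the sketch below. It assumes Euclidean distance and uses the smallest distance between any two centroids as a simple separation score; the abstract does not state which similarity measure or inter-cluster distance was used, so both choices and the function names are illustrative assumptions.

```python
import math
from itertools import combinations


def sse(points, labels, centroids):
    """Sum of squared Euclidean distances from each point to its assigned centroid."""
    total = 0.0
    for p, k in zip(points, labels):
        total += sum((a - b) ** 2 for a, b in zip(p, centroids[k]))
    return total


def min_inter_centroid_distance(centroids):
    """Smallest Euclidean distance between any two centroids (larger = better separated)."""
    return min(math.dist(c1, c2) for c1, c2 in combinations(centroids, 2))


if __name__ == "__main__":
    # Hypothetical 2-D points already assigned to two clusters.
    points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
    labels = [0, 0, 1, 1]
    centroids = [(0.0, 1.0), (10.0, 1.0)]
    print(sse(points, labels, centroids))          # 4.0
    print(min_inter_centroid_distance(centroids))  # 10.0
```

A better clustering under these criteria has a lower `sse` together with a larger minimum centroid separation, which matches the trade-off the abstract describes.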
Clustering is used to place similar data items in the same group. K-means is a commonly used clustering approach that starts from randomly selected initial centroids. However, the conventional method does not consider data preprocessing, an important task before clustering data drawn from different databases. This study proposes a new approach to the K-means clustering algorithm. Experimental analysis shows that the proposed method performs well on an infectious disease dataset compared with the conventional K-means clustering method.
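The conventional baseline described above, K-means with randomly selected initial centroids, can be sketched as Lloyd's algorithm: alternately assign each point to its nearest centroid and move each centroid to the mean of its assigned points until the assignments stop changing. This is a minimal plain-Python illustration of the baseline, not the authors' proposed method, whose details the abstract does not give; the function name `kmeans` and the parameters `max_iter` and `seed` are hypothetical.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=None):
    """Conventional K-means (Lloyd's algorithm) with random initial centroids."""
    rng = random.Random(seed)
    # Initial centroids: k data points sampled without replacement.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid (Euclidean).
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if its cluster is empty
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical, already-standardized 2-D data with two obvious groups.
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    labels, centroids = kmeans(data, k=2, seed=0)
    print(labels, centroids)
```

Because the starting centroids are random, different seeds can converge to different local optima on less well-separated data, which is one motivation for the preprocessing and initialization changes these papers propose.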