Missing values can significantly reduce the accuracy and availability of business data. When clustering incomplete data, records with missing values are usually deleted and only the complete records are analyzed. However, this often leads to significant loss or distortion of information. This paper studies how unsupervised machine learning techniques can be used to deal with missing values. Combining imputation methods with clustering techniques yields a new way of handling missing values that helps overcome the missing-data problem. We propose a strategy that pairs three clustering algorithms (K-means, big data K-means, and p-k-means) with three imputation methods (mean imputation, singular value decomposition imputation, and k-nearest neighbor imputation). We compare the performance of the nine resulting methods on different business data sets, with the experimental analysis carried out on four benchmark data sets. The results verify the effectiveness of combining K-means clustering with imputation on different data sets and indicate that the approach has application prospects.
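To make the impute-then-cluster idea concrete, the following is a minimal sketch of one such combination: mean imputation followed by plain K-means (Lloyd's algorithm). It is an illustration under stated assumptions, not the paper's implementation: the function names `mean_impute` and `kmeans`, the use of `None` to mark missing entries, and the toy data are all hypothetical, and the big data K-means and p-k-means variants are not shown.

```python
import math
import random


def mean_impute(rows):
    """Replace each None entry with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's K-means on complete data; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # random initial centers
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members)
                                    for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return labels, centers


# Hypothetical toy data: two loose groups, with two missing entries (None).
data = [
    [1.0, 2.0], [1.2, None], [0.8, 1.9],
    [8.0, 9.0], [None, 9.2], [8.3, 8.8],
]
imputed = mean_impute(data)          # step 1: fill in missing values
labels, centers = kmeans(imputed, 2)  # step 2: cluster the completed data
print(imputed)
print(labels)
```

Swapping `mean_impute` for a different imputer (for example a k-nearest neighbor or SVD-based one) while keeping the clustering step fixed is the kind of pairing the abstract describes; deleting the incomplete rows instead would discard a third of this toy data set.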