Technological advancement has enabled us to store and process huge amount of data in relatively short spans of time. The nature of data is rapidly changing, particularly its dimensionality is more commonly multi- and high-dimensional. There is an immediate need to expand our focus to include analysis of high-dimensional and large datasets. Data analysis is becoming a mammoth task, due to incremental increase in data volume and complexity in terms of heterogony of data. It is due to this dynamic computing environment that the existing techniques either need to be modified or discarded to handle new data in multiple high-dimensions. Data clustering is a tool that is used in many disciplines, including data mining, so that meaningful knowledge can be extracted from seemingly unstructured data. The aim of this article is to understand the problem of clustering and various approaches addressing this problem. This article discusses the process of clustering from both microviews (data treating) and macroviews (overall clustering process). Different distance and similarity measures, which form the cornerstone of effective data clustering, are also identified. Further, an in-depth analysis of different clustering approaches focused on data mining, dealing with large-scale datasets is given. These approaches are comprehensively compared to bring out a clear differentiation among them. This article also surveys the problem of high-dimensional data and the existing approaches, that makes it more relevant. It also explores the latest trends in cluster analysis, and the real-life applications of this concept. This survey is exhaustive as it tries to cover all the aspects of clustering in the field of data mining.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2025 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.