An Efficient Grid-Based K-Prototypes Algorithm for Sustainable Decision-Making on Spatial Objects

Jang, Hong Jun; Kim, Byoungwook; Kim, Jongwan; Jung, Sung-No

doi:10.3390/su10082614

Cited by 10 publications

(7 citation statements)

References 43 publications


“…The weight age Hamming dissimilarity metric introduced is used, while this dissimilarity metric considers both the relative frequency and the distribution of each mode category. Jang et al [27] present a grid-based k-prototypes algorithm, namely GK-prototypes, that enhances the basic algorithm's performance. As far as categorical attributes are concerned, the algorithm takes into account the maximum distance between a cluster center and a cell, while as far as numeric attributes are concerned, the algorithm takes into account the maximum and minimum distances.…”

Section: Related Work (mentioning)

confidence: 99%
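
The minimum and maximum distances between a cluster center and a grid cell, mentioned in the statement above for numeric attributes, can be illustrated with a short sketch. This is not the GK-prototypes implementation. It assumes axis-aligned rectangular cells and Euclidean distance, covers only the numeric part, and the names `min_max_dist` and `candidate_centers` are hypothetical.

```python
import math


def min_max_dist(center, cell_lo, cell_hi):
    """Return (min, max) Euclidean distance from a point to an axis-aligned cell.

    `cell_lo` and `cell_hi` are the lower and upper corners of the cell.
    """
    min_sq = 0.0
    max_sq = 0.0
    for c, lo, hi in zip(center, cell_lo, cell_hi):
        # Nearest point of the cell along this axis: clamp the coordinate.
        nearest = min(max(c, lo), hi)
        min_sq += (c - nearest) ** 2
        # Farthest point along this axis is the cell edge that is farther away.
        max_sq += max(abs(c - lo), abs(c - hi)) ** 2
    return math.sqrt(min_sq), math.sqrt(max_sq)


def candidate_centers(centers, cell_lo, cell_hi):
    """Indices of centers that could be nearest to some object in the cell.

    A center is pruned when its minimum distance to the cell exceeds the
    smallest maximum distance of any center (numeric attributes only).
    """
    bounds = [min_max_dist(c, cell_lo, cell_hi) for c in centers]
    best_max = min(mx for _, mx in bounds)
    return [i for i, (mn, _) in enumerate(bounds) if mn <= best_max]


if __name__ == "__main__":
    centers = [(1.5, 1.5), (10.0, 10.0), (0.0, 0.0)]
    print(candidate_centers(centers, (1.0, 1.0), (2.0, 2.0)))
```

With bounds of this kind, exact distances only need to be computed for objects in cells where more than one candidate center survives the pruning.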

kClusterHub: An AutoML-Driven Tool for Effortless Partition-Based Clustering over Varied Data Types

Gratsos,

Ougiaroglou,

Margaris

2023

Future Internet


Partition-based clustering is widely applied across diverse domains. Researchers and practitioners from various scientific disciplines work with partition-based algorithms through specialized software or programming libraries. To bridge the knowledge gap associated with these tools, this paper introduces kClusterHub, an AutoML-driven web tool that simplifies the execution of partition-based clustering over numerical, categorical and mixed data types, while facilitating the identification of the optimal number of clusters using the elbow method. Through automatic feature analysis, kClusterHub selects the most appropriate algorithm from the trio of k-means, k-modes, and k-prototypes. Once users upload a dataset and select features, kClusterHub selects the algorithm, provides the elbow graph, recommends the optimal number of clusters, executes clustering, and presents the cluster assignment through tabular representations and exploratory plots. kClusterHub therefore reduces the need for specialized software and programming skills, making clustering more accessible to non-experts. To further enhance its utility, kClusterHub integrates a REST API to support the programmatic execution of cluster analysis. The paper concludes with an evaluation of kClusterHub's usability via the System Usability Scale and CPU performance experiments. The results show that kClusterHub is a streamlined, efficient and user-friendly AutoML-inspired tool for cluster analysis.
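
The algorithm choice described in this abstract (k-means, k-modes, or k-prototypes depending on the data types present) can be sketched roughly as follows. This is an illustrative guess at the decision rule, not kClusterHub's actual code. The function names and the rule "a column is numeric if every value is a non-boolean number" are assumptions.

```python
from numbers import Number


def is_numeric_column(values):
    """Treat a column as numeric if every value is a number (booleans excluded)."""
    return all(isinstance(v, Number) and not isinstance(v, bool) for v in values)


def choose_algorithm(columns):
    """Pick a partition-based algorithm from the feature types.

    `columns` maps a feature name to its list of values (hypothetical format).
    """
    if not columns:
        raise ValueError("at least one feature is required")
    kinds = {is_numeric_column(values) for values in columns.values()}
    if kinds == {True}:
        return "k-means"       # numeric features only
    if kinds == {False}:
        return "k-modes"       # categorical features only
    return "k-prototypes"      # mixed numeric and categorical features


if __name__ == "__main__":
    data = {"area": [12.5, 30.0, 7.2], "land_use": ["farm", "urban", "farm"]}
    print(choose_algorithm(data))
```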

“…Kacem et al [26] propose parallelization of the K-prototypes clustering method [9] to handle large mixed datasets, this algorithm uses the MapReduce framework [108] for parallelization. Jang et al [27] use a grid-based indexing technique to develop grid-based K-prototypes algorithm that speeds up K-prototypes algorithm. The experiments carried out using a spatial dataset consisting of numeric and categorical features show that the proposed method takes less time than the original K-prototypes algorithm.…”

Section: A Partitional Clustering (mentioning)

confidence: 99%

Survey of State-of-the-Art Mixed Data Clustering Algorithms

2019


Mixed data comprises both numeric and categorical features, and mixed datasets occur frequently in many domains, such as health, finance, and marketing. Clustering is often applied to mixed datasets to find structures and to group similar objects for further analysis. However, clustering mixed data is challenging because it is difficult to directly apply mathematical operations, such as summation or averaging, to the feature values of these datasets. In this paper, we present a taxonomy for the study of mixed data clustering algorithms by identifying five major research themes. We then present a state-of-the-art review of the research works within each research theme. We analyze the strengths and weaknesses of these methods with pointers for future research directions. Finally, we present an in-depth analysis of the overall challenges in this field, highlight open research questions, and discuss guidelines to make progress in the field.

Index Terms: Categorical features, clustering, mixed datasets, numeric features.

I. Introduction

Clustering is an unsupervised machine learning technique used to group unlabeled data into clusters that contain data points that are 'similar' to each other and 'dissimilar' from those in other clusters [1], [2]. Many clustering algorithms can only handle data that contain either numeric or categorical feature values [3], [4]. Numeric features can take real values, such as height, weight, and distance. Categorical features represent data that can be divided into a fixed number of categories, such as color, race, sex, profession, and blood group. Clustering algorithms group data points into clusters using some notion of 'similarity', which can be as simple as the Euclidean distance. To compute the similarity between numeric feature values, mathematical operations (such as distances, angles, summation, or mean) are applied to them. Distance-based similarity measures are mostly used for numeric data points.

Generally, categorical feature values are not inherently ordered (for example, the categorical values red and blue). It is not possible to directly compute the distance between two categorical feature values. Therefore, computing distance-based similarity measures for categorical data is a challenging task [5]. Nevertheless, several methods …

The associate editor coordinating the review of this manuscript and approving it for publication was Haruna Chiroma.
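
One common way around the difficulty noted above, and the one used by the k-prototypes family discussed on this page, is to combine squared Euclidean distance on the numeric features with a count of mismatches on the categorical features. The sketch below shows that combined measure. The function name and the default weight `gamma=1.0` are illustrative assumptions, not values taken from the cited papers.

```python
def mixed_dissimilarity(x_num, x_cat, p_num, p_cat, gamma=1.0):
    """k-prototypes style dissimilarity between an object and a prototype.

    Numeric part: squared Euclidean distance.
    Categorical part: number of attributes whose categories differ,
    weighted by `gamma` to balance the two parts.
    """
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, p_num))
    categorical = sum(a != b for a, b in zip(x_cat, p_cat))
    return numeric + gamma * categorical


if __name__ == "__main__":
    d = mixed_dissimilarity([1.0, 2.0], ["red", "a"], [0.0, 0.0], ["blue", "a"], gamma=0.5)
    print(d)
```

The weight `gamma` decides how much one categorical mismatch counts relative to a unit of squared numeric distance, so it normally has to be tuned to the scale of the numeric features.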


“…Kacem et al [26] propose parallelization of the K-prototypes clustering method [9] to handle large mixed datasets, this algorithm uses the MapReduce framework [108] for parallelization. Jang et al [27] use a grid-based indexing technique to develop grid-based K-prototypes algorithm that speeds up K-prototypes algorithm. The experiments carried out using a spatial dataset consisting of numeric and categorical features show that the proposed method takes less time than the original K-prototypes algorithm.…”

Section: A Partitional Clustering (mentioning)

confidence: 99%

Survey of state-of-the-art mixed data clustering algorithms

Ahmad,

Khan

2018

Preprint

