Economic datasets have specific properties: they usually describe human behavior or activity, which is difficult to measure. As a result, they tend to be less consistent than non-economic datasets. This paper analyzes differences between categorical economic and non-economic datasets in hierarchical cluster analysis (HCA). To this end, two analyses based on 25 real-world datasets are carried out. In the first, groups of economic and non-economic datasets are compared in terms of their internal characteristics based on HCA results; in the second, homogeneous groups of datasets are identified and further examined using internal characteristics and graphical outputs. For each group of datasets, the most appropriate similarity measures are identified. The results show substantial differences between economic and non-economic datasets, primarily in the decrease of within-cluster variability. The examined datasets were also successfully classified into easily interpretable groups, for which suitable similarity measures were identified.
This paper thoroughly examines three recently introduced modifications of the Gower coefficient, which were designed for hierarchical clustering of data with mixed-type variables. In contrast to the original Gower coefficient, which for nominal variables only recognizes whether two categories match, the examined modifications offer three different approaches to measuring the similarity between categories. The examined dissimilarity measures are compared and evaluated with respect to the quality of their clusters, measured by three internal indices (Dunn, silhouette, McClain), and with respect to their classification abilities, measured by the Rand index. The comparison is performed on 810 generated datasets. In the analysis, the performance of the similarity measures is evaluated across different data characteristics (the number of variables, the number of categories, the distance of clusters, etc.) and different hierarchical clustering methods (average, complete, McQuitty, and single linkage). As a result, two modifications are recommended for use in practice.
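
To make the baseline concrete, the following is a minimal sketch of the original Gower dissimilarity for mixed-type records, in which a nominal variable contributes 0 on a match and 1 on a mismatch, and a numeric variable contributes the absolute difference scaled by the variable's range. It assumes equal variable weights and no missing values. The function names (`numeric_ranges`, `gower_dissimilarity`) and the toy records are hypothetical illustrations, not taken from the paper, and the three examined modifications are not reproduced because the abstract does not define them.

```python
from typing import List, Sequence, Union

Value = Union[float, str]


def numeric_ranges(rows: Sequence[Sequence[Value]],
                   is_nominal: Sequence[bool]) -> List[float]:
    """Return max - min for each numeric column (0.0 for nominal columns)."""
    ranges = []
    for j, nominal in enumerate(is_nominal):
        if nominal:
            ranges.append(0.0)
        else:
            column = [row[j] for row in rows]
            ranges.append(max(column) - min(column))
    return ranges


def gower_dissimilarity(x: Sequence[Value], y: Sequence[Value],
                        is_nominal: Sequence[bool],
                        ranges: Sequence[float]) -> float:
    """Original Gower dissimilarity (assumed: equal weights, no missing values)."""
    total = 0.0
    for xi, yi, nominal, rng in zip(x, y, is_nominal, ranges):
        if nominal:
            # Nominal variable: only match / mismatch is recognized.
            total += 0.0 if xi == yi else 1.0
        elif rng > 0:
            # Numeric variable: absolute difference scaled by the range.
            total += abs(xi - yi) / rng
        # A constant numeric variable (range 0) contributes nothing.
    return total / len(x)


# Hypothetical toy data: one numeric and two nominal variables.
rows = [
    (30.0, "red", "a"),
    (40.0, "red", "b"),
    (50.0, "blue", "b"),
]
is_nominal = [False, True, True]
ranges = numeric_ranges(rows, is_nominal)

d01 = gower_dissimilarity(rows[0], rows[1], is_nominal, ranges)
d02 = gower_dissimilarity(rows[0], rows[2], is_nominal, ranges)
d12 = gower_dissimilarity(rows[1], rows[2], is_nominal, ranges)
```

The pairwise values computed this way form the dissimilarity matrix that a hierarchical clustering method (for example average or complete linkage) would take as input. A modification of the kind the abstract describes would replace the 0/1 term for nominal variables with a graded similarity between categories.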