Unlike numerical attributes, categorical attributes have no inherent order, which makes measuring the similarity between them more complex, especially in an unsupervised setting. This work therefore presents a new method, the Heterogeneous Graph-based Similarity measure (HGS), to measure the similarity between categorical data for unsupervised learning. To capture the potentially complex relationships hidden among attributes, a heterogeneous weighted graph is constructed from information extracted from the categorical data. Both objects and attribute values are represented as nodes, and their occurrence and co-occurrence relationships are represented as edges. Based on a derived node-pair graph, three rules are used to iteratively update the similarity scores between object pairs and between attribute-value pairs until the scores converge. We also analyze the complexity of HGS and verify its metric properties and convergence. In the experimental validation, HGS is compared with five state-of-the-art measures on 20 UCI datasets and 6 high-dimensional datasets from the medical domain, in k-modes clustering, spectral clustering, and similarity search experiments. The results show that, although no measure outperforms all others on every dataset, HGS performs better overall in both clustering and similarity search tasks. Finally, six studies further discuss the convergence, time cost, and parameter sensitivity of HGS, explore its application to imbalanced class distributions, and compare it with variants that use different initialization and graph construction.

INDEX TERMS Unsupervised learning, similarity measure, categorical data, heterogeneous graph-based similarity (HGS).
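
As a rough illustration of the graph construction described above, the following Python sketch builds object nodes, attribute-value nodes, occurrence edges (object to attribute value), and co-occurrence edges (attribute value to attribute value). The function name `build_heterogeneous_graph`, the toy records, and the choice of raw joint counts as co-occurrence weights are assumptions made for illustration only; the abstract does not specify the edge weighting, and the iterative update rules of HGS are not reproduced here.

```python
from collections import Counter
from itertools import combinations


def build_heterogeneous_graph(records):
    """Build a toy heterogeneous graph from categorical records.

    records: list of dicts mapping attribute name -> categorical value.
    Returns (occurrence, co_occurrence):
      occurrence     - set of (object index, (attribute, value)) edges
      co_occurrence  - Counter over pairs of (attribute, value) nodes;
                       the weight is the number of objects containing both
                       values (an assumed weighting, not taken from the paper)
    """
    occurrence = set()
    co_occurrence = Counter()
    for obj_id, record in enumerate(records):
        # Sort so each value pair is always stored in the same order.
        value_nodes = sorted(record.items())
        for node in value_nodes:
            occurrence.add((obj_id, node))
        for u, v in combinations(value_nodes, 2):
            co_occurrence[(u, v)] += 1
    return occurrence, co_occurrence


# Hypothetical toy data: four objects described by two categorical attributes.
records = [
    {"color": "red", "shape": "round"},
    {"color": "red", "shape": "square"},
    {"color": "blue", "shape": "round"},
    {"color": "red", "shape": "round"},
]
occurrence, co_occurrence = build_heterogeneous_graph(records)
```

In this sketch, objects 0 and 3 are linked to the same two attribute-value nodes, so a similarity measure that propagates scores over such a graph would have structural evidence that they are alike, even though the values themselves carry no order.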