With the advent of the k-modes algorithm, the toolbox for clustering categorical data gained an efficient tool that scales linearly in the number of data items. However, random initialization of cluster centers in k-modes makes it hard to reach a good clustering without resorting to many trials. Recently proposed initialization methods are deterministic and reduce the clustering cost considerably; they differ mainly in the heuristics used to choose the set of initial centers. In this paper, we address the clustering problem for categorical data from the perspective of community detection. Instead of initializing k modes and running several iterations, our scheme, CD-Clustering, builds an unweighted graph and detects highly cohesive groups of nodes using a fast community detection technique. The k largest detected communities then define the k modes. Evaluation on ten real categorical datasets shows that our method outperforms existing initialization methods for k-modes in terms of accuracy, precision, and recall in most cases.
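The last step of the scheme, turning the largest communities into modes, can be illustrated with a minimal Python sketch. It assumes the communities have already been detected and are given as lists of row indices; the abstract does not specify the graph construction or the community detection technique, so neither is reproduced here. The function names (`modes_from_communities`, `matching_distance`, `assign_to_modes`) are hypothetical, and the simple matching dissimilarity used for assignment is the standard k-modes measure, not necessarily the authors' exact choice.

```python
from collections import Counter


def column_mode(rows, j):
    """Most frequent value in column j; ties go to the value seen first."""
    return Counter(row[j] for row in rows).most_common(1)[0][0]


def modes_from_communities(data, communities, k):
    """Derive k modes from the k largest communities.

    `communities` is a list of lists of row indices into `data`
    (assumed to come from some community detection step).
    """
    top = sorted(communities, key=len, reverse=True)[:k]
    n_attrs = len(data[0])
    return [
        tuple(column_mode([data[i] for i in members], j) for j in range(n_attrs))
        for members in top
    ]


def matching_distance(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))


def assign_to_modes(data, modes):
    """Label each record with the index of its nearest mode."""
    return [
        min(range(len(modes)), key=lambda c: matching_distance(row, modes[c]))
        for row in data
    ]


# Toy data: six records with three categorical attributes (illustrative only).
data = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("red", "large", "round"),
    ("blue", "large", "flat"),
    ("blue", "large", "round"),
    ("green", "tiny", "flat"),
]
communities = [[0, 1, 2], [3, 4], [5]]  # hypothetical detection output

modes = modes_from_communities(data, communities, k=2)
labels = assign_to_modes(data, modes)
```

Records that fall outside the k largest communities, such as the last row here, are simply assigned to the nearest mode in this sketch.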
Trademark registration offices and authorities have been bombarded with requests from enterprises. These authorities face great difficulty in protecting enterprises' rights, such as copyright, licenses, and the uniqueness of a logo or trademark, since they have only conventional clustering at their disposal. An effective automatic trademark image retrieval system is therefore urgently needed and worth thorough research. In this paper, we propose a novel trademark image retrieval method in which the input trademark image is first separated into dominant visual shape images; a scale-, rotation-, and translation-invariant feature vector is then created for each shape image. Finally, a similarity measure between two trademarks is calculated based on these feature vectors. Given a query trademark image, retrieval is carried out by taking the five most similar trademark images in a predefined trademark database. Various experiments are conducted to mimic the many types of trademark copying.
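The retrieval step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each trademark is represented as a list of per-shape feature vectors that are already invariant, and the similarity between two trademarks is taken to be the average best cosine match between their shape vectors. The abstract does not define the actual features or similarity measure, so `cosine`, `trademark_similarity`, and `retrieve` are hypothetical stand-ins.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def trademark_similarity(shapes_a, shapes_b):
    """Assumed aggregation: mean, over shapes in A, of the best match in B."""
    if not shapes_a or not shapes_b:
        return 0.0
    best = [max(cosine(u, v) for v in shapes_b) for u in shapes_a]
    return sum(best) / len(best)


def retrieve(query_shapes, collection, top_n=5):
    """Return the names of the top_n trademarks most similar to the query."""
    scored = [
        (trademark_similarity(query_shapes, shapes), name)
        for name, shapes in collection.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:top_n]]


# Toy collection: one 2-D feature vector per trademark (illustrative only).
collection = {
    "a": [[1.0, 0.0]],
    "b": [[0.0, 1.0]],
    "c": [[1.0, 1.0]],
    "d": [[3.0, 1.0]],
    "e": [[1.0, 3.0]],
    "f": [[1.0, 2.0]],
}
top_five = retrieve([[1.0, 0.0]], collection)
```

Note that this aggregation is asymmetric (it averages over the query's shapes only); a symmetric variant would average the score in both directions.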