Clustering plays a major role in data mining applications because it divides and groups data effectively. In pattern analysis, two major challenges arise in real-life applications: handling categorical data and the availability of correctly labeled data. Clustering techniques are designed to group unlabeled data according to homogeneity. Many existing algorithms for numerical data suffer from high memory utilization, long running time, overhead, computational complexity and less effective results. Therefore, this study implements clustering techniques based on the similarity of categorical data. The inter-cluster and intra-cluster similarities of the attributes are identified, and the performance of the proposed method is improved by integrating those similarities. Noise is removed through pre-processing, so that similarity is estimated between noise-free elements. Once these similarities are identified, insignificant attributes are removed and the relevant attributes are selected from the pre-processed elements. To reduce overhead, a Similarity-based K-means Clustering (SKC) approach is developed that clusters the attributes based on a divergence distance. The efficiency of SKC is evaluated experimentally in terms of precision, recall, F-measure, accuracy and clustering error rate. The results show that the proposed approach achieves 98.45% accuracy on a publicly available dataset, compared with existing techniques: variants of Particle Swarm Optimization (PSO) and a semi-supervised clustering system.
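As a rough illustration of the idea of a k-means-style loop over categorical records driven by a divergence distance, the sketch below may help. It is not the SKC algorithm itself, whose details are not given here. It rests on several assumptions: each cluster is summarized by per-attribute category frequencies with Laplace smoothing, the divergence is taken to be the Kullback-Leibler divergence between a record's one-hot distribution and the cluster distribution (which reduces to a sum of negative log-probabilities), and centroids are seeded by a deterministic farthest-first rule. The names `build_centroid`, `divergence` and `cluster_categorical` are hypothetical.

```python
import math
from collections import Counter


def build_centroid(members, domains, alpha=1.0):
    """Summarize a cluster as one smoothed category distribution per attribute."""
    centroid = []
    for j, domain in enumerate(domains):
        counts = Counter(rec[j] for rec in members)
        total = len(members) + alpha * len(domain)
        # Laplace smoothing keeps every probability strictly positive.
        centroid.append({v: (counts[v] + alpha) / total for v in domain})
    return centroid


def divergence(record, centroid):
    """KL divergence from the record's one-hot distributions to the centroid.

    For a one-hot distribution this is -log q(value), summed over attributes.
    """
    return sum(-math.log(dist[value]) for value, dist in zip(record, centroid))


def cluster_categorical(records, k, iterations=20):
    """Assign each categorical record to one of k clusters (illustrative only)."""
    n_attrs = len(records[0])
    domains = [sorted({rec[j] for rec in records}) for j in range(n_attrs)]

    # Assumed initialization: farthest-first seeding starting from the first record.
    centroids = [build_centroid([records[0]], domains)]
    while len(centroids) < k:
        far = max(records, key=lambda r: min(divergence(r, c) for c in centroids))
        centroids.append(build_centroid([far], domains))

    labels = [None] * len(records)
    for _ in range(iterations):
        # Assignment step: nearest centroid under the divergence.
        new_labels = [
            min(range(k), key=lambda c: divergence(rec, centroids[c]))
            for rec in records
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Update step: re-estimate each non-empty cluster's distributions.
        for c in range(k):
            members = [rec for rec, lab in zip(records, labels) if lab == c]
            if members:
                centroids[c] = build_centroid(members, domains)
    return labels


# Toy data (made up for illustration): two visibly different groups of records.
records = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("red", "medium", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "flat"),
    ("blue", "huge", "square"),
]
labels = cluster_categorical(records, k=2)
```

On this toy input the first three records end up in one cluster and the last three in the other. Noise removal and attribute selection, which the study performs before clustering, are left out of the sketch.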