Clustering is one of the major problems in data mining: it is a technique in which similar data points are grouped together. Data labeling, which is applied to points that have not been labeled earlier, has been recognized as an important method in categorical clustering. Although there are many approaches in the numerical domain, very few algorithms are available for categorical data. How to allocate unlabeled data points to the proper clusters therefore remains a challenging issue in the categorical domain. In this paper, a mechanism is proposed for labeling similar data points and placing them in the appropriate clusters. We consider a data set named Genome DNA, in which grouping splice junctions, the points on a DNA sequence at which 'superfluous' DNA is removed, is a major challenge. The problem posed by this data set is to recognize, given a sequence of DNA, the boundaries between exons and introns. The proposal is to allocate each unlabeled data point to its corresponding proper cluster by means of data labeling. This method has two advantages: 1) it exhibits high execution efficiency; 2) it can achieve quality clusters. The proposed method is empirically validated on the DNA data set and is shown to be significantly more efficient than prior work while attaining high-quality results.

Keywords-Clustering; Categorical Data; Data Labeling; Outlier; Entropy; Rough set

I. INTRODUCTION

In data mining [2], clustering is a major challenge. It is used to group similar objects together [1,3]; such groups are known as clusters. Clustering mechanisms have been applied extensively in information retrieval systems, medical diagnosis, statistics, pattern recognition, machine learning, etc. A comprehensive account of the various types of clustering procedures can be found in [3]. Data sets may contain numeric, mixed, or categorical data.
For numeric data, a greater variety of procedures is available than for the other two data types [5][6]. Clustering categorical data is a complicated task, because the distance between data points is not accurately defined, particularly when the data grows over time. Clustering an enormous data set is difficult because of the complexity it poses and the time the process takes [7,8]. Sampling is another method used to improve the efficiency of clustering: some data points are selected at random for initial clustering, and the unlabeled data points (those that were not sampled and not clustered) are then considered in order to find ways and means of allotting them to suitable clusters. This is called cluster labeling [9, 10, 11]. In the categorical domain, unlike the numerical domain, finding the class is not straightforward. In data mining, handling concept drift is time consuming [12,16]. Clustering of time-evolving data in the numerical domain [1,5,6,10] has been explored in the previous literature; however, not much has been addressed in the categorical domain. So, it is still a major problem in the ...