Machine learning integrated credibilistic semi supervised clustering for categorical data

Sarkar, Jnanendra Prasad; Saha, Indrajit; Chakraborty, Sinjan; Maulik, Ujjwal

doi:10.1016/j.asoc.2019.105871

Cited by 10 publications

(6 citation statements)

References 56 publications

“…In this section, the validation of SKC method is analyzed against various existing techniques such as AFC-NSPSO [14], CrKMd [16] and other popular techniques like Support Vector Machine (SVM) and Naive Bayes (NB). The existing AFC-NSPSO and CrKMd conducted the experiments only on mushroom dataset.…”

Section: Performance Analysis Of Proposed Methods By Means Of Accuracy (mentioning)

confidence: 99%

“…For connect dataset, the existing techniques namely SVM, NB, AFC-NSPSO and CrKMd achieved only 76% to 79% of accuracy, where proposed SKC achieved 81.47% of accuracy. The existing AFC-NSPSO [14] and CrKMd [16] didn't consider the removal of outliers before clustering process, where SKC removed the outliers that leads high performance on accuracy. This is due to the distance measures used in the SKC method for clustering the data.…”

Section: Performance Analysis Of Proposed Methods By Means Of Accuracy (mentioning)

confidence: 99%

“…In this section, a discussion of various existing techniques is presented, which are used to cluster the categorical data. The advantages and its limitations of these existing techniques [14][15][16][17][18] are also illustrated.…”

Section: Literature Review (mentioning)

confidence: 99%

“…Sarkar [16] integrated the machine learning technique with semi-supervised clustering technique called credibilistic measure or CrKMd. The homogeneity was identified by using credibilistic measure and then the coincident clustering problems were avoided.…”

Section: Literature Review (mentioning)

confidence: 99%

A Similarity based K-Means Clustering Technique for Categorical Data in Data Mining Application

Kumar, Kanavalli

2021

IJIES

Clustering plays a major role in data mining applications because it divides and groups data effectively. In pattern analysis, two major challenges arise in real-life applications: handling categorical data and the availability of correctly labeled data. Clustering techniques are designed to group unlabeled data according to its homogeneity. Various existing algorithms for numerical data suffer from high memory utilization, time consumption, overhead, computational complexity, and less effective results. Therefore, this study implements clustering techniques based on the similarity of categorical data. The inter-cluster and intra-cluster similarities of the attributes are identified simultaneously, and the performance of the proposed method is improved by integrating those similarities. Noise is removed through pre-processing, so that similarity is estimated between noise-free elements. Once these similarities are identified, insignificant attributes are removed and the relevant attributes are chosen from the preprocessed elements. Overhead is reduced by the Similarity-based K-means Clustering (SKC) approach, which clusters the attributes using a divergence distance. The efficiency of SKC is evaluated experimentally in terms of precision, recall, F-measure, accuracy, and clustering error rate. The results state that the proposed method achieved 98.45% accuracy on the publicly available dataset in comparison with existing techniques: variations of Particle Swarm Optimization (PSO) and a semi-supervised clustering system.
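As a rough illustration of similarity-based clustering of categorical records, the sketch below runs a k-modes-style loop in Python. It is not the SKC method: the abstract does not specify its divergence distance, so a simple mismatch count stands in for it, and the names `mismatch` and `k_modes` as well as the choice of the first `k` distinct rows as initial centers are assumptions made for this example only.

```python
from collections import Counter


def mismatch(a, b):
    """Number of attributes on which two categorical records differ.

    Stand-in dissimilarity (assumption); not the paper's divergence distance.
    """
    return sum(x != y for x, y in zip(a, b))


def k_modes(rows, k, max_iter=20):
    """Cluster tuples of categorical values around k modes.

    Initial modes are the first k distinct rows (an assumption chosen
    to keep the example deterministic).
    """
    modes = []
    for row in rows:
        if row not in modes:
            modes.append(row)
        if len(modes) == k:
            break

    labels = None
    for _ in range(max_iter):
        # Assignment step: each row goes to its nearest mode.
        new_labels = [
            min(range(len(modes)), key=lambda j: mismatch(row, modes[j]))
            for row in rows
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels

        # Update step: each mode takes the most frequent value per attribute.
        for j in range(len(modes)):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:
                modes[j] = tuple(
                    Counter(column).most_common(1)[0][0]
                    for column in zip(*members)
                )
    return labels, modes


rows = [
    ("a", "x", "p"),
    ("b", "y", "q"),
    ("a", "x", "q"),
    ("b", "y", "p"),
    ("a", "x", "p"),
]
labels, modes = k_modes(rows, k=2)
```

On this toy input the records sharing the values `a` and `x` fall into one cluster and the records sharing `b` and `y` into the other.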

“…We aim at connecting similar vertices such that the resulting connected components of the graph will be cliques. The problem was given notable attention due to its applications in various fields such as data mining [14,33,23,15], machine learning [27,1,32,9], computational biology [31], and many others.…”

Section: Chapter One Introduction (mentioning)

confidence: 99%

Correlation Clustering with Overlaps. (c2020)

Fakhereldine

The Cluster Editing problem asks for transforming a given graph into a disjoint union of cliques by applying a minimal number of edge-editing operations. The allowed operations include addition of non-existing edges and deletion of existing ones. We study a multi-parameterized version of the problem that limits the global number of allowed edge editing operations in the graph and the local amounts of the edge edits performed per vertex. Moreover, we allow the new vertex splitting operation, which allows the resulting clusters to overlap. In other words, data elements (or vertices) will be allowed to be members in more than one cluster instead of limiting them to only one single cluster, as in classical clustering methods. We present a heuristic algorithm and a semi-exact algorithm for the Multi-Parameterized Cluster Editing with Vertex Splitting problem. In our experimental analysis, we study the efficiency of our algorithms as well as the effectiveness of allowing vertex splitting. In particular, we show that allowing vertex splitting yields higher clustering accuracy and higher intra-cluster similarity.
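The basic Cluster Editing objective described in this abstract can be made concrete with a small Python sketch. It covers only the classical, non-overlapping case: checking whether a graph is already a disjoint union of cliques, and counting the edge additions and deletions a given partition would require. Vertex splitting and the per-vertex edit limits are not modeled, and the function names `is_cluster_graph` and `editing_cost` are hypothetical.

```python
from itertools import combinations


def is_cluster_graph(vertices, edges):
    """Return True if the graph is a disjoint union of cliques."""
    closed = {v: {v} for v in vertices}  # closed neighbourhoods
    for u, v in edges:
        closed[u].add(v)
        closed[v].add(u)
    # In a cluster graph, adjacent vertices have identical closed neighbourhoods.
    return all(closed[u] == closed[v] for u in vertices for v in closed[u])


def editing_cost(vertices, edges, clusters):
    """Count (additions, deletions) needed to realise the given disjoint clusters."""
    edge_set = {frozenset(e) for e in edges}
    label = {v: i for i, cluster in enumerate(clusters) for v in cluster}
    additions = deletions = 0
    for u, v in combinations(vertices, 2):
        same_cluster = label[u] == label[v]
        present = frozenset((u, v)) in edge_set
        if same_cluster and not present:
            additions += 1  # missing edge inside a cluster
        elif not same_cluster and present:
            deletions += 1  # edge crossing two clusters
    return additions, deletions


# A path a-b-c is not a cluster graph; one edit fixes it either way.
vertices = ["a", "b", "c"]
edges = [("a", "b"), ("b", "c")]
path_is_cluster = is_cluster_graph(vertices, edges)
cost_merge = editing_cost(vertices, edges, [{"a", "b", "c"}])
cost_split = editing_cost(vertices, edges, [{"a", "b"}, {"c"}])
```

Cluster Editing asks for the partition that minimizes the sum of the two counts; the sketch only evaluates a partition that is supplied to it.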

Semi-supervised feature selection based on discernibility matrix and mutual information

Qian, Wan, Shu

2024

Appl Intell
