Cluster Ensemble Selection

Fern, Xiaoli Z.; Lin, Wei

doi:10.1137/1.9781611972788.71

Cited by 55 publications

(115 citation statements)

References 13 publications

Supporting

Mentioning

114

Contrasting

Unclassified

“…Future work is to increase the number of clusterers in the ensemble and investigate ensemble selection approaches (Fern and Lin, 2008) in order to avoid the potential degradation of the ensemble performance if a significant number of "bad" clusterers inappropriate for a dataset are present among the ensemble components.…”

Section: Discussion (mentioning)

confidence: 99%

On Ambiguity Detection and Postprocessing Schemes Using Cluster Ensembles

Albalate, Suchindranath, Soenmez, et al. 2010

Proceedings of the 2nd International Conference on Agents and Artificial Intelligence

Abstract: In this paper, we explore the cluster ensemble problem and propose a novel scheme to identify uncertain/ambiguous regions in the data based on the different clusterings in the ensemble. In addition, we analyse two approaches to deal with the detected uncertainty. The first, simplest method, is to ignore ambiguous patterns prior to the ensemble consensus function, thus preserving the non-ambiguous data as good "prototypes" for any further modelling. The second alternative is to use the ensemble solution obtained by the first method to train a supervised model (support vector machines), which is later applied to reallocate, or "recluster" the ambiguous patterns. A comparative analysis of the different ensemble solutions and the base weak clusterings has been conducted on five data sets: two artificial mixtures of five and seven Gaussian, and three real data sets from the UCI machine learning repository. Experimental results have shown in general a better performance of our proposed schemes compared to the standard ensembles.

“…One practical advantage is that if two different hard clustering algorithms are applied to the same data set, results are mostly different. It is considered to be very hard to find an optimal way to combine these different clusterings [3,5]. The basic reason for the difficulty of combining different clustering results is the inconsistency between the clusterings, more precisely the fact that while a data element belongs to one cluster according to the first algorithm, it belongs to another cluster according to the second algorithm.…”

Section: Generalized Hard Cluster Analysis (mentioning)

confidence: 99%

Generalized hard cluster analysis

Mulder 2011

International Journal of Computer Mathematics

Abstract: In this paper, we generalize the hard clustering paradigm. While in this paradigm a data set is subdivided into disjoint clusters, we allow different clusters to have a nonempty intersection. The concept of hard clustering is then analysed in this general setting, and we show which specific properties hard clusterings possess in comparison to more general clusterings. We also introduce the concept of equivalent clusterings and show that in the case of hard clusterings equivalence and equality coincide. However, if more general clusterings are considered, these two concepts differ, and this implies the undesired fact that equivalent clusterings can have different representations in the traditional view on clustering. We show how a matrix representation can solve this representation problem.

“…Oza & Tumer (2008) do the same in a more recent work, in which they present real applications, where using classifier ensembles has been obtaining a greater success in comparison to using individual classifiers, including remote sensoring, medicine and pattern recognition. Fern (2008) analyses how to combine several available solutions to create a more effective cluster ensemble, based on two critical factors in the performance of a cluster ensemble: quality and diversity of solutions. Leisch (1998), one of the pioneers in the branch of cluster ensembles, introduced an algorithm named bagged clustering, which performs several instances of K-means algorithm, in the attempt of obtaining a certain stability in the results and combines partial results through a hierarchical partitioning method.…”

Section: Classification and Cluster Ensemble (mentioning)

confidence: 99%