In this paper, a prototype-based supervised clustering algorithm is proposed. The algorithm, called the Supervised Growing Neural Gas algorithm (SGNG), incorporates techniques from unsupervised GNG algorithms, such as the adaptive learning rates and the cluster repulsion mechanism of the Robust Growing Neural Gas algorithm, together with the Type Two Learning Vector Quantization (LVQ2) technique. Furthermore, a new prototype insertion mechanism and a clustering validity index are proposed. These techniques are designed to use the class labels of the training data to guide the clustering. The SGNG algorithm is capable of clustering adjacent regions of data objects labeled with different classes, formulating topological relationships among prototypes, and automatically determining the optimal number of clusters using the proposed validity index. To evaluate the effectiveness of the SGNG algorithm, two experiments are conducted. The first experiment uses two synthetic data sets to graphically illustrate the algorithm's potential with respect to its growing ability, its ability to cluster adjacent regions of different classes, and its ability to determine the optimal number of prototypes. The second experiment evaluates its effectiveness on the UCI benchmark data sets. The results of the second experiment show that the SGNG algorithm performs better than other supervised clustering algorithms in terms of both cluster impurity and total running time.

Several supervised clustering algorithms have been proposed. Slonim and Tishby [6] proposed a bottom-up agglomerative algorithm, based on hierarchical approaches, that uses the Information Bottleneck Method to merge similar data objects in a way that minimizes a cost function. Aguilar [4] also proposed a bottom-up agglomerative algorithm, which merges neighboring clusters labeled with the same class.
Zeidat [7] introduced three clustering algorithms, namely SPAM (a variant of the Partitioning Around Medoids (PAM) algorithm), SRIDHCR (which uses a random search and a greedy approach to select data samples as cluster representatives), and SCEC (which uses evolutionary computation to seek the optimal set of representatives). These three algorithms attempt to minimize a fitness function that measures class impurity against the number of clusters. Pedrycz and Vukovich [12] proposed the fuzzy c-means with supervision algorithm. The algorithm includes a class constraint factor
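
The trade-off between class impurity and the number of clusters mentioned above can be illustrated with a minimal sketch. The excerpt does not give the exact fitness function, so everything here is an assumption made for illustration: impurity is taken as the fraction of objects outside the majority class of their cluster, the penalty is taken as the square root of the number of excess clusters (beyond the number of classes) divided by the number of objects, and the names `class_impurity`, `fitness`, and `beta` are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def class_impurity(labels, assignments):
    """Fraction of objects whose class differs from their cluster's majority class."""
    by_cluster = defaultdict(list)
    for label, cluster in zip(labels, assignments):
        by_cluster[cluster].append(label)
    # For each cluster, count the members that are not in its majority class.
    minority = sum(
        len(members) - Counter(members).most_common(1)[0][1]
        for members in by_cluster.values()
    )
    return minority / len(labels)


def fitness(labels, assignments, beta=0.1):
    """Impurity plus a penalty on the cluster count (assumed form, lower is better)."""
    n = len(labels)
    k = len(set(assignments))  # number of clusters actually used
    c = len(set(labels))       # number of classes present
    # Assumed penalty: only clusters in excess of the class count are penalized.
    penalty = sqrt((k - c) / n) if k > c else 0.0
    return class_impurity(labels, assignments) + beta * penalty
```

Under these assumptions, a clustering that splits the data into more clusters can reach zero impurity but pays a penalty for each extra cluster, weighted by `beta`, which is the balance a supervised clustering search would tune.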