Determining the structure of data without prior knowledge of the number of clusters or any information about their composition is a problem of interest in many fields, such as image analysis, astrophysics, and biology. A set of n patterns in a p-dimensional feature space must be partitioned such that the patterns in a given cluster are more similar to each other than to the rest. As there are approximately K^n/K! possible ways of partitioning the patterns among K clusters, finding the best solution is very hard when n is large. The search space grows further when the number of partitions is not known a priori. Although the self-organizing feature map (SOM) can be used to visualize clusters, automating knowledge discovery with SOM is a difficult task. This paper proposes region-based image processing methods to post-process the U-matrix obtained after the unsupervised learning performed by SOM. Mathematical morphology is applied to identify regions of similar neurons. The number of regions and their labels are found automatically and are related to the number of clusters in a multivariate data set. New data can be classified by labeling them according to the best-matching neuron. Simulations using data sets drawn from finite mixtures of p-variate normal densities are presented, together with the advantages and drawbacks of the method.
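The idea of counting clusters by segmenting the U-matrix can be illustrated with a minimal sketch. It is not the paper's method: the morphological segmentation is replaced here by a plain threshold followed by connected-component labeling, and the function names (`u_heights`, `label_regions`), the toy codebook, and the threshold value are all assumptions made for illustration.

```python
from collections import deque
from math import dist

def u_heights(codebook):
    """Mean distance from each neuron's weights to its 4-connected grid neighbours."""
    rows, cols = len(codebook), len(codebook[0])
    heights = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            ds = [dist(codebook[r][c], codebook[nr][nc])
                  for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                  if 0 <= nr < rows and 0 <= nc < cols]
            heights[r][c] = sum(ds) / len(ds)
    return heights

def label_regions(heights, threshold):
    """Label 4-connected groups of neurons with height below threshold (0 = boundary)."""
    rows, cols = len(heights), len(heights[0])
    labels = [[0] * cols for _ in range(rows)]
    n_regions = 0
    for r in range(rows):
        for c in range(cols):
            if heights[r][c] >= threshold or labels[r][c]:
                continue  # boundary neuron, or already assigned to a region
            n_regions += 1
            labels[r][c] = n_regions
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of the low-height region
                cr, cc = queue.popleft()
                for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and heights[nr][nc] < threshold and not labels[nr][nc]):
                        labels[nr][nc] = n_regions
                        queue.append((nr, nc))
    return labels, n_regions

# Hypothetical trained 3x6 codebook: two groups of similar neurons separated by a jump.
xs = [0.0, 0.1, 0.2, 9.8, 9.9, 10.0]
codebook = [[(x, x) for x in xs] for _ in range(3)]
labels, n_regions = label_regions(u_heights(codebook), threshold=1.0)
print(n_regions)
```

In this toy grid the neurons on either side of the jump have large heights and act as a boundary, so the number of labeled regions serves as the cluster-count estimate. A fixed threshold is the weakest part of the sketch; a watershed-style segmentation would avoid having to choose it by hand.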
The self-organizing map (SOM) has been applied to a variety of tasks, including data visualization and clustering. Once the point density of the neurons approximates the density of the data, clustering information can be mined from the data set after unsupervised learning by using the relations between neurons. This paper presents a new algorithm for the dynamic generation of a hierarchical structure of self-organizing maps, with applications to data analysis. Unlike other tree-structured SOM approaches, whose nodes are neurons, here the tree nodes are themselves maps. From the top down, maps are automatically segmented using the U-matrix, which represents relations between neighboring neurons. The automatic map partitioning algorithm is based on mathematical morphology segmentation and is applied to each map at each level of the hierarchy. Clusters of neurons are automatically identified and labeled, and each generates a new sub-map. Data are partitioned according to the label of their best-matching unit at each level of the tree. The algorithm may be seen as a recursive partitioning clustering method with a multiple-prototype cluster representation, which enables the discovery of clusters in a variety of geometrical shapes.
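The data-partitioning step at a single level of the tree can be sketched as follows. This is only an illustration under stated assumptions: the neuron weights and their region labels are hypothetical values standing in for a trained and segmented map, and the helper names (`best_match_unit`, `partition_by_bmu_label`) are not taken from the paper.

```python
from collections import defaultdict
from math import dist

def best_match_unit(x, neurons):
    """Index of the neuron whose weight vector is closest to x (Euclidean distance)."""
    return min(range(len(neurons)), key=lambda i: dist(x, neurons[i]))

def partition_by_bmu_label(data, neurons, neuron_labels):
    """Group data points by the region label of their best-matching neuron."""
    groups = defaultdict(list)
    for x in data:
        groups[neuron_labels[best_match_unit(x, neurons)]].append(x)
    return dict(groups)

# Hypothetical map: four neurons, already segmented into regions 1 and 2.
neurons = [(0.0, 0.0), (0.5, 0.2), (5.0, 5.0), (5.5, 4.8)]
neuron_labels = [1, 1, 2, 2]
data = [(0.1, 0.1), (0.6, 0.3), (5.2, 5.1), (4.9, 4.7)]

groups = partition_by_bmu_label(data, neurons, neuron_labels)
for label, points in sorted(groups.items()):
    print(label, points)
```

In a hierarchical scheme of the kind described, each group would then presumably be used to train its own sub-map, and the same partitioning would be repeated one level down. Because every region holds several neurons, a cluster is represented by multiple prototypes rather than a single centroid.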