The growing self-organizing map (GSOM) is a variation of the popular self-organizing map (SOM), developed to address the problem of choosing a suitable size for the SOM. The SOM is usually concerned with vectorial items. To deal with algorithms implemented as programs, which are hard to represent as vectors, a new version of GSOM for clustering non-vectorial items (GSOM/NV) is proposed here. Through syntax analysis, the source code of each program is converted into a syntax tree, and similarities between programs are computed on the basis of these trees, so that the normal GSOM can be applied to clustering the algorithms that the programs implement. An experiment shows that programs implementing the same algorithm, even when coded differently from each other, are gathered together on the visualization map generated by the proposed method.
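The step from source code to a syntax-tree similarity can be illustrated with a minimal sketch. It rests on several assumptions that are not in the abstract: Python's standard `ast` module stands in for the syntax analysis, and a multiset Jaccard index over node types stands in for the similarity measure, which the abstract does not specify. The names `node_type_counts` and `tree_similarity` are hypothetical.

```python
import ast
from collections import Counter


def node_type_counts(source: str) -> Counter:
    """Parse Python source and count AST node types (a bag-of-nodes view of the tree)."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))


def tree_similarity(src_a: str, src_b: str) -> float:
    """Multiset Jaccard similarity of node-type counts, in [0, 1].

    Assumption: this is an illustrative stand-in, not the measure used in the paper.
    """
    a, b = node_type_counts(src_a), node_type_counts(src_b)
    intersection = sum((a & b).values())  # element-wise minimum of counts
    union = sum((a | b).values())         # element-wise maximum of counts
    return intersection / union if union else 1.0


# Two summation loops that differ only in identifier names, and one unrelated function.
loop_a = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
loop_b = "def acc(values):\n    r = 0\n    for v in values:\n        r += v\n    return r\n"
other = "def fact(n):\n    return 1 if n <= 1 else n * fact(n - 1)\n"

print(tree_similarity(loop_a, loop_b))  # identifier names do not affect node types
print(tree_similarity(loop_a, other))   # structurally different code scores lower
```

Because the measure looks only at node types, renaming variables leaves the score unchanged, which is the property that lets differently coded versions of one algorithm land close together. A bag of node types ignores tree shape, so a measure that accounts for structure, such as a tree edit distance, would likely be needed in practice.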
Self-organizing maps (SOMs), a data visualization technique invented by Professor Teuvo Kohonen, reduce the dimensionality of data through the use of self-organizing neural networks. In this paper, we present an approach to clustering the different topics of knowledge in program code without manual labour. First, syntax trees are generated from the program code; then the similarities between them are computed in order to obtain a generalized mean of the syntax trees for the non-vectorial self-organizing map model. On the visualization map, the different topics of knowledge extracted from the program code are gathered together. An experiment demonstrates the feasibility of the approach in the context of an algorithm clustering task.
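For non-vectorial items, a map unit cannot hold an arithmetic average, so a representative has to be derived from pairwise (dis)similarities alone. The sketch below shows one common way to do this, under the assumption that the "generalized mean" can be approximated by the set median: the member of a collection with the smallest summed dissimilarity to all members. The function name `set_median` and the toy dissimilarity are hypothetical, and the abstract does not confirm that this is the exact definition used.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def set_median(items: Sequence[T], dissimilarity: Callable[[T, T], float]) -> T:
    """Return the item minimizing the sum of dissimilarities to all items.

    Assumption: this set median stands in for the generalized mean of the
    syntax trees; any dissimilarity function on the items can be supplied.
    """
    if not items:
        raise ValueError("set_median requires at least one item")
    return min(items, key=lambda c: sum(dissimilarity(c, o) for o in items))


# Toy usage with integers and absolute difference as the dissimilarity.
print(set_median([1, 3, 8], lambda a, b: abs(a - b)))  # 3
```

With syntax trees as items, the dissimilarity could be, for example, one minus a tree similarity score. Restricting the representative to an existing item keeps the computation to pairwise comparisons, at the cost of a quadratic number of dissimilarity evaluations per update.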