In this paper, the Earth Mover's Distance (EMD) is used as a similarity measure in the mathematical symbol retrieval task. The approach is based on the Bag-of-Visual-Words model. In our case, the features extracted from each symbol are clustered by means of Self-Organizing Maps (SOM), and the occurrences of features in the clusters are then accumulated in a vector of visual words. These vectors are compared with the EMD, which naturally allows the topological organization of the SOM clusters to be incorporated in the distance computation. The proposed approach is experimentally tested in a mathematical symbol retrieval task and compared with the cosine similarity and with some recently proposed variants.
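As an illustration of how the SOM topology can enter the distance computation, the following is a minimal sketch and not the authors' implementation. It assumes raw integer counts as visual-word vectors, a rectangular SOM grid indexed in row-major order, and the Euclidean distance between grid cells as the ground distance; the abstract specifies none of these. The names `som_ground_distance` and `emd` are hypothetical, and the transportation problem is solved here with a plain successive-shortest-path min-cost flow.

```python
import math

def som_ground_distance(rows, cols):
    """Euclidean distance between every pair of cells of a rows x cols grid
    (assumed ground distance; cells are indexed in row-major order)."""
    coords = [(r, c) for r in range(rows) for c in range(cols)]
    return [[math.hypot(a[0] - b[0], a[1] - b[1]) for b in coords] for a in coords]

def emd(p, q, dist):
    """EMD between two integer-count histograms p and q, computed as a
    min-cost flow and normalized by the amount of mass moved."""
    n = len(p)
    source, sink = 2 * n, 2 * n + 1
    # Each edge is [target, residual capacity, cost, index of reverse edge].
    graph = [[] for _ in range(2 * n + 2)]

    def add_edge(u, v, cap, cost):
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])

    for i in range(n):
        add_edge(source, i, p[i], 0.0)       # supply of bin i
        add_edge(n + i, sink, q[i], 0.0)     # demand of bin i
        for j in range(n):
            add_edge(i, n + j, min(p[i], q[j]), dist[i][j])

    total = min(sum(p), sum(q))
    flow, cost = 0, 0.0
    while flow < total:
        # Bellman-Ford: cheapest augmenting path in the residual graph.
        best = [math.inf] * len(graph)
        prev = [None] * len(graph)
        best[source] = 0.0
        for _ in range(len(graph) - 1):
            changed = False
            for u in range(len(graph)):
                if best[u] == math.inf:
                    continue
                for k, (v, cap, c, _rev) in enumerate(graph[u]):
                    if cap > 0 and best[u] + c < best[v] - 1e-12:
                        best[v] = best[u] + c
                        prev[v] = (u, k)
                        changed = True
            if not changed:
                break
        # Find the bottleneck capacity along the path, then push flow.
        push = total - flow
        v = sink
        while v != source:
            u, k = prev[v]
            push = min(push, graph[u][k][1])
            v = u
        v = sink
        while v != source:
            u, k = prev[v]
            graph[u][k][1] -= push
            graph[v][graph[u][k][3]][1] += push
            v = u
        flow += push
        cost += push * best[sink]
    return cost / total if total else 0.0

# Usage: a 2 x 2 map; mass moves between vertically adjacent cells.
dist = som_ground_distance(2, 2)
print(emd([1, 1, 0, 0], [0, 0, 1, 1], dist))
```

Under these assumptions, two histograms whose mass sits in neighbouring SOM cells are closer than two whose mass sits in opposite corners of the map, a distinction that a bin-by-bin measure such as the cosine similarity cannot make.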
This paper addresses the indexing and retrieval of mathematical symbols from digitized documents. The proposed approach exploits Shape Contexts (SC) to describe the shape of mathematical symbols. Starting from the vector space method, which is based on SC clustering, we explore the use of topologically ordered clusters to improve the retrieval performance. The clustering is computed by means of Self-Organizing Maps that organize the clusters in two-dimensional, topologically ordered feature maps. The retrieval performance is compared with that obtained using K-means clustering on a large collection of mathematical symbols gathered from the widely used INFTY database.
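The vector space step can be sketched as follows, under the assumption that the cluster centroids (SOM units or K-means centres) have already been trained. Each descriptor is assigned to its nearest centroid and the assignments are counted. The function name `visual_word_histogram` and the toy two-dimensional descriptors are hypothetical; real Shape Context descriptors have many more dimensions.

```python
def visual_word_histogram(descriptors, centroids):
    """Count how many descriptors fall into each cluster.

    descriptors: list of feature vectors extracted from one symbol
    centroids:   list of cluster centres (for a SOM, listed in row-major
                 order so that the index encodes the position on the map)
    """
    hist = [0] * len(centroids)
    for d in descriptors:
        # Index of the centroid with the smallest squared Euclidean distance.
        nearest = min(
            range(len(centroids)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(d, centroids[k])),
        )
        hist[nearest] += 1
    return hist

# Usage with toy data: two clusters, three descriptors.
centroids = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [9.0, 9.0], [0.0, 1.0]]
print(visual_word_histogram(descriptors, centroids))  # [2, 1]
```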
In this paper, we describe a general approach for script (and language) recognition from printed documents and for writer identification in handwritten documents. The method is based on a bag-of-visual-words strategy where the visual words correspond to characters and the clustering is obtained by means of Self-Organizing Maps (SOM). Unknown pages (words in the case of script recognition) are classified by comparing their vectorial representations with those of a training set using the cosine similarity. The comparison is improved using a similarity score that takes into account the SOM organization of the cluster centroids. Promising results are presented for both printed documents and handwritten musical scores.
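The baseline comparison can be sketched as a nearest-neighbour rule over cosine similarities. This is a minimal illustration that assumes each training item is a (label, visual-word vector) pair; the names `cosine_similarity` and `classify` and the toy labels are hypothetical, and the SOM-aware similarity score mentioned above is not reproduced here.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two visual-word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def classify(query, training):
    """Return the label of the training vector most similar to the query.

    training: list of (label, vector) pairs
    """
    return max(training, key=lambda item: cosine_similarity(query, item[1]))[0]

# Usage with toy three-bin histograms.
training = [("script_a", [5, 1, 0]), ("script_b", [0, 1, 5])]
print(classify([4, 2, 0], training))  # script_a
```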