On the Influence of Word Representations for Handwritten Word Spotting in Historical Documents

Lladós, Josep; Rusiñol, Marçal; Fornés, Alícia; Fernández, David; Dutta, Anjan

doi:10.1142/s0218001412630025

Cited by 43 publications

(20 citation statements)

References 36 publications


“…In the document image analysis literature, we can distinguish two different families of keyword spotting methods depending on the representation of the handwritten words [26]. On the one hand, sequential word representations [35] describe handwritten words as a time series by using a sliding window in the writing direction.…”

Section: Introduction (mentioning)

confidence: 99%

A study of Bag-of-Visual-Words representations for handwritten keyword spotting

Aldavert, Rusiñol, Toledo, et al. 2015, IJDAR (self-citation)

The Bag-of-Visual-Words (BoVW) framework has gained popularity in the document image analysis community, specifically as a representation of handwritten words for recognition or spotting purposes. Although the BoVW method has been greatly improved in the computer vision field, most approaches in the document image analysis domain still rely on the basic implementation of BoVW and disregard these latest refinements. In this paper, we present a review of those improvements and their application to the keyword spotting task. We thoroughly evaluate their impact against a baseline system on the well-known George Washington dataset and compare the obtained results against nine state-of-the-art keyword spotting methods. In addition, we compare both the baseline and improved systems with the methods presented at the Handwritten Keyword Spotting Competition 2014.


“…In word spotting literature, dynamic time warping (DTW) is one of the most commonly used methods to calculate the similarity of words [9,18,33,43,46,48]. DTW can tolerate spatial variations unlike other methods such as XOR, Euclidean Distance Mapping, Sum of Squared Differences [47].…”

Section: Related Work (mentioning)

confidence: 99%

Cross-document word matching for segmentation and retrieval of Ottoman divans

Duygulu, Arifoǧlu, Kalpaklı 2014, Pattern Anal Applic

Motivated by the need for automatic indexing and analysis of a huge number of documents in Ottoman divan poetry, and for discovering new knowledge to preserve this heritage and keep it alive, in this study we propose a novel method for segmenting and retrieving words in Ottoman divans. Documents in Ottoman are difficult to segment into words without prior knowledge of the word. In this study, using the idea that divans have multiple copies (versions) by different writers in different writing styles, and that word segmentation in some of those versions may be relatively easier to achieve than in others, segmentation of the harder versions (difficult, if not impossible, with traditional techniques) is performed using information carried over from the simpler version. One version of a document is used as the source dataset and the other version of the same document is used as the target dataset. Words in the source dataset are automatically extracted and used as queries to be spotted in the target dataset for detecting word boundaries. We present the idea of cross-document word matching for a novel task of segmenting historical documents into words. We propose a matching scheme based on possible combinations of sequences of sub-words. We improve the performance of simple features by considering the words in a context. The method is applied to two versions of the Layla and Majnun divan by Fuzuli. The results show that the proposed word-matching-based segmentation method is promising in finding the word boundaries and in retrieving the words across documents.

“…The first one consists of 27 pages from a collection of marriage registers in the Barcelona Cathedral from 1451 to 1905 [15]. For the second evaluation corpus, we select part of the IAM off-line dataset, which is the biggest collection of a unique writing style.…”

Section: Experimental Data and Evaluation Criteria (mentioning)

confidence: 99%

Handwritten word spotting based on a hybrid optimal distance

Wang, Églin, Largeron, et al. 2014, 2014 IEEE International Conference on Image Processing (ICIP)

In this paper, we develop a comprehensive representation model for handwriting, which contains both morphological and topological information. An adapted Shape Context (SC) descriptor built on structural points is employed to describe the contour of the text. Graphs are first constructed by using the structural points as nodes and the skeleton of the strokes as edges. Based on these graphs, Topological Node Features (TNFs) of the n-neighbourhood are extracted. A Bag-of-Words representation model based on the TNFs is employed to depict the topological characteristics of word images. Moreover, a novel word spotting approach using the proposed model is presented. The final distance is a weighted mixture of the SC cost and the TNF distribution comparison. Linear Discriminant Analysis (LDA) is used to learn the optimal weight for each part of the distance, taking writing styles into consideration. The evaluation of the proposed approach shows the significance of combining the properties of the handwriting from different aspects.
