Features for word spotting in historical manuscripts

Rath, Toni M.; Manmatha, R.

doi:10.1109/icdar.2003.1227662

Cited by 143 publications

(96 citation statements)

References 6 publications

(10 reference statements)


“…To examine the effect of weighting the influence of each feature, we have built an evaluation environment reproducing what is described in [1,2]. Their implementation consisted of a two part word matching pipeline, pruning and DTW, described below.…”

Section: Methods (mentioning)

confidence: 99%

“…In [1], the concept of dynamic time warping (DTW) for word matching was developed, with an application focus. The goal was to be able to create an ordered list of matches between template word images and some collection of word images.…”

Section: Previous Work (mentioning)

confidence: 99%

“…In [3], several computationally fast pruning rules were introduced, later developed further by [8] and used by [2,1]. By using these rules, a large portion of potential matches can be removed before executing DTW.…”

Section: Previous Work (mentioning)

confidence: 99%
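
The pruning step mentioned in this statement can be illustrated with a minimal sketch. The quoted text does not list the actual rules, so the two checks below (bounding-box width ratio and aspect-ratio ratio), the function name `passes_pruning`, and both thresholds are hypothetical stand-ins, not the rules from [3], [8], [2] or [1].

```python
def passes_pruning(query_size, candidate_size,
                   max_width_ratio=1.5, max_aspect_ratio=1.5):
    """Return True if a candidate word image should be kept for DTW matching.

    Sizes are (width, height) of the word bounding boxes in pixels.
    Both rules and both thresholds are illustrative assumptions.
    """
    qw, qh = query_size
    cw, ch = candidate_size

    # Rule 1 (assumed): the two words must have similar widths.
    width_ratio = max(qw, cw) / min(qw, cw)

    # Rule 2 (assumed): the two words must have similar aspect ratios.
    q_aspect, c_aspect = qw / qh, cw / ch
    aspect_ratio = max(q_aspect, c_aspect) / min(q_aspect, c_aspect)

    return width_ratio <= max_width_ratio and aspect_ratio <= max_aspect_ratio


# Only candidates that survive the cheap checks would go on to DTW.
candidates = [(110, 42), (300, 40), (95, 38)]
kept = [c for c in candidates if passes_pruning((100, 40), c)]
```

Checks of this kind cost a few arithmetic operations per pair, which is why they can discard many candidates before the far more expensive DTW comparison runs.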

“…By only allowing a diagonal path through the weight matrix for warping, pathological warpings (where a very small portion of one word is matched to large parts of another) are avoided. In [1], the allowed warping (i.e. the number of elements around the diagonal) was set to 15.…”

Section: Word-spotting (mentioning)

confidence: 99%
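
The band constraint described in this statement can be sketched as follows. This is a minimal illustration, not the implementation from [1]: the squared Euclidean local cost, the function name `dtw_band`, and the widening of the band when the two sequences differ in length are assumptions. Only the default band of 15 comes from the quoted statement.

```python
import math


def dtw_band(a, b, band=15):
    """DTW cost between two sequences of feature vectors.

    Only cells within `band` positions of the diagonal are filled in,
    which rules out pathological warpings.
    """
    n, m = len(a), len(b)
    # Assumption: widen the band so the end cell (n, m) stays reachable
    # when the sequences have different lengths.
    w = max(band, abs(n - m))

    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            # Assumed local distance: squared Euclidean.
            d = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1]))
            cost[i][j] = d + min(cost[i - 1][j],      # step in a only
                                 cost[i][j - 1],      # step in b only
                                 cost[i - 1][j - 1])  # step in both
    return cost[n][m]


# One feature value per column, for brevity.
x = [[0.0], [1.0], [1.0]]
y = [[0.0], [0.0], [1.0]]
free = dtw_band(x, y, band=1)    # warping allowed near the diagonal
strict = dtw_band(x, y, band=0)  # diagonal path only
```

With `band=0` the path is forced onto the diagonal, so the middle columns must be compared directly; with `band=1` the path can shift by one column and align the two sequences at no cost.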

“…In contrast to earlier papers on the same theme [1][2][3], we have not used pruning (i.e. heuristic exclusion rules based on simple geometric features) to exclude potential word matches.…”

Section: Introduction (mentioning)

confidence: 99%


Feature Weight Optimization and Pruning in Historical Text Recognition

Wahlberg

Brun

2013

Advances in Visual Computing


Abstract. In handwritten text recognition, "sliding window" feature extraction represents the visual information contained in written text as feature vector sequences. In this paper, we explore the parameter space of feature weights in search of optimal weights and feature selection using the coordinate descent method. We report a gain of about 5% in AUC performance. We use a public dataset for evaluation and also discuss the effects and limitations of "word pruning," a technique in word spotting that is commonly used to boost performance and save computational time.
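
The coordinate descent search over feature weights mentioned in this abstract can be sketched as follows. The abstract does not give the candidate values, the number of sweeps, or the scoring procedure, so `coordinate_descent`, `candidates`, `sweeps` and the toy `score` function are all hypothetical; in the paper's setting `score` would be an AUC measured on a retrieval run.

```python
def coordinate_descent(score, weights, candidates=(0.0, 0.5, 1.0, 2.0), sweeps=3):
    """Maximize `score` by changing one feature weight at a time.

    A weight of 0.0 drops the feature entirely, so the same search
    also acts as a simple form of feature selection.
    """
    w = list(weights)
    best = score(w)
    for _ in range(sweeps):
        improved = False
        for k in range(len(w)):           # one coordinate at a time
            for c in candidates:          # try each candidate value
                trial = w[:k] + [c] + w[k + 1:]
                s = score(trial)
                if s > best:              # keep only strict improvements
                    best, w, improved = s, trial, True
        if not improved:                  # a full sweep changed nothing
            break
    return w, best


# Toy objective standing in for AUC: peaks at weights [2.0, 0.5].
def score(w):
    return -((w[0] - 2.0) ** 2 + (w[1] - 0.5) ** 2)


best_weights, best_score = coordinate_descent(score, [1.0, 1.0])
```

Because each step changes a single weight, the search needs only one scoring run per trial, but it can stop at a local optimum when features interact.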


Using Lucene to index and search the digitized 1940 US Census

Diesendruck

Kooper

Marini

et al. 2014

Concurrency and Computation


Summary. An improved approach toward enabling search capabilities over large digitized document archives is described, in which Lucene indices were incorporated in a framework developed to provide automatic searchable access to the 1940 US Census, a collection composed of digitized handwritten forms. As an alternative to trying to recognize the handwritten text in the images, Word Spotting feature vectors are used to describe each cell's content. Instead of querying the system using regular ASCII text, any query is rendered as an image, and a ranked list of matching results is presented to the user. Among other preprocessing steps required by the framework, an index must be compiled to provide fast access to the feature vectors. The advantages and drawbacks of using Lucene to index these vectors instead of other indexing methods are discussed in light of the challenges confronted when dealing with digitized document collections of considerable size. Copyright © 2014 John Wiley & Sons, Ltd.


Retrieval from Document Image Collections

Balasubramanian

Meshesha

Jawahar

2006

Lecture Notes in Computer Science


This paper presents a system for retrieval of relevant documents from large document image collections. We achieve effective search and retrieval from a large collection of printed document images by matching image features at word-level. For representations of the words, profile-based and shape-based features are employed. A novel DTW-based partial matching scheme is employed to take care of morphologically variant words. This is useful for grouping together similar words during the indexing process. The system supports cross-lingual search using OM-Trans transliteration and a dictionary-based approach. System-level issues for retrieval (e.g., scalability, effective delivery, etc.) are addressed in this paper.
