2011 International Conference on Document Analysis and Recognition
DOI: 10.1109/icdar.2011.22

Browsing Heterogeneous Document Collections by a Segmentation-Free Word Spotting Method

Abstract: In this paper, we present a segmentation-free word spotting method that is able to deal with heterogeneous document image collections. We propose a patch-based framework where patches are represented by a bag-of-visual-words model powered by SIFT descriptors. A later refinement of the feature vectors is performed by applying the latent semantic indexing technique. The proposed method performs well on both handwritten and typewritten historical document images. We have also tested our method on documen…
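The pipeline the abstract describes — local descriptors quantized into a bag-of-visual-words histogram per patch, then refined with latent semantic indexing — can be illustrated with a minimal NumPy sketch. The descriptors, codebook, and patch data below are synthetic stand-ins (real systems would use 128-D SIFT descriptors and a k-means-learned codebook), not the paper's actual features:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: 6 document patches, each with 50 local
# descriptors of dimension 8 (a real system would use 128-D SIFT).
patches = [rng.normal(size=(50, 8)) for _ in range(6)]

# A fixed visual codebook of k "visual words" (normally learned
# by k-means over descriptors sampled from the whole collection).
k = 16
codebook = rng.normal(size=(k, 8))

def bovw_histogram(descriptors, codebook):
    """Assign each descriptor to its nearest visual word and
    return an L1-normalized word-count histogram."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

# Patch-by-word matrix (rows: patches, columns: visual words).
X = np.vstack([bovw_histogram(p, codebook) for p in patches])

# Latent semantic indexing: a truncated SVD projects the histograms
# into a low-dimensional latent space where retrieval is performed.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
r = 4  # number of latent dimensions kept (a hypothetical choice)
X_lsi = U[:, :r] * S[:r]

print(X.shape, X_lsi.shape)  # (6, 16) (6, 4)
```

Retrieval then compares query-patch vectors to `X_lsi` rows (e.g. by cosine similarity) instead of the raw, higher-dimensional histograms.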

Cited by 115 publications (121 citation statements)
References 12 publications
“…The Fisher vector can be understood as a bag of words that also encodes higher order statistics, and has been shown to be a state-of-the-art encoding method for several computer vision tasks such as image classification and retrieval [3]. Yet, as argued by other authors [1,28], descriptors such as the FV do not directly capture all the flexibility needed in a multi-writer setting: although the results on a single-writer dataset are competitive, the accuracy dramatically drops when using more challenging datasets with large variations in style. We postulate that leveraging supervised information to learn the similarities and differences between different writing styles is of paramount importance to compensate for the lack of flexibility of the fixed-length representations, and that not exploiting this information is one of the main causes of their subpar performance.…”
Section: Introduction
“…Finally, recent approaches that are not limited to keywords can be found in [10,28,1]. Gatos et al [10] perform template matching of block-based image descriptors, Rusiñol et al [28] use an aggregation of SIFT descriptors into a bag of visual words to describe images, while Almazán et al [1] use HOG descriptors [5] combined with an exemplar-SVM framework. These fast-to-compare representations allow them to perform word spotting using a sliding window over the whole document without segmenting it into individual words.…”
Section: Word Representation
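The segmentation-free, sliding-window matching shared by the methods just cited can be sketched as follows. The page and window descriptors here are simple synthetic histograms compared by cosine similarity — a stand-in for the BoVW, HOG, or block-based descriptors mentioned above, not any one paper's exact representation:

```python
import numpy as np

rng = np.random.default_rng(1)

def cosine(a, b):
    """Cosine similarity between two fixed-length descriptor vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Synthetic stand-in: a "page" as a sequence of 30 column descriptors,
# each a 16-bin histogram (real systems slide a 2-D window over pixels).
page = rng.random((30, 16))
window = 5  # window width in columns, a hypothetical choice

# Plant the query at position 12 so the page contains a true match.
query = page[12:12 + window].ravel()

# Slide a fixed-width window over the page and score every position
# against the query; no prior word segmentation is needed.
scores = [cosine(page[i:i + window].ravel(), query)
          for i in range(len(page) - window + 1)]

best = int(np.argmax(scores))
print(best)  # 12, the position where the query was planted
```

Because each window reduces to one fixed-length vector, scoring is a single dot product per position, which is what makes exhaustive sliding-window spotting over whole pages tractable.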
“…This work focuses on the case where queries are presented to the system as strings typed by the user (known as Query-by-String) [21,4,6,29,1,26,12,16,17,25], although an alternative formulation of KWS, where queries are presented as example images (known as Query-by-Example), is also very popular in the literature [10,19,11,7,22,5,3,27]. Query-by-Example approaches are typically training-free and are based on template (image) matching between the query (example image) and word-sized image regions of the documents.…”
Section: Introduction