2011
DOI: 10.1007/s10032-010-0146-0

Text retrieval from early printed books

Abstract: Retrieving text from early printed books is particularly difficult because in these documents the words are set very close to one another and, as in medieval manuscripts, ligatures and abbreviations are used extensively. To address these problems, we propose a word indexing and retrieval technique that does not require word segmentation and is tolerant of errors in character segmentation. Two main principles characterize the approach. First, characters are identified in the pages and clustered with sel…

Citations: cited by 12 publications (1 citation statement)
References: 29 publications
“…The first is OCR-based retrieval and the second is content-based retrieval [1,2]. The latter is simple and does not require off-line samples and complicated supervised learning [3][4][5][6][7][8]. Direct image matching is often done for content-based retrieval, which confines the application to some extent for the sake of low matching speed.…”
Section: Introduction (mentioning)
confidence: 99%