Proceedings of the Third Workshop on Analytics for Noisy Unstructured Text Data 2009
DOI: 10.1145/1568296.1568304

Text retrieval from early printed books

Abstract: Retrieving text from early printed books is particularly difficult because in these documents the words are set very close to one another and, as in medieval manuscripts, ligatures and abbreviations are used extensively. To address these problems, we propose a word indexing and retrieval technique that does not require word segmentation and is tolerant of errors in character segmentation. Two main principles characterize the approach. First, characters are identified in the pages and clustered with sel…
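
The abstract is truncated, so the following is only a rough sketch of the general idea and not the authors' implementation. It assumes, based on the citation statements on this page, that character objects are clustered with a self-organizing map (SOM), and it illustrates how a text line encoded as a sequence of cluster codes can be searched without first splitting it into words. The function names (`train_som`, `encode`, `find`), the two-dimensional toy feature vectors, and all parameter values are hypothetical. Matching here is exact, whereas the paper describes a technique that tolerates character segmentation errors.

```python
import random


def best_unit(units, v):
    """Index of the SOM unit whose weights are closest (squared Euclidean) to v."""
    return min(range(len(units)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(units[j], v)))


def train_som(vectors, n_units=3, epochs=40, lr0=0.5, seed=0):
    """Train a tiny 1-D SOM on feature vectors (all settings are illustrative)."""
    rng = random.Random(seed)
    # Initialise each unit from a randomly chosen training vector.
    units = [list(rng.choice(vectors)) for _ in range(n_units)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs        # decays from 1.0 towards 0
        lr = lr0 * frac                    # learning rate shrinks over time
        radius = 1 if frac > 0.5 else 0    # neighbourhood shrinks over time
        for v in vectors:
            bmu = best_unit(units, v)
            lo, hi = max(0, bmu - radius), min(n_units, bmu + radius + 1)
            for j in range(lo, hi):
                # The winning unit moves fully, its neighbours move half as far.
                pull = lr if j == bmu else lr * 0.5
                units[j] = [w + pull * (x - w) for w, x in zip(units[j], v)]
    return units


def encode(units, char_vectors):
    """Replace each character object by the index of its closest SOM unit."""
    return [best_unit(units, v) for v in char_vectors]


def find(line_codes, query_codes):
    """Start positions where the query code sequence occurs in the line.

    The line is treated as one unsegmented sequence of character codes,
    so no word boundaries are needed.
    """
    n, m = len(line_codes), len(query_codes)
    return [i for i in range(n - m + 1) if line_codes[i:i + m] == query_codes]


# Hypothetical toy data: three glyph shapes described by 2-D feature vectors.
glyphs = {"a": (0.0, 0.0), "b": (5.0, 5.0), "c": (10.0, 0.0)}
line = [glyphs[ch] for ch in "abcabca"]
units = train_som(line)
hits = find(encode(units, line), encode(units, [glyphs[ch] for ch in "bca"]))
print(hits)
```

Because similar glyphs map to the same code, the query is located wherever the same code sequence occurs in the line. If two distinct glyphs end up in the same cluster, the search may also return extra positions, which a real system would need to rank or filter.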

Cited by 1 publication (2 citation statements)
References 24 publications

“…Once documents are indexed, the resulting index vectors can be considered as signatures and used for retrieval [4]. In [38,50,51], indexing of words in old documents has been carried out using self-organizing maps (SOMs), and similar symbols have been clustered in a sub-set of the document.…”
Section: Indexing/learning Methods (mentioning)
confidence: 99%
“…In [38], text retrieval from early printed books carried out using character recognition is described. Characters have been recognized with connected component features as character objects.…”
Section: Connected Component-based Features (mentioning)
confidence: 99%