2016 International Joint Conference on Neural Networks (IJCNN) 2016
DOI: 10.1109/ijcnn.2016.7727648

A brief review of document image retrieval methods: Recent advances

Abstract: Due to the rapid increase of different digitized documents, the development of a system to automatically retrieve document images from a large collection of structured and unstructured document images is in high demand. Many techniques have been developed to provide an efficient and effective way for retrieving and organizing these document images in the literature. This paper provides an overview of the methods which have been applied for document image retrieval over recent years. It has been found that from…

Cited by 7 publications (5 citation statements)
References 60 publications

“…Document image retrieval is normally used to find the appropriate document image from the database based on the user's queries. It has two approaches which include retrieval based on text recognition using optical character recognition (OCR) which relies on recognizing the text from the document and later examining the similarity as well as the retrieval without text recognition (not using OCR) which relies on image features in the document and later calculating the similarity with the actual content of the images [7]. This simply means the content-based document image retrieval method has 2 main stages which are the extraction of the text/text image feature and the search based on matching the text/text image [8].…”
Section: Introduction (mentioning)
confidence: 99%
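
The two stages named in the statement above (feature extraction, then search by matching) can be illustrated for the recognition-free case with a minimal sketch. Everything in it is an assumption made for illustration: the ink-density feature, the cosine similarity measure, and the function names `extract_features`, `cosine_similarity`, and `retrieve` are hypothetical and are not taken from the reviewed paper or the citing work.

```python
import math


def extract_features(image, bins=4):
    """Stage 1 (assumed feature): ink density in `bins` horizontal bands.

    `image` is a binary page image given as a list of rows of 0/1 pixels.
    """
    rows = len(image)
    features = []
    for b in range(bins):
        start = b * rows // bins
        end = (b + 1) * rows // bins
        band = image[start:end]
        ink = sum(sum(row) for row in band)
        area = sum(len(row) for row in band)
        features.append(ink / area if area else 0.0)
    return features


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_image, database, top_k=3):
    """Stage 2: rank stored images by feature similarity to the query image.

    `database` maps a document name to its binary image.
    Returns up to `top_k` (name, score) pairs, best match first.
    """
    query_features = extract_features(query_image)
    scored = [
        (name, cosine_similarity(query_features, extract_features(image)))
        for name, image in database.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

Calling `retrieve(query_image, database)` returns the stored documents ordered by similarity. An OCR-based system would keep the same two-stage shape but would swap the image feature for recognized text and the similarity measure for a text-matching score.
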
“…For example, some databases offer a controlled vocabulary like a thesaurus or taxonomy from which to choose the search terms (e.g., the Medical Subject Headings [MeSH] in PubMed), whereas others offer a full text search. Regarding the latter, indexing scanned documents to offer a full text search, requires pre-processing methods like optical character recognition (OCR), known to include typos, and post-OCR processing, both affecting information retrieval accuracy [18][19][20][21][22][23].…”
Section: Introduction (mentioning)
confidence: 99%