2021
DOI: 10.14569/ijacsa.2021.0120776

Content-based Image Retrieval using Tesseract OCR Engine and Levenshtein Algorithm

Abstract: Image Retrieval Systems (IRSs) are applications that allow one to retrieve images saved at any location on a network. Most IRSs make use of reverse lookup to find images stored on the network based on image properties such as size, filename, title, color, texture, shape, and description. This paper provides a technique for obtaining full image document given that the user has some portions of the document under search. To demonstrate the reliability of the proposed technique, we designed a system to implement …

Cited by 8 publications (5 citation statements)
References 18 publications
“…The improvement in the quality of the content-based search for the digital correspondence document was achieved through the availability of the required criteria such as the classification and display of the ontology relationships information to ease the users' understanding of the hierarchy of the letter found and also to display the required document. This facility is unavailable in conventional search which is designed based on the document name or annotations [3] as well as unclassified content [10]-[15]. A trial was conducted as an example by searching for document names through the input of the query "bantuan pemerintah" and no document was shown and this complicated the search process.…”
Section: Results
Type: mentioning
confidence: 99%
“…Several research [10]-[15] have been conducted about content-based image document search using OCR technology but they are only limited to searching base content for scanned documents without focusing on classified documents for a more specific search. Meanwhile, the increasing number and diversity of documents are making the classification process important to direct, summarize, and organize the documents easily, with efficient and cost-effective solutions [16].…”
Section: Introduction
Type: mentioning
confidence: 99%
“…where TLD is the Total Levenshtein Distance. The Levenshtein Distance, also known as the Edit-Distance algorithm, measures the number of characters that must be changed, added, or deleted in the predicted word so that it matches the true word [31]. Total Levenshtein distance does not apply to a word; it applies to the whole text.…”
Section: Evaluation Metrics
Type: mentioning
confidence: 99%
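
The edit distance described in the statement above can be illustrated with a minimal sketch. It assumes the standard dynamic-programming formulation with unit cost for each substitution, insertion, and deletion; the function name `levenshtein` and the sample strings are hypothetical and do not come from the citing paper, whose TLD formula is not shown here. Passing whole texts rather than single words gives a distance over the full text, in the spirit of the "total" distance the statement mentions.

```python
def levenshtein(predicted: str, truth: str) -> int:
    """Minimum number of single-character substitutions, insertions,
    and deletions needed to turn `predicted` into `truth`."""
    # The distance is symmetric, so keep the shorter string as the row
    # to reduce memory use.
    if len(predicted) < len(truth):
        predicted, truth = truth, predicted

    # previous[j] = distance between the processed prefix of `predicted`
    # and the first j characters of `truth`.
    previous = list(range(len(truth) + 1))
    for i, p_char in enumerate(predicted, start=1):
        current = [i]
        for j, t_char in enumerate(truth, start=1):
            substitution = previous[j - 1] + (p_char != t_char)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


# Hypothetical OCR output compared against the true text as a whole.
ocr_text = "the qu1ck brown f0x"
true_text = "the quick brown fox"
distance = levenshtein(ocr_text, true_text)
```

Only two rows of the table are kept at a time, so memory grows with the shorter string rather than with the product of both lengths.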
“…Adjetey & Sarpong proposed a novel algorithm for recognizing text [34], making documents editable and searchable in images, extracting text from images using Levenshtein Algorithm and Tesseract OCR to search text from images to find in the document. Begin by locating and comparing the texts extracted from the images using the Levenshtein text-matching algorithm.…”
Section: Optical Character Recognition
Type: mentioning
confidence: 99%
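
The matching step described in the statement above might be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the OCR text of each image is assumed to be already extracted (for example by Tesseract) and held in a plain dictionary, and the comparison uses an approximate-substring variant of the Levenshtein recurrence (often attributed to Sellers) so that a short fragment can be located inside a longer document. The names `best_fragment_distance`, `rank_documents`, and the sample file names are hypothetical.

```python
def best_fragment_distance(fragment: str, text: str) -> int:
    """Smallest edit distance between `fragment` and any substring of `text`."""
    # Row 0 is all zeros: a match may start anywhere in `text` at no cost.
    previous = [0] * (len(text) + 1)
    for i, f_char in enumerate(fragment, start=1):
        current = [i]
        for j, t_char in enumerate(text, start=1):
            current.append(min(
                previous[j - 1] + (f_char != t_char),  # substitution or match
                current[j - 1] + 1,                    # insertion
                previous[j] + 1,                       # deletion
            ))
        previous = current
    # The match may also end anywhere in `text`.
    return min(previous)


def rank_documents(fragment: str, ocr_texts: dict[str, str]) -> list[tuple[str, int]]:
    """Order image names by how closely their OCR text contains `fragment`."""
    scored = [(name, best_fragment_distance(fragment, text))
              for name, text in ocr_texts.items()]
    return sorted(scored, key=lambda item: item[1])


# Assumed, already-extracted OCR text keyed by hypothetical image names.
ocr_texts = {
    "letter_a.png": "Surat bantuan pemerintah untuk desa",
    "invoice_b.png": "Invoice number 2021 total due",
}
# The query fragment contains one misread character.
ranking = rank_documents("bantuan pemerintab", ocr_texts)
```

Ranking by distance rather than requiring an exact match lets a fragment with a few OCR errors still surface the document it came from.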