2021
DOI: 10.18287/2412-6179-co-795

Weighted combination of per-frame recognition results for text recognition in a video stream

Abstract: The scope of uses of automated document recognition has extended and as a result, recognition techniques that do not require specialized equipment have become more relevant. Among such techniques, document recognition using mobile devices is of interest. However, it is not always possible to ensure controlled capturing conditions and, consequentially, high quality of input images. Unlike specialized scanners, mobile cameras allow using a video stream as an input, thus obtaining several images of the recognized…

Cited by 11 publications (8 citation statements)
References: 46 publications
“…One of the considerations for using a video stream as an input is that it makes the input more resistant to tampering, as the video is harder to falsify in comparison with a single uploaded image, especially if the document analysis procedure is performed in real-time. From a document analysis and recognition perspective, using multiple input images of the same object presents several advantages: filtering and refinement techniques could now be employed for improving object detection and location accuracy [32,66], the so-called "super-resolution" techniques [67] could be employed for obtaining images of higher quality, and text recognition results could be improved by means of accumulating per-frame recognition results in a single most reliable one [68]. Fig.…”
Section: Videostream
Citation type: mentioning
confidence: 99%
“…where   is the field recognition method and   is the metric function on the set of field recognition results, both corresponding to the text field recognition problem (5) stated above. Even simple selection strategies, such as selecting a single result with the maximum value of input image quality or of a recognition result confidence level [68] can be considered a combination method, along with alignment procedures such as ROVER (Recognizer Output Voting Error Reduction) [88] and its extension for text recognition results with per-character alternatives [86].…”
Section: C) Template Processing
Citation type: mentioning
confidence: 99%
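
The selection strategy mentioned in the statement above (keeping the single per-frame result with the highest confidence) can be contrasted with a simple confidence-weighted vote. The following is a minimal sketch under stated assumptions, not the method of the cited paper: `FrameResult`, `select_best`, and `weighted_vote` are hypothetical names, confidences are assumed to lie in [0, 1], and the vote operates on whole strings without the character alignment that ROVER performs.

```python
from collections import defaultdict
from typing import NamedTuple


class FrameResult(NamedTuple):
    text: str          # recognized string for one text field in one frame
    confidence: float  # recognizer confidence, assumed to lie in [0, 1]


def select_best(results: list[FrameResult]) -> str:
    """Selection strategy: keep the single most confident per-frame result."""
    return max(results, key=lambda r: r.confidence).text


def weighted_vote(results: list[FrameResult]) -> str:
    """Whole-string weighted vote: sum the confidences of identical strings."""
    scores: dict[str, float] = defaultdict(float)
    for r in results:
        scores[r.text] += r.confidence
    return max(scores, key=scores.get)


# Hypothetical per-frame results for one field across three frames.
frames = [
    FrameResult("SMITH", 0.60),
    FrameResult("SM1TH", 0.70),
    FrameResult("SMITH", 0.55),
]
print(select_best(frames))    # SM1TH
print(weighted_vote(frames))  # SMITH
```

In this made-up example the single most confident frame carries a misread character, while the accumulated weight of the two agreeing frames recovers the other reading, which illustrates why accumulating per-frame results may be preferable to selecting one.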
“…The results presented in [21,36] show that the presence of blur in text field images decreases the quality of recognition. It should be noted that the recognition submodule may contain a pre-processing step that refines the image using a deblurring method, for example, [37].…”
Section: Document Image Quality Assessment Problem Statement
Citation type: mentioning
confidence: 99%
“…In [21,36], the authors incorporated an estimation of image blur into the algorithms of a combination of text recognition results in a video stream. Since an unblurred text image has high contrast in regions corresponding to strokes, they assume that the level of image blur is inversely related to the sharpness (called focus in the cited papers), which represents the directional minimum of the highest local contrasts of the image.…”
Section: The Minimum Scaling Coefficient Assessment At a Restored Image Point
Citation type: mentioning
confidence: 99%
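
The focus estimate described in the statement above, a directional minimum of the highest local contrasts, can be illustrated roughly as follows. This is a sketch under assumptions and not the exact definition used in the cited papers: local contrast is taken as the absolute difference between neighboring pixels, "highest" is taken as the 0.95 quantile, and four directions are used. The function name `focus_score` and the test images are hypothetical.

```python
def focus_score(image: list[list[float]], quantile: float = 0.95) -> float:
    """Minimum over directions of a high quantile of neighbor differences.

    Expects a grayscale image of at least 2x2 pixels as nested lists.
    """
    height, width = len(image), len(image[0])
    directions = [(0, 1), (1, 0), (1, 1), (1, -1)]  # right, down, two diagonals
    per_direction = []
    for dy, dx in directions:
        diffs = sorted(
            abs(image[y + dy][x + dx] - image[y][x])
            for y in range(height - dy)
            for x in range(max(0, -dx), width - max(0, dx))
        )
        per_direction.append(diffs[int(quantile * (len(diffs) - 1))])
    return min(per_direction)


def blur_rows(image: list[list[float]]) -> list[list[float]]:
    """Horizontal 3-pixel box blur, imitating blur along one direction."""
    blurred = []
    for row in image:
        out = []
        for x in range(len(row)):
            window = row[max(0, x - 1): x + 2]
            out.append(sum(window) / len(window))
        blurred.append(out)
    return blurred


# Hypothetical test image: an 8x8 pattern of 2x2 black and white blocks.
sharp = [[float((x // 2 + y // 2) % 2) for x in range(8)] for y in range(8)]
blurred = blur_rows(sharp)
print(focus_score(sharp), focus_score(blurred))
```

Because the score is the minimum across directions, blur along a single direction is enough to lower it, even though contrast in the other directions is untouched.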