2021
DOI: 10.48550/arxiv.2112.12916
|View full text |Cite
Preprint
|
Sign up to set email alerts
|

Visual Semantics Allow for Textual Reasoning Better in Scene Text Recognition

Abstract: Existing Scene Text Recognition (STR) methods typically use a language model to optimize the joint probability of the 1D character sequence predicted by a visual recognition (VR) model, which ignore the 2D spatial context of visual semantics within and between character instances, making them not generalize well to arbitrary shape scene text. To address this issue, we make the first attempt to perform textual reasoning based on visual semantics in this paper. Technically, given the character segmentation maps … Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
3
1

Citation Types

0
4
0

Year Published

2022
2022
2023
2023

Publication Types

Select...
1
1

Relationship

0
2

Authors

Journals

citations
Cited by 2 publications
(4 citation statements)
references
References 30 publications
0
4
0
Order By: Relevance
“…This study makes a significant contribution by addressing the unexplored but practical HMSTR task, which cannot be effectively handled by state-of-the-art OCR and STR techniques 13 . Specifically, our experiments demonstrate that the proposed approach effectively recognizes hand-marked domain-specific terms, jargon, loan words, and abbreviations 28 , for which other typical text recognition models lack training.…”
Section: Discussionmentioning
confidence: 99%
See 2 more Smart Citations
“…This study makes a significant contribution by addressing the unexplored but practical HMSTR task, which cannot be effectively handled by state-of-the-art OCR and STR techniques 13 . Specifically, our experiments demonstrate that the proposed approach effectively recognizes hand-marked domain-specific terms, jargon, loan words, and abbreviations 28 , for which other typical text recognition models lack training.…”
Section: Discussionmentioning
confidence: 99%
“…Recent advancements in STR have focused on developing context-aware methods to improve text recognition accuracy. For instance, He et al 13 proposed a segmentation baseline with graph-based textual reasoning (S-GTR), which utilizes a graph convolutional network to incorporate visual context for textual reasoning. Bautista and Atienza 14 enhanced STR performance by combining context-free and context-aware autoregressive inference.…”
Section: Introductionmentioning
confidence: 99%
See 1 more Smart Citation
“…More specifically, they obtain OCR annotations from an open source OCR engine Tesseract [45] for 5 Million documents from IIT-CDIP [25] dataset. With the introduction of pre-training strategy and advances in modern OCR engine [1,12,20,28,34], many contemporary approaches [7,2,53] have utilized even more data to advance the Document Intelligence field.…”
Section: Introductionmentioning
confidence: 99%