2019
DOI: 10.1609/aaai.v33i01.33018714
|View full text |Cite
|
Sign up to set email alerts
|

Scene Text Recognition from Two-Dimensional Perspective

Abstract: Inspired by speech recognition, recent state-of-the-art algorithms mostly consider scene text recognition as a sequence prediction problem. Though achieving excellent performance, these methods usually neglect an important fact that text in images are actually distributed in two-dimensional space. It is a nature quite different from that of speech, which is essentially a one-dimensional signal. In principle, directly compressing features of text into a one-dimensional form may lose useful information and intro… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
2
2
1

Citation Types

1
175
0

Year Published

2020
2020
2023
2023

Publication Types

Select...
4
2
2

Relationship

1
7

Authors

Journals

citations
Cited by 234 publications
(176 citation statements)
references
References 11 publications
1
175
0
Order By: Relevance
“…Different from those LSTM-based approaches, recognizers without LSTM can better leverage the spatial information, but they also unavoidably introduce additional parameters or post processing steps in order to produce sequential outputs, such as the multiple classifiers used by STN-OCR [9] and the word formation module designed in [10].…”
Section: Related Workmentioning
confidence: 99%
See 2 more Smart Citations
“…Different from those LSTM-based approaches, recognizers without LSTM can better leverage the spatial information, but they also unavoidably introduce additional parameters or post processing steps in order to produce sequential outputs, such as the multiple classifiers used by STN-OCR [9] and the word formation module designed in [10].…”
Section: Related Workmentioning
confidence: 99%
“…Backbone: Similar to Liao's work [10], we take VGG-16 as the encoder of our feature extraction module, and remove the fully connected layers and pooling layers from the last two encoding stages. We also assemble two deformable convolutional layers [24] at stage-4 and stage-5 of the decoder given their flexible receptive fields.…”
Section: Cnn-based Feature Extractionmentioning
confidence: 99%
See 1 more Smart Citation
“…Recently, CA-FCN [11] takes the two-dimensional spatial distribution of text into consideration, and text recognition is reformulated as semantic segmentation, where character categories are segmented from the background. However, their method abandons the use of recurrent neural networks (RNN), and thus fails to obtain an overall vision.…”
Section: Introductionmentioning
confidence: 99%
“…Over the years, optical character recognition has been a popular research topic for computer vision specialists [1][2][3][4][5][6]. Convolutional neural networks have proven themselves as a good solution for such problems as object recognition.…”
Section: Introductionmentioning
confidence: 99%