2019
DOI: 10.1609/aaai.v33i01.33018610
Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition

Abstract: Recognizing irregular text in natural scene images is challenging due to the large variance in text appearance, such as curvature, orientation and distortion. Most existing approaches rely heavily on sophisticated model designs and/or extra fine-grained annotations, which, to some extent, increase the difficulty in algorithm implementation and data collection. In this work, we propose an easy-to-implement strong baseline for irregular scene text recognition, using off-the-shelf neural network components and onl…

Cited by 356 publications (316 citation statements). References 20 publications.
“…Performance on Curved Text: On the curved dataset, we outperform the previous state-of-the-art rectification-based method [8] by an absolute improvement of 5% on CUTE. CAP-Net also achieves a higher score than the 2D attention baseline [10] by 3.5% on CUTE, while surpassing it by 7.2% on IC15 and 2.4% on SVT-P. The superior performance verifies the effectiveness of our method.…”
Section: Methods
confidence: 68%
“…The fact that the polygon prediction is shape-sensitive and may not generalize well to unseen shapes limits the potential of rectification-based methods. A similar problem also exists in the 2D attention method [10], as evidenced by a less competitive score on blurred datasets.…”
Section: Introduction
confidence: 76%
“…Su et al [34,36] converted text images into sequential signals by extracting their HOG features, and designed an ensembling technique to combine the outputs of two LSTM branches, so that better recognition performance could be achieved. Li et al [37] pointed out that the traditional attention mechanism was not able to produce accurate attention predictions, so recognition performance on irregular text images was largely compromised. To address this issue, they designed a 2-D attention module, where one LSTM was used to encode feature maps column by column to produce holistic features, and another was employed as usual to generate the final sequential outputs.…”
Section: Related Work
confidence: 99%
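The 2-D attention module described in that excerpt — one LSTM summarizing the feature map column by column into a holistic feature, a second LSTM decoding characters while attending over all spatial positions — can be sketched roughly as follows. This is a minimal, hypothetical PyTorch illustration of the idea, not the paper's actual implementation; all layer names, dimensions and the column-pooling step are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TwoDAttentionSketch(nn.Module):
    """Illustrative sketch (not the paper's code): a column-wise encoder
    LSTM produces a holistic feature that initializes a decoder LSTM,
    which attends over every H*W feature-map position at each step."""

    def __init__(self, feat_dim=32, hidden=64, num_classes=37):
        super().__init__()
        self.col_lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.dec_lstm = nn.LSTMCell(hidden, hidden)
        self.attn_feat = nn.Linear(feat_dim, hidden)   # keys from feature map
        self.attn_state = nn.Linear(hidden, hidden)    # query from decoder state
        self.attn_score = nn.Linear(hidden, 1)
        self.ctx_proj = nn.Linear(feat_dim, hidden)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, feats, max_steps=5):
        # feats: (B, C, H, W) CNN feature map
        b, c, h, w = feats.shape
        # encode column by column: pool each column, run an LSTM over columns
        cols = feats.mean(dim=2).permute(0, 2, 1)          # (B, W, C)
        _, (h_n, c_n) = self.col_lstm(cols)                # holistic feature
        state = (h_n[0], c_n[0])                           # init decoder state

        flat = feats.flatten(2).permute(0, 2, 1)           # (B, H*W, C)
        keys = self.attn_feat(flat)                        # (B, H*W, hidden)
        logits = []
        for _ in range(max_steps):
            # 2-D attention: score every spatial position against the state
            q = self.attn_state(state[0]).unsqueeze(1)     # (B, 1, hidden)
            a = torch.softmax(self.attn_score(torch.tanh(keys + q)), dim=1)
            ctx = (a * flat).sum(dim=1)                    # (B, C) glimpse
            state = self.dec_lstm(self.ctx_proj(ctx), state)
            logits.append(self.classifier(state[0]))
        return torch.stack(logits, dim=1)                  # (B, T, num_classes)
```

Attending over the full 2-D map, rather than a 1-D collapsed sequence, is what lets such a decoder follow curved or oriented text; the column-wise encoder merely supplies a global summary to start decoding from.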
“…To validate the effectiveness of our method, we evaluate our PRN on several irregular benchmarks and summarize the results in Table 4 [1, 4-7, 9, 15-20, 30, 36-40]. Since [41] used extra synthetic and real images for training, we did not compare against it, to ensure a fair comparison. As observed in Table 4, our method outperforms other approaches by a large margin on most benchmarks.…”
Section: Performance On Irregular Benchmarks
confidence: 99%