2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr46437.2021.00035

Primitive Representation Learning for Scene Text Recognition

Cited by 71 publications (24 citation statements) · References 35 publications

“…Comparison with State-of-the-Art We compare the proposed S-GTR with state-of-the-art methods, and the results are summarized in Table 1, where the inference speed as well as the number of model parameters are also reported. As can be seen, the proposed S-GTR achieves the highest recognition accuracy and 3× faster inference speed compared with the second-best method PREN2D (Yan et al 2021). In addition, when real data is utilized for training, S-GTR achieves more impressive results on all six benchmarks, validating the effectiveness of the proposed GTR for textual reasoning and the benefit of real data.…”
Section: Performance Analysis (mentioning)
confidence: 59%
“…To further verify the effectiveness of GTR, we plug our GTR module into four representative types of STR methods, including a CTC-based method (e.g., CRNN (Shi, Bai, and Yao 2016)), a 1D attention-based method (e.g., TRBA (Baek et al 2019)), a 2D attention-based method (e.g., Base2D (Yan et al 2021)), and transformer-based methods (e.g., SRN (Yu et al 2020) and ABINet-LV (Fang et al 2021)). For the 1D attention-based method, the prediction result of VR is a 1D semantic vector.…”
Section: Plugging GTR in Different Models (mentioning)
confidence: 99%
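The plug-in evaluation described in the quote above treats the textual-reasoning module as a head attached to the output of any visual recognizer (CTC-based, attention-based, or transformer-based). Below is a minimal PyTorch sketch of that pattern; the module and variable names (PlugInReasoner, vr_features) are illustrative assumptions, not the actual GTR implementation.

# Minimal sketch (PyTorch) of a plug-in reasoning head: a visual recognizer (VR)
# produces per-step features, and a separate reasoning module refines them
# before the final character prediction. All names are hypothetical.
import torch
import torch.nn as nn

class PlugInReasoner(nn.Module):
    """Hypothetical textual-reasoning head applied on top of VR output."""
    def __init__(self, dim: int, num_classes: int):
        super().__init__()
        self.refine = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.cls = nn.Linear(dim, num_classes)

    def forward(self, vr_features: torch.Tensor) -> torch.Tensor:
        # vr_features: (B, T, dim) -- e.g., a 1D sequence of semantic vectors
        # produced by a CTC- or attention-based recognizer.
        return self.cls(self.refine(vr_features))

# Usage: refine the output of any recognizer that yields (B, T, dim) features.
feats = torch.randn(2, 25, 256)          # dummy VR features, 25 decoding steps
logits = PlugInReasoner(256, 37)(feats)  # (2, 25, 37) refined character logits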
“…Specifically, our model achieves superior performance improvements on SVT, IC13-L, IC15-S and SVTP (datasets that contain low-quality images) by 1.1%∼1.7%. PREN2D [29] slightly wins on CUTE, but IterNet shows huge performance gains on all the other datasets: 1.2% on IIIT, 1.1% on SVT, 1.5% on IC13-S, 4.7% on IC15-S, and 3.3% on SVTP. It's worth noting that our IterNet uses the same iterative language modeling module as ABINet, but with a different vision modeling module (i.e., IterVM).…”
Section: Comparison to State-of-the-Arts (mentioning)
confidence: 98%
“…VisionLAN [28] proposes language-aware visual masks for training, which simulate the case of missing character-wise visual semantics and guide the vision modeling module to use not only the visual texture of characters but also the linguistic information in the visual context for recognition. PREN2D [29] proposes global feature aggregations to learn primitive visual representations from multi-scale feature maps and exploits GCNs to transform primitive representations into high-level visual text representations. Different from these works, our IterVM uses feedback connections to fuse the high-level (most semantic) visual feature with multi-level visual features.…”
Section: Visual Feature Enhancement by Semantic Information (mentioning)
confidence: 99%
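As a rough illustration of the mechanism paraphrased in the quote above (pooling a feature map into a small set of primitive vectors and transforming them with a GCN into character-level representations), the following PyTorch sketch shows one possible reading. The number of primitives, the learnable adjacency, and all names are assumptions for illustration, not the published PREN2D architecture.

# Minimal sketch: (1) pool a feature map into a few "primitive" vectors via
# learned spatial attention weights, then (2) apply a GCN-style transform and
# mix the primitives into per-character visual representations.
import torch
import torch.nn as nn

class PrimitiveAggregator(nn.Module):
    def __init__(self, in_dim: int, num_primitives: int, max_chars: int):
        super().__init__()
        self.att = nn.Conv2d(in_dim, num_primitives, kernel_size=1)  # spatial pooling weights
        self.adj = nn.Parameter(torch.eye(num_primitives))           # learnable primitive graph
        self.proj = nn.Linear(in_dim, in_dim)
        self.to_chars = nn.Linear(num_primitives, max_chars)

    def forward(self, fmap: torch.Tensor) -> torch.Tensor:
        # fmap: (B, C, H, W) feature map from a visual backbone
        w = self.att(fmap).flatten(2).softmax(-1)        # (B, P, H*W) pooling weights
        prims = w @ fmap.flatten(2).transpose(1, 2)      # (B, P, C) primitive vectors
        prims = torch.relu(self.proj(self.adj @ prims))  # GCN-style propagation
        # mix primitives into per-character representations: (B, max_chars, C)
        return self.to_chars(prims.transpose(1, 2)).transpose(1, 2)

chars = PrimitiveAggregator(256, num_primitives=5, max_chars=25)(torch.randn(2, 256, 8, 32))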
“…Specifically, it models the sliced visual features as the graph nodes, captures their dependency, and merges features of the same instance for prediction. PREN2D (Yan et al 2021) adopts a meta-learning framework to extract visual representations via GCN. In this paper, we devise a two-level graph network based on GCN to perform spatial context reasoning within and between character instances to refine the visual recognition results.…”
Section: Related Work (mentioning)
confidence: 99%
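The GCN-based spatial reasoning mentioned in the quote above can be illustrated with a single graph-convolution step over width-sliced visual features, where each slice is a node and neighbouring slices are connected. The chain adjacency and all names below are assumptions for illustration, not the cited two-level graph network.

# Minimal sketch (PyTorch) of one GCN propagation step over sliced visual features.
import torch
import torch.nn as nn

class SliceGCNLayer(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.weight = nn.Linear(dim, dim, bias=False)

    def forward(self, nodes: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # nodes: (B, N, dim) sliced visual features; adj: (N, N) adjacency matrix
        a_hat = adj + torch.eye(adj.size(0), device=adj.device)   # add self-loops
        norm = a_hat / a_hat.sum(-1).clamp(min=1).unsqueeze(-1)   # row-normalize
        return torch.relu(self.weight(norm @ nodes))              # propagate + transform

# Usage: connect neighbouring slices of a width-sliced feature map.
N, dim = 25, 256
adj = torch.diag(torch.ones(N - 1), 1) + torch.diag(torch.ones(N - 1), -1)
refined = SliceGCNLayer(dim)(torch.randn(2, N, dim), adj)         # (2, 25, 256)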