2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021
DOI: 10.1109/cvpr46437.2021.00313

What If We Only Use Real Datasets for Scene Text Recognition? Toward Scene Text Recognition With Fewer Labels

Abstract: Scene text recognition (STR) task has a common practice: All state-of-the-art STR models are trained on large synthetic data. In contrast to this practice, training STR models only on fewer real labels (STR with fewer labels) is important when we have to train STR models without synthetic data: for handwritten or artistic texts that are difficult to generate synthetically and for languages other than English for which we do not always have synthetic data. However, there has been implicit common knowledge that …

Citations: cited by 84 publications (35 citation statements)
References: 50 publications
“…The existing methods [3,4] do not recognize the characters correctly, while the proposed method reports correct recognition results. In addition, We have calculated the average FPS of the proposed and existing methods [3,4] for all the 8 datasets and the results are 6.76, 6.68 and 7.58 for the methods [3], [4] and SGBANet, respectively. This shows that our method is faster than the existing methods.…”
Section: Comparison With State-of-the-art Approaches (mentioning)
confidence: 86%
“…It can be observed from Fig. 1 that the existing methods [4,11] do not recognize the characters correctly for arbitrarily shaped text and text with complex backgrounds. Therefore, designing a robust method for recognizing arbitrarily shaped text is still a challenging task that remains to be solved.…”
Section: Introduction (mentioning)
confidence: 97%
“…As the thriving of deep learning, the researchers also made attempts to build the text recognition models based on deep neural networks following the bottom-up fashion [2,3,5,6,22,23,29,35,38,[40][41][42][43]46,47,51,59,60,65,67,71,73,83,87,90,98,100]. For example, CRNN [59] utilizes the CNN-RNN architecture to extract features for the text images, which are further supervised with the CTC loss [24] to maximize the probability of the ground truth.…”
Section: Existing Text Recognition Methods (mentioning)
confidence: 99%
“…In this work, we adopt STR public datasets to evaluate the performance of the pre-trained model. The datasets cover A New Dataset: UTI-100M As suggested in literature (Baek, Matsui, and Aizawa 2021), training model on real data can yield better results than synthetic data. Therefore, we collect a large-scale real dataset containing about 100 million unlabeled text line images, named Unlabeled Text Image 100M (UTI-100M), to explore the potential of the proposed hierarchical contrastive learning paradigm.…”
Section: Datasets and Metrics (mentioning)
confidence: 99%