2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019
DOI: 10.1109/iccv.2019.00918

Chinese Street View Text: Large-Scale Chinese Text Reading With Partially Supervised Learning

Abstract: Most existing text reading benchmarks make it difficult to evaluate the performance of more advanced deep learning models in large vocabularies due to the limited amount of training data. To address this issue, we introduce a new large-scale text reading benchmark dataset named Chinese Street View Text (C-SVT) with 430,000 street view images, which is at least 14 times as large as the existing Chinese text reading benchmarks. To recognize Chinese text in the wild while keeping large-scale datasets labeling co…

Cited by 47 publications (18 citation statements)
References 36 publications
“…We generate a synthetic long text dataset with the engine in [9], which includes 3 million images. Besides, we also use the training set of RCTW [32] and LSVT [33] as training data. Following the configuration described in Sec.…”
Section: Results On Non-latin Long Text
Citation type: mentioning
confidence: 99%
“…Second, we see potential in using a transformer in an end-to-end text spotting system. Third, TRIG can be improved to solve the problem about other languages [54] and hand-written (cursive) text recognition [55,56].…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…Most of these images were sourced from street boards and labeled as geometric shapes with varying vertices. A large-scale street view text (LSVT) of ICDAR 2019 [30] provides annotations to all 20,000 test images, 30,000 training images. ICDAR 2019 (MLT19) [31], this dataset comprises 18,000 images that have been annotated at the word level.…”
Section: Datasets
Citation type: mentioning
confidence: 99%