2020
DOI: 10.1609/aaai.v34i07.6735

GTC: Guided Training of CTC towards Efficient and Accurate Scene Text Recognition

Abstract: Connectionist Temporal Classification (CTC) and attention mechanism are two main approaches used in recent scene text recognition works. Compared with attention-based methods, CTC decoder has a much shorter inference time, yet a lower accuracy. To design an efficient and effective model, we propose the guided training of CTC (GTC), where CTC model learns a better alignment and feature representations from a more powerful attentional guidance. With the benefit of guided training, CTC model achieves robust and a…
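The abstract's point about short CTC inference time can be illustrated with greedy (best-path) CTC decoding, which needs only one pass over the per-frame scores and no step-by-step autoregressive loop. The following is a minimal sketch, not the paper's implementation: the function name `ctc_greedy_decode`, the blank index `BLANK = 0`, and the toy alphabet are all assumptions made for illustration.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical convention)


def ctc_greedy_decode(frame_scores, alphabet):
    """Best-path CTC decoding over a list of per-frame score lists.

    frame_scores[t][k] is the score of class k at frame t, where class 0
    is the blank and class k >= 1 maps to alphabet[k - 1].
    """
    # 1. Pick the highest-scoring class independently for every frame.
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores]
    # 2. Collapse runs of identical consecutive predictions.
    collapsed = [k for k, _ in groupby(best)]
    # 3. Drop blanks and map the remaining class indices to characters.
    return "".join(alphabet[k - 1] for k in collapsed if k != BLANK)


# Toy example: 5 frames, classes = [blank, "a", "b"].
frames = [
    [0.1, 0.8, 0.1],    # "a"
    [0.1, 0.7, 0.2],    # "a" (repeat, collapsed)
    [0.9, 0.05, 0.05],  # blank (separates the two "a"s)
    [0.1, 0.8, 0.1],    # "a"
    [0.1, 0.2, 0.7],    # "b"
]
print(ctc_greedy_decode(frames, "ab"))
```

The blank between the second and fourth frames is what lets the decoder emit a doubled character; without it the repeated "a" predictions would collapse into one.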

Citations: cited by 111 publications (54 citation statements)
References: 18 publications
“…For example, Shi et al. [30] proposed the CTC-based method, where the visual feature extracted by CNN was reshaped as a sequence and then modeled by RNN and CTC loss. Following this pipeline, several methods were developed with improved accuracy [8,9,33]. Rather than decoding by RNN, segmentation-based methods [17,19,40] directly performed pixel-level character segmentation and prediction.…”
Section: Semantic-free Methods (mentioning)
confidence: 99%
“…As can be seen from the results shown in Tab. 4, DPAN achieves the best performance among all types of approaches. Note that GTC [8] uses additional text images for training. Even so, DPAN performs better than GTC on most datasets.…”
Section: Comparison With State-of-the-art Methods (mentioning)
confidence: 99%
“…1.8% and 1.7% improvements are achieved on two irregular datasets IC15 and SVTP without any pre-processing like rectification. Compared with the CTC-based method GTC [23], our PIMNet outperforms it on all six benchmarks under the same setting of training data. Besides autoregressive guidance, our PIMNet also adopts an iterative easy first decoding strategy to extract context information and mimicking learning to improve the learning of the hidden layers, which is a further step.…”
Section: Comparisons With State-of-the-arts (mentioning)
confidence: 98%
“…1, PIMNet with mimicking learning achieves better accuracy, especially 1.1% on SVT and 1.1% on IC15. Note that the PIMNet without mimicking still retains the autoregressive decoder, which is similar to GTC [23]. Each pixel shows the cosine similarities $\cos_{ij}$ of the i-th and j-th outputs.…”
Section: Ablation Studies (mentioning)
confidence: 99%
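
The last statement refers to a map in which each pixel is the cosine similarity between the i-th and j-th outputs. As a hedged illustration of how such a matrix can be computed, here is a small sketch; the function name `cosine_similarity_matrix` and the toy vectors are assumptions, not taken from the cited paper, and all vectors are assumed to be non-zero.

```python
import math


def cosine_similarity_matrix(outputs):
    """Return the matrix M with M[i][j] = cos(outputs[i], outputs[j]).

    `outputs` is a list of equal-length, non-zero vectors (lists of floats).
    """
    # Euclidean norm of every output vector, computed once up front.
    norms = [math.sqrt(sum(x * x for x in vec)) for vec in outputs]
    n = len(outputs)
    return [
        [
            # Dot product of vectors i and j, divided by the product of norms.
            sum(a * b for a, b in zip(outputs[i], outputs[j])) / (norms[i] * norms[j])
            for j in range(n)
        ]
        for i in range(n)
    ]


# Toy example with three 2-D "outputs".
sims = cosine_similarity_matrix([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
for row in sims:
    print([round(v, 3) for v in row])
```

The matrix is symmetric with ones on the diagonal, so plotting it as an image shows at a glance how similar each pair of outputs is.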