2019 International Conference on Document Analysis and Recognition (ICDAR)
DOI: 10.1109/icdar.2019.00130
NRTR: A No-Recurrence Sequence-to-Sequence Model for Scene Text Recognition

Abstract: Scene text recognition has attracted extensive research for decades due to its importance to various applications. Existing sequence-to-sequence (seq2seq) recognizers mainly adopt Connectionist Temporal Classification (CTC) or attention-based recurrent or convolutional networks, and have made great progress in scene text recognition. However, we observe that current methods suffer from slow training speed, because the internal recurrence of RNNs limits training parallelization, and high complexity because…

Cited by 139 publications (70 citation statements)
References 25 publications
“…(b) a transformer can be used to translate from one language to another, so SATRN [30] and NRTR [31] adopt the transformer's encoder-decoder to bridge the cross-modality gap between the image input and the text output. The image input represents features extracted by a shallow CNN.…”
Section: Transformer-based Methods
confidence: 99%
“…They explored several feature-extraction and sequence-labelling architectures. Sheng et al. [36] used a stacked self-attention sequence-to-sequence encoder-decoder model. Further, they implemented a modality-transform method that effectively transformed 2D natural-scene image features into 1D feature sequences.…”
Section: Related Work
confidence: 99%
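The 2D-to-1D transform described above can be sketched in plain Python. This is a minimal illustration, not the paper's exact module: the (channels, height, width) layout and the strategy of stacking all channel/height values at each width position are assumptions, and the real method uses learned CNN layers rather than a bare reshape.

```python
# Sketch: turning a 2D feature map into a 1D feature sequence
# (illustrative only; shapes and the stacking strategy are assumptions).

def modality_transform(feature_map):
    """feature_map: nested list of shape (C, H, W).
    Returns a sequence of W vectors, each of dimension C*H, by
    stacking all channel/height values at each width position,
    so the decoder can read the image left to right."""
    C = len(feature_map)
    H = len(feature_map[0])
    W = len(feature_map[0][0])
    sequence = []
    for w in range(W):
        vec = [feature_map[c][h][w] for c in range(C) for h in range(H)]
        sequence.append(vec)
    return sequence

# A 2-channel, 3x4 feature map becomes a sequence of 4 vectors of length 6.
fm = [[[c * 100 + h * 10 + w for w in range(4)] for h in range(3)]
      for c in range(2)]
seq = modality_transform(fm)
```

The width axis becomes the sequence axis because scene text is mostly horizontal; each sequence step then carries the full vertical slice of features at that position.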
“…The authors in [61] utilize text shape descriptors, such as center line, scale, and orientation, to deal with highly curved or distorted text. NRTR [62] dispenses with recurrences and convolutions in favor of stacked self-attention modules, where an encoder extracts features and a decoder performs the recognition of text based on the encoder's output. In [63], asynchronous training and inference behavior is realized to classify images irrespective of the presence of text instances, which leads to multimodal recognition tasks.…”
Section: B. Scene Text Recognition
confidence: 99%
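The "no recurrence" claim rests on self-attention: every position attends to every other in one step, so the whole sequence is processed in parallel instead of token by token. A single-head, projection-free sketch in plain Python (a simplification for illustration, not NRTR's exact multi-head module):

```python
import math

# Sketch: scaled dot-product self-attention, the core of a
# no-recurrence encoder (single head, no learned projections;
# an assumption-laden simplification, not the paper's module).

def self_attention(seq):
    """seq: list of equal-length feature vectors.
    Each output position is a softmax-weighted mix of all positions;
    no step depends on the previous step's output, so all positions
    can be computed in parallel during training."""
    d = len(seq[0])
    out = []
    for q in seq:
        # similarity of this position to every position, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in seq]
        # numerically stable softmax over the scores
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # convex combination of all value vectors
        out.append([sum(w * v[i] for w, v in zip(weights, seq))
                    for i in range(d)])
    return out

attended = self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

Because the softmax weights sum to one, each output vector is a convex combination of the inputs; an RNN, by contrast, must consume positions sequentially, which is exactly the training bottleneck the citing passages say NRTR removes.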