Scene Text Recognition from Two-Dimensional Perspective

Liao, Minghui; Zhang, Jian; Wan, Zhaoyi; Xie, Fengming; Liang, Jiajun; Lyu, Pengyuan; Yao, Cong; Bai, Xiang

doi:10.1609/aaai.v33i01.33018714

Cited by 234 publications

(176 citation statements)

References 11 publications

Supporting

Mentioning

175

Contrasting


“…Different from those LSTM-based approaches, recognizers without LSTM can better leverage the spatial information, but they also unavoidably introduce additional parameters or post processing steps in order to produce sequential outputs, such as the multiple classifiers used by STN-OCR [9] and the word formation module designed in [10].…”

Section: Related Work (mentioning)

confidence: 99%

“…Backbone: Similar to Liao's work [10], we take VGG-16 as the encoder of our feature extraction module, and remove the fully connected layers and pooling layers from the last two encoding stages. We also assemble two deformable convolutional layers [24] at stage-4 and stage-5 of the decoder given their flexible receptive fields.…”

Section: CNN-based Feature Extraction (mentioning)

confidence: 99%

“…We also assemble two deformable convolutional layers [24] at stage-4 and stage-5 of the decoder given their flexible receptive fields. However, compared with Liao's network [10], the resolution of final feature maps is restored to a smaller size of W/4 × H/4 × C in our FACLSTM, instead of the W/2 × H/2 × C used in [10], considering the memory and computation cost. Here, W, H and C denote the width, height and channels of feature maps, respectively.…”

Section: CNN-based Feature Extraction (mentioning)

confidence: 99%
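
The memory argument in the statement above comes down to simple arithmetic: halving the feature-map resolution in both width and height cuts the number of activations by a factor of four. The sketch below illustrates this. The input size and channel count are hypothetical values chosen for illustration, and the helper names `feature_map_shape` and `num_activations` are not taken from either paper.

```python
def feature_map_shape(width, height, channels, stride):
    """Return (W/stride, H/stride, C) for an input of the given size."""
    if width % stride or height % stride:
        raise ValueError("width and height must be divisible by stride")
    return (width // stride, height // stride, channels)


def num_activations(shape):
    """Count the values stored in a feature map of the given shape."""
    w, h, c = shape
    return w * h * c


# Hypothetical input size and channel count, chosen only for illustration.
W, H, C = 256, 64, 128

quarter = feature_map_shape(W, H, C, stride=4)  # W/4 x H/4 x C
half = feature_map_shape(W, H, C, stride=2)     # W/2 x H/2 x C

# Ratio of activations between the two resolutions.
ratio = num_activations(half) / num_activations(quarter)
print(quarter, half, ratio)  # (64, 16, 128) (128, 32, 128) 4.0
```

The factor of four does not depend on the chosen W, H and C, as long as both are divisible by 4.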


FACLSTM: ConvLSTM with focused attention for scene text recognition

Wang¹,

Jia²,

He³

et al. 2020

Sci. China Inf. Sci.


Scene text recognition has recently been widely treated as a sequence-to-sequence prediction problem, where traditional fully-connected-LSTM (FC-LSTM) has played a critical role. Due to the limitation of FC-LSTM, existing methods have to convert 2-D feature maps into 1-D sequential feature vectors, resulting in severe damages of the valuable spatial and structural information of text images. In this paper, we argue that scene text recognition is essentially a spatiotemporal prediction problem for its 2-D image inputs, and propose a convolution LSTM (ConvLSTM)-based scene text recognizer, namely, FACLSTM, i.e., Focused Attention ConvLSTM, where the spatial correlation of pixels is fully leveraged when performing sequential prediction with LSTM. Particularly, the attention mechanism is properly incorporated into an efficient ConvLSTM structure via the convolutional operations and additional character center masks are generated to help focus attention on right feature areas. The experimental results on benchmark datasets IIIT5K, SVT and CUTE demonstrate that our proposed FACLSTM performs competitively on the regular, low-resolution and noisy text images, and outperforms the state-of-the-art approaches on the curved text images with large margins.


“…Recently, CA-FCN [11] takes the two-dimensional spatial distribution of text into consideration, and text recognition is reformulated as semantic segmentation, where character categories are segmented from the background. However, their method abandons the use of recurrent neural networks (RNN), and thus fails to obtain an overall vision.…”

Section: Introduction (mentioning)

confidence: 99%

A New Perspective for Flexible Feature Gathering in Scene Text Recognition Via Character Anchor Pooling

Guan

Bian

Yao

2020

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Self Cite


Irregular scene text recognition has attracted much attention from the research community, mainly due to the complexity of shapes of text in natural scenes. However, recent methods either rely on shape-sensitive modules such as bounding box regression, or discard sequence learning. To tackle these issues, we propose a pair of coupling modules, termed Character Anchoring Module (CAM) and Anchor Pooling Module (APM), to extract high-level semantics from two-dimensional space to form feature sequences. The proposed CAM localizes the text in a shape-insensitive way by design by anchoring characters individually. APM then interpolates and gathers features flexibly along the character anchors, which enables sequence learning. The complementary modules realize a harmonic unification of spatial information and sequence learning. With the proposed modules, our recognition system surpasses previous state-of-the-art scores on irregular and perspective text datasets, including ICDAR 2015, CUTE, and Total-Text, while paralleling state-of-the-art performance on regular text datasets.


“…Over the years, optical character recognition has been a popular research topic for computer vision specialists [1][2][3][4][5][6]. Convolutional neural networks have proven themselves as a good solution for such problems as object recognition.…”

Section: Introduction (mentioning)

confidence: 99%

Recognition of images of Korean characters using embedded networks

Ilyuhin

Sheshkus

Arlazarov

2020

Twelfth International Conference on Machine Vision (ICMV 2019)


Despite the significant success in the field of text recognition, complex and unsolved problems still exist in this field. In recent years, the recognition accuracy of the English language has greatly increased, while the problem of recognition of hieroglyphs has received much less attention. Hieroglyph recognition, or image recognition with Korean, Japanese or Chinese characters, differs from the traditional text recognition task. This article discusses the main differences between hieroglyph languages and the Latin alphabet in the context of image recognition. A light-weight method for recognizing images of the hieroglyphs is proposed and tested on a public dataset of Korean hieroglyph images. Unlike the existing solutions, the proposed method is suitable for mobile devices. Its recognition accuracy is better than the accuracy of the open-source OCR framework. The presented method of training the embedded network is based on the similarities in the recognition data.
