Scene-text spotting is the task of simultaneously detecting text regions in natural scene images and recognizing the characters they contain. It has attracted much attention in recent years due to its wide range of applications. Existing research has focused mainly on improving text region detection rather than text recognition. Consequently, while detection accuracy has improved, end-to-end accuracy remains insufficient. Text in natural scene images tends to be not a random string of characters but a meaningful one, i.e., a word. Therefore, we propose adversarial learning of semantic representations for scene text spotting (A3S) to improve end-to-end accuracy, including text recognition. A3S simultaneously predicts semantic features of the detected text region instead of performing text recognition based only on visual features. Experimental results on publicly available datasets show that the proposed method achieves better accuracy than existing methods.
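To make the core idea concrete, the following is a minimal sketch of a joint training objective that combines character recognition with semantic-feature prediction and an adversarial term. This is an illustrative assumption, not the paper's actual architecture or loss: the function names (`cosine_distance`, `joint_loss`), the weighting coefficients, and the use of cosine distance and a generator-side GAN loss are all hypothetical choices made for clarity.

```python
import numpy as np

def cosine_distance(pred, target):
    """Semantic loss: 1 - cosine similarity between a predicted semantic
    feature vector and a target word embedding (illustrative choice)."""
    num = float(np.dot(pred, target))
    den = float(np.linalg.norm(pred) * np.linalg.norm(target)) + 1e-8
    return 1.0 - num / den

def joint_loss(char_log_probs, char_labels, sem_pred, sem_target,
               disc_score, lam=0.5, mu=0.1):
    """Sketch of a combined objective (weights lam, mu are assumptions):
    - recognition: per-character negative log-likelihood,
    - semantic: distance between predicted and target embeddings,
    - adversarial: generator-side loss, rewarded when the discriminator
      (disc_score = estimated probability the prediction is a real
      embedding) is fooled."""
    recog = -np.mean([char_log_probs[t, c] for t, c in enumerate(char_labels)])
    semantic = cosine_distance(sem_pred, sem_target)
    adversarial = -np.log(disc_score + 1e-8)
    return recog + lam * semantic + mu * adversarial
```

In this framing, the semantic branch pushes the recognizer toward outputs consistent with meaningful words, while the adversarial term encourages predicted semantic features to be indistinguishable from real word embeddings.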