2021
DOI: 10.48550/arxiv.2111.12351
Preprint

Decoupling Visual-Semantic Feature Learning for Robust Scene Text Recognition

Abstract: Semantic information has been proven effective in scene text recognition. Most existing methods couple visual and semantic information in an attention-based decoder. As a result, the learned semantic features are prone to be biased toward the limited vocabulary of the training set, a problem known as vocabulary reliance. In this paper, we propose a novel Visual-Semantic Decoupling Network (VSDN) to address this problem. Our VSDN contains a Visual Decoder (VD) and a Semantic Decoder (SD) to learn pure…
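The abstract is truncated above, so the exact decoder design is not specified here. Purely as an illustrative sketch of the decoupling idea, and not the authors' implementation, the PyTorch snippet below separates a visual decoder (which attends only to image features) from a semantic decoder (which reasons only over character embeddings). All module names, dimensions, and the hand-off between the two branches are assumptions.

```python
# Hedged sketch of decoupled visual/semantic decoding (assumed design,
# not the VSDN paper's code). Names and hyperparameters are illustrative.
import torch
import torch.nn as nn

class VisualDecoder(nn.Module):
    """Predicts characters from visual features alone, with no language context."""
    def __init__(self, d_model=256, num_classes=97, max_len=25):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
        self.query = nn.Parameter(torch.randn(max_len, d_model))  # one query per step
        self.cls = nn.Linear(d_model, num_classes)

    def forward(self, visual_feats):  # visual_feats: (B, H*W, d_model)
        q = self.query.unsqueeze(0).expand(visual_feats.size(0), -1, -1)
        glimpse, _ = self.attn(q, visual_feats, visual_feats)  # attend to image only
        return self.cls(glimpse)  # per-step logits from pure visual evidence

class SemanticDecoder(nn.Module):
    """Refines the visual reading using language context only:
    it sees character embeddings, never the image features."""
    def __init__(self, d_model=256, num_classes=97):
        super().__init__()
        self.embed = nn.Embedding(num_classes, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.lm = nn.TransformerEncoder(layer, num_layers=2)
        self.cls = nn.Linear(d_model, num_classes)

    def forward(self, visual_logits):
        chars = visual_logits.argmax(-1)   # coarse reading; argmax blocks gradients
        ctx = self.lm(self.embed(chars))   # purely semantic reasoning over characters
        return self.cls(ctx)
```

In a setup like this, each branch would receive its own supervision, so neither can lean on the other's features; the gradient-blocking argmax shown here is one simple way, in this sketch, to keep the two feature spaces from entangling.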

Cited by 1 publication (1 citation statement)
References 27 publications (32 reference statements)
“…Therefore, some studies proposed using both representations simultaneously. Methods have been proposed that embed them in the same Euclidean space [7] or estimate one feature from the other and fuse the two together [16]. However, these methods use the different representations simultaneously and directly, so the representations may interfere with each other.…”
Section: Related Work (citation type: mentioning)
Confidence: 99%
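As a concrete, assumed illustration of the fusion pattern the citing paper describes, the sketch below mixes a visual feature and a semantic feature with a learned gate. This is a common design in scene-text recognizers, not the specific mechanism of reference [7] or [16]; the module name and dimensions are hypothetical.

```python
# Minimal gated-fusion sketch (assumed illustration, not the cited papers' code).
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Fuses a visual and a semantic feature with a learned element-wise gate."""
    def __init__(self, d_model=256):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * d_model, d_model), nn.Sigmoid())

    def forward(self, f_vis, f_sem):  # both: (B, T, d_model)
        g = self.gate(torch.cat([f_vis, f_sem], dim=-1))  # per-channel weight in (0, 1)
        return g * f_vis + (1.0 - g) * f_sem              # convex mixture of the two
```

A gate like this lets the model decide, per position and channel, how much to trust each representation, which is one way to mitigate the direct interference the citing authors point out.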