2022
DOI: 10.1007/978-3-031-19815-1_26

Multi-modal Text Recognition Networks: Interactive Enhancements Between Visual and Semantic Features

Citations: cited by 32 publications (14 citation statements)
References: 34 publications
“…Generally, language-augmented methods (i.e., SRN [8], Vision-LAN [55], PARSeq [57], ABINet++ [104], LevOCR [56], MA-TRN [10] and MGP-STR) perform better than language-free methods, showing the significance of linguistic information. PTIE [51], which utilizes a transformer-only model with multiple patch resolutions, also achieves good results.…”
Section: Results On Standard Benchmarks (mentioning)
confidence: 99%
“…We conduct qualitative comparisons with two representative STR methods (i.e. ABINet [9] and MATRN [10]) on typical images from standard benchmarks. Fig.…”
Section: Qualitative Results (mentioning)
confidence: 99%