2022
DOI: 10.3390/electronics11223813
Multi-Task Learning for Scene Text Image Super-Resolution with Multiple Transformers

Abstract: Scene text image super-resolution aims to improve readability by recovering text shapes from low-resolution degraded text images. Although recent developments in deep learning have greatly improved super-resolution (SR) techniques, recovering text images with irregular shapes, heavy noise, and blurriness is still challenging. This is because networks with Convolutional Neural Network (CNN)-based backbones cannot sufficiently capture the global long-range correlations of text images or detailed sequential infor…

Cited by 3 publications (1 citation statement) · References 42 publications
“…Parallel contextual attention blocks are designed in PCAN (C. Zhao et al., 2021) to adaptively select the key information in text sequences that contributes to image super-resolution. Chen et al. used a Transformer (Vaswani et al., 2017) to replace the BLSTM-based sequence processing block and proposed the STT network (Chen et al., 2021), whose coarse global attention computation increases the difficulty of learning and yields only limited accuracy gains. Quan et al. proposed a cascade model (Quan et al., 2020) for the collaborative recovery of blurred text images in both the high-frequency domain and the image domain, and the TPGSR method incorporates a textual prior in the encoder and applies iterative refinement to enhance low-resolution images. Reasoning that image reconstruction models tend to have more robust denoising capabilities and more correct text structure information, Honda et al. proposed the MTSR network (Honda et al., 2022), which innovatively uses a transformer-based module to transfer complementary features from the reconstruction model to the SR model, significantly improving the accuracy of existing text recognizers. Guo et al. proposed LEMMA (Guo et al., 2023), which explicitly models character regions to produce high-level text-specific guidance for super-resolution.…”
Section: Scene Text Image Super-Resolution
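The core idea behind replacing a BLSTM sequence block with a Transformer, as described in the citation above, is that self-attention lets every position in a text-line feature sequence attend to every other position, capturing global long-range correlations in one step. The sketch below is a minimal, illustrative single-head scaled dot-product self-attention in NumPy; it is not the STT or MTSR implementation, and the shapes and projection matrices are assumptions chosen for demonstration.

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """x: (seq_len, d) feature sequence (e.g. horizontal slices of a text line).

    Returns a (seq_len, d) sequence where each position is a weighted mix
    of all positions -- the global correlation a BLSTM only builds up
    step by step.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])        # pairwise similarities, all positions at once
    scores -= scores.max(axis=-1, keepdims=True)   # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # softmax: each row sums to 1
    return weights @ v

rng = np.random.default_rng(0)
seq_len, d = 16, 32                                # hypothetical: 16 slices, 32-dim features
x = rng.standard_normal((seq_len, d))
wq, wk, wv = (rng.standard_normal((d, d)) * d**-0.5 for _ in range(3))
out = self_attention(x, wq, wk, wv)
print(out.shape)  # (16, 32)
```

In a full Transformer block this would be multi-head, followed by a position-wise feed-forward layer and residual connections; the "coarse global attention" criticism of STT quoted above refers to computing these correlations over the whole sequence at once rather than in a finer-grained, localized way.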