2019 IEEE Winter Conference on Applications of Computer Vision (WACV) 2019
DOI: 10.1109/wacv.2019.00086

Mask R-CNN With Pyramid Attention Network for Scene Text Detection

Abstract: In this paper, we present a new Mask R-CNN-based text detection approach that can robustly detect multi-oriented and curved text in natural scene images in a unified manner. To enhance the feature representation ability of Mask R-CNN for text detection tasks, we propose to use the Pyramid Attention Network (PAN) as a new backbone network of Mask R-CNN. Experiments demonstrate that PAN can suppress false alarms caused by text-like backgrounds more effectively. Our proposed approach has achieved superior perfo…

Cited by 92 publications (54 citation statements)
References 40 publications
“…Alternatively, attention mechanisms have been widely studied in deep CNNs for many computer vision tasks in order to efficiently integrate local and global features, including human pose estimation [12], emotion recognition [13], text detection [14], object detection [15] and classification [16]. Unlike standard multi-scale features fusion approaches, which compress an entire image into a static representation, attention allows the network to focus on the most relevant features without additional supervision, avoiding the use of multiple similar feature maps and highlighting salient features that are useful for a given task.…”
Section: Introduction (mentioning)
confidence: 99%
“…TextContourNet [43] extract[s] instance-level text contour[s] to increase the accuracy of curve[d] text detection. In [44], to enhance the feature representation ability, a pyramid attention network is used [for] text detection tasks. Mask-Most Net [45] use[s] [an] instance-level mask approximation method through a combination of high-level semantic and low-level features.…”
Section: A. Scene Text Detection (mentioning)
confidence: 99%
“…SSTD [31] takes the pixel-wise binary mask of text images as an additional input, which is used to refine the original feature map. Huang et al. incorporate the proposed Pyramid Attention Network (PAN) into the Mask R-CNN framework to enhance the text detection performance [32]. R²CNN++ [33] adopts a multi-dimensional attention mechanism which contains both pixel-wise and channel-wise attention to reduce the adverse impact of noise.…”
Section: B. Attention Mechanism (mentioning)
confidence: 99%