2017 IEEE International Conference on Computer Vision (ICCV) 2017
DOI: 10.1109/iccv.2017.331

Single Shot Text Detector with Regional Attention

Abstract: We present a novel single-shot text detector that directly outputs word-level bounding boxes in a natural image. We propose an attention mechanism which roughly identifies text regions via an automatically learned attentional map. This substantially suppresses background interference in the convolutional features, which is the key to producing accurate inference of words, particularly at extremely small sizes. This results in a single model that essentially works in a coarse-to-fine manner. It departs from rec…

Cited by 306 publications (144 citation statements)
References 33 publications
“…In [41], EAST was introduced by exploring IOU loss [39] to detect multioriented text instances (e.g., words), with impressive results achieved. Recently, a single-shot text detector (SSTD) [9] was proposed by extending SSD object detector [22] to text detection. SSTD encodes text regional attention into convolutional features to enhance text information.…”
Section: Related Work (mentioning)
confidence: 99%
“…Scene text detection has attracted much attention in the computer vision field because of its numerous applications, such as instant translation, image retrieval, scene parsing, geo-location, and blind-navigation. Recently, scene text detectors based on deep learning have shown promising performance [8,40,21,4,11,10,12,13,17,24,25,32,26]. These methods mainly train their networks to localize word-level bounding boxes.…”
Section: Introduction (mentioning)
confidence: 99%
“…To evaluate the adaptability of our learned anchor, we still set the input size to 800 pixels. As shown in Table 4, our method achieves an F-measure of 0.833, which also surpasses all of the anchor-based methods [22,32,9,24,16], including one-stage and two-stage frameworks. Compared with other approaches [38,21] which utilize a deep regression network that directly predict text region, our method still keep an absolute lead in running speed.…”
Section: Comparison To State Of the Art (mentioning)
confidence: 76%