MOST: A Multi-Oriented Scene Text Detector with Localization Refinement

He, Minghang; Liao, Minghui; Yang, Zhibo; Zhong, Humen; Tang, Jun; Cheng, Wenqing; Yao, Cong; Wang, Yongpan; Bai, Xiang

doi:10.1109/cvpr46437.2021.00870

Cited by 85 publications

(22 citation statements)

References 32 publications


“…RRPN [13] adopts rotated anchors and RRoI pooling for detecting multi-oriented texts. Different from these anchor-based methods, anchor-free methods (e.g., EAST [16], MOST [19], and DDR [20]) directly regress the offsets from boundaries or vertexes to the current point for detecting texts. LOMO [12] introduces an iterative refinement module to iteratively refine the text localization of a direct regression based on bounding box proposals.…”

Section: A Regression-based Methods (mentioning)

confidence: 99%

Arbitrary Shape Text Detection via Boundary Transformer

Zhang, Zhu, Yuan et al. 2022

Preprint


Arbitrary shape text detection is a challenging task due to its complexity and variety, e.g., various scales, random rotations, and curve shapes. In this paper, we propose an arbitrary shape text detector with a boundary transformer, which can accurately and directly locate text boundaries without any post-processing. Our method mainly consists of a boundary proposal module and an iteratively optimized boundary transformer module. The boundary proposal module consisting of multi-layer dilated convolutions will compute important prior information (including classification map, distance field, and direction field) for generating coarse boundary proposals meanwhile guiding the optimization of boundary transformer. The boundary transformer module adopts an encoder-decoder structure, in which the encoder is constructed by multi-layer transformer blocks with residual connection while the decoder is a simple multilayer perceptron network (MLP). Under the guidance of prior information, the boundary transformer module will gradually refine the coarse boundary proposals via boundary deformation in an iterative manner. Furthermore, we propose a novel boundary energy loss (BEL) which introduces an energy minimization constraint and an energy monotonically decreasing constraint for every boundary optimization step. Extensive experiments on publicly available and challenging datasets demonstrate the state-of-the-art performance and promising efficiency of our method.
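The citation statement for this entry describes anchor-free detectors as regressing offsets from the text boundaries to the current point. As a rough illustration of that idea only, the sketch below decodes one point's predicted distances and angle into a rotated quadrangle. The function name `decode_rotated_box`, the argument layout, and the image-coordinate convention (y grows downward) are assumptions made for this example; they are not taken from EAST, MOST, or DDR.

```python
import math

def decode_rotated_box(x, y, top, right, bottom, left, theta=0.0):
    """Decode per-point boundary distances into four corner points.

    (x, y) is the location that made the prediction; top/right/bottom/left
    are its predicted distances to the four box edges; theta is the box
    rotation in radians. All names here are hypothetical.
    """
    # Corner offsets relative to the point, before rotation
    # (top-left, top-right, bottom-right, bottom-left).
    offsets = [(-left, -top), (right, -top), (right, bottom), (-left, bottom)]
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    corners = []
    for dx, dy in offsets:
        # Rotate the offset around the point, then shift to image coordinates.
        rx = dx * cos_t - dy * sin_t
        ry = dx * sin_t + dy * cos_t
        corners.append((x + rx, y + ry))
    return corners

# With theta = 0 the result is an ordinary axis-aligned box.
print(decode_rotated_box(50, 20, top=5, right=30, bottom=7, left=10))
```

Every positive location produces its own box this way, which is why such detectors need a later step to merge or suppress the many overlapping predictions.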


“…In multi-oriented scene text detection, regression-based methods are popular, including anchor-based methods [28,39] and anchor-free methods [15,16,20,74]. They usually directly predict entire texts using a rotated bounding box or quadrangle.…”

Section: Multi-oriented Object Detection (mentioning)

confidence: 99%

“…EAST [74] and DDR [16] perform rotated bounding box regression or vertex regression at each location. MOST [15] puts forward a set of strategies to improve the quality of text localization for long text significantly.…”

Section: Multi-oriented Object Detection (mentioning)

confidence: 99%

Graph fusion network for multi-oriented object detection

et al. 2022


In object detection, non-maximum suppression (NMS) methods are extensively adopted to remove horizontal duplicates of detected dense boxes for generating final object instances. However, due to the degraded quality of dense detection boxes and not explicit exploration of the context information, existing NMS methods via simple intersection-over-union (IoU) metrics tend to underperform on multi-oriented and long-size objects detection. Distinguishing with general NMS methods via duplicate removal, we propose a novel graph fusion network, named GFNet, for multi-oriented object detection. Our GFNet is extensible and adaptively fuse dense detection boxes to detect more accurate and holistic multi-oriented object instances. Specifically, we first adopt a locality-aware clustering algorithm to group dense detection boxes into different clusters. We will construct an instance sub-graph for the detection boxes belonging to one cluster. Then, we propose a graph-based fusion network via Graph Convolutional Network (GCN) to learn to reason and fuse the detection boxes for generating final instance boxes. Extensive experiments both on public available multi-oriented text datasets (including MSRA-TD500, ICDAR2015, ICDAR2017-MLT) and multi-oriented object datasets (DOTA) verify the effectiveness and robustness of our method against general NMS methods in multi-oriented object detection.
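For context, the IoU-based NMS that this abstract contrasts itself with can be sketched in a few lines. This is the generic greedy baseline on axis-aligned boxes, not GFNet and not any specific library's implementation; the function names and the `(x_min, y_min, x_max, y_max)` box layout are assumptions made for the example.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores))  # the second box overlaps the first and is dropped
```

Note that this procedure only discards boxes: the surviving box is never corrected using the ones it suppressed, which is the limitation the abstract targets by fusing dense detections instead.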


“…While state-of-the-art text detection systems such as [44,61] excel at localizing individual text entities, visual text understanding [2] requires comprehension of the semantic and geometric layout [5,7] of the textual content. In the current literature, most works focus on the individual tasks of text entities detection [3,18,61] and layout analysis [26,58] in a separate way, devoting all the power of deep learning models to task-specific performance. We argue that joint treatment of these two closely related problems can result not only in simpler and more efficient models, but also models that are more accurate across all tasks.…”

Section: Introduction (mentioning)

confidence: 99%

“…The division between text detection and geometric layout analysis tasks has led to parallel and separate research directions. Text detectors [14,18,40,61] usually treat word-level annotations, i.e. sequence of characters not interrupted by…”

Section: Introduction (mentioning)

confidence: 99%

Towards End-to-End Unified Scene Text Detection and Layout Analysis

Qin, Panteleev, Bissacco et al. 2022

Preprint


Scene text detection and document layout analysis have long been treated as two separate tasks in different image domains. In this paper, we bring them together and introduce the task of unified scene text detection and layout analysis. The first hierarchical scene text dataset is introduced to enable this novel research task. We also propose a novel method that is able to simultaneously detect scene text and form text clusters in a unified way. Comprehensive experiments show that our unified model achieves better performance than multiple well-designed baseline methods. Additionally, this model achieves state-of-the-art results on multiple scene text detection datasets without the need of complex post-processing. Dataset and code: https://github.com/google-research-datasets/hiertext.
