Accurate detection of multi-oriented text, which accounts for a large proportion of text in real-world applications, is of great significance. Performance on common benchmarks has improved rapidly in recent years. However, dense long text and detection quality are easily overlooked. Direct regression may produce low-quality, incomplete detections because of the limited receptive field; proposal-based methods can alleviate this but may introduce redundant context through the RoI operation, degrading performance. To address this dilemma, we propose a corner-aware convolution whose sampling positions tightly cover the text area. It encodes an initial corner prediction into the feature maps, which are then used to produce a refined corner prediction. We embed the proposed module into an anchor-free baseline model, yielding a simple and effective fully convolutional corner refinement network (FC2RN). Experimental results on four public datasets, MSRA-TD500, ICDAR2015, RCTW-17, and COCO-Text, demonstrate that FC2RN can outperform state-of-the-art methods.
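The abstract does not spell out how the sampling positions are derived from the initial corner prediction. As a minimal sketch, assuming a k x k kernel and four corners given in top-left, top-right, bottom-right, bottom-left order, one plausible construction is to bilinearly interpolate between the corners so that the samples span the whole quadrilateral. The function name `corner_aware_sampling_grid` and its parameters are hypothetical and are not taken from the paper.

```python
import numpy as np

def corner_aware_sampling_grid(corners, k=3):
    """Return a (k, k, 2) array of (x, y) sampling points spanning a quadrilateral.

    Assumption (not from the paper): `corners` holds four (x, y) points ordered
    top-left, top-right, bottom-right, bottom-left, and the samples are placed
    by bilinear interpolation between them.
    """
    tl, tr, br, bl = np.asarray(corners, dtype=float)
    t = np.linspace(0.0, 1.0, k)
    u = t[None, :, None]  # horizontal interpolation weight, shape (1, k, 1)
    v = t[:, None, None]  # vertical interpolation weight, shape (k, 1, 1)
    top = (1.0 - u) * tl + u * tr      # points along the top edge
    bottom = (1.0 - u) * bl + u * br   # points along the bottom edge
    return (1.0 - v) * top + v * bottom  # blend top and bottom edges row by row

# A long horizontal text box: the 3 x 3 samples stretch to cover all of it.
grid = corner_aware_sampling_grid([(0, 0), (4, 0), (4, 2), (0, 2)])
```

Subtracting the positions of a regular k x k grid from these points would give per-sample offsets of the kind a deformable-convolution operator consumes. Whether FC2RN is implemented that way is not stated in the abstract.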
Owing to its success in object detection and instance segmentation, Mask R-CNN has attracted great attention and is widely adopted as a strong baseline for arbitrary-shaped scene text detection and spotting. However, two issues remain to be settled. The first is the dense text case, which is easily neglected but quite common in practice. A single proposal may contain multiple instances, which makes it difficult for the mask head to distinguish them and degrades performance. In this work, we argue that this degradation results from learning confusion in the mask head. We propose to replace the "deconv-conv" decoder in the mask head with an MLP decoder, which alleviates the issue and improves robustness significantly. We also propose instance-aware mask learning, in which the mask head learns to predict the shape of the whole instance rather than classify each pixel as text or non-text. With instance-aware mask learning, the mask branch can learn separated and compact masks. The second issue is that, because of large variations in scale and aspect ratio, the RPN needs complicated anchor settings, which are hard to maintain and to transfer across datasets. To settle this issue, we propose an adaptive label assignment in which all instances, especially those with extreme aspect ratios, are guaranteed to be associated with enough anchors. Equipped with these components, the proposed method, named MAYOR, achieves state-of-the-art performance on five benchmarks: DAST1500, MSRA-TD500, ICDAR2015, CTW1500, and Total-Text.
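The abstract does not describe the adaptive label assignment rule. To illustrate the general idea of guaranteeing anchors for every instance, the sketch below swaps in a simple top-k scheme: each ground-truth box claims its k highest-IoU anchors regardless of any fixed IoU threshold. This is an assumed stand-in, not MAYOR's actual rule, and the names `iou_matrix` and `assign_top_k` are hypothetical. Boxes are axis-aligned `(x1, y1, x2, y2)` with positive area.

```python
import numpy as np

def iou_matrix(gt, anchors):
    """Pairwise IoU between G ground-truth boxes and A anchors, shape (G, A)."""
    g = np.asarray(gt, dtype=float)[:, None, :]
    a = np.asarray(anchors, dtype=float)[None, :, :]
    iw = np.clip(np.minimum(g[..., 2], a[..., 2]) - np.maximum(g[..., 0], a[..., 0]), 0, None)
    ih = np.clip(np.minimum(g[..., 3], a[..., 3]) - np.maximum(g[..., 1], a[..., 1]), 0, None)
    inter = iw * ih
    area_g = (g[..., 2] - g[..., 0]) * (g[..., 3] - g[..., 1])
    area_a = (a[..., 2] - a[..., 0]) * (a[..., 3] - a[..., 1])
    return inter / (area_g + area_a - inter)

def assign_top_k(gt, anchors, k=2):
    """Label each anchor with a ground-truth index, or -1 for background.

    Assumed rule (not from the paper): every ground-truth box selects its k
    best anchors by IoU; an anchor selected by several boxes goes to the one
    with the highest IoU, so a box can end up with fewer than k anchors.
    """
    iou = iou_matrix(gt, anchors)
    top = np.argsort(-iou, axis=1)[:, :k]          # k best anchors per box
    selected = np.zeros_like(iou, dtype=bool)
    np.put_along_axis(selected, top, True, axis=1)
    labels = np.where(selected, iou, -1.0).argmax(axis=0)
    labels[~selected.any(axis=0)] = -1             # unselected anchors are background
    return labels

# A 100 x 10 text line overlaps square anchors with IoU of at most 0.2,
# so a fixed 0.5 threshold would match nothing; top-k still assigns two.
labels = assign_top_k([[0, 0, 100, 10]],
                      [[0, 0, 10, 10], [40, 0, 60, 10], [200, 200, 210, 210]])
```

The example shows why extreme aspect ratios are awkward for threshold-based matching: the box is matched only because the assignment adapts to the instance rather than to a global cutoff.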