Curved Text Detection in Natural Scene Images with Semi- and Weakly-Supervised Learning

Qin, Xugong; Zhou, Yu; Yang, Dongbao; Wang, Weiping

doi:10.1109/icdar.2019.00095

Cited by 38 publications

(12 citation statements)

References 32 publications


“…Different convolutional deep learning neural network based methods have recently been used as feature backbone to extract features in order to appropriately handle the text of different scales (Gao et al, 2019; S. Qin, Bissacco, et al, 2019). Features have been extracted by using the output of one or more of the hidden layers in CNN (Gao et al, 2019; X. Qin, Zhou, et al, 2019). Sharing features extracted from CNN has also been used to extend a character classification method to character detection and bigram classification.…”

Section: Spotting-Based Mining Approaches (mentioning)

confidence: 99%

Mining text from natural scene and video images: A survey

Shivakumara

Alaei

Pal

2021

WIREs Data Min & Knowl


In computing, mining refers to extracting meaningful information or knowledge from large amounts of data using computers. Such information can be extracted from plain text and from images obtained from different sources, such as natural scene images, video, and documents, by deriving semantics from the text and content of the images. Although there is much work on text/data mining and several survey/review papers have been published, to the best of our knowledge there is no survey on mining textual information from natural scene, video, and document images that considers word spotting techniques. In this article, we therefore provide a comprehensive review of both non-spotting and spotting-based mining techniques. The mining approaches are categorized as feature-based, learning-based, and hybrid methods to analyze the strengths and limitations of the models in each category. The article also discusses the usefulness of the methods for different situations and applications. Furthermore, based on the review of different mining approaches, it identifies the limitations of existing methods and suggests new applications and future directions to continue the research in multiple directions. We believe this review will help researchers quickly become familiar with the state of the art and the progress made toward mining textual information from natural scene and video images. This article is categorized under: Algorithmic Development > Text Mining



“…In regression-based methods, geometry of text is directly predicted from convolutional features [2, 11, 12, 17, 19, 22, 23, 42, 50-52, 56, 77] or RoI features [25,37,55,72], and then used to decode to produce the predicted results based on given reference points or boxes. In instance segmentation based methods, typically, Mask R-CNN based methods [28,33,43,57,59,60], an extra branch is added to a detection framework. The results are achieved via instance segmentation, getting rid of learning target confusion problem [26,61] which exists in regression-based methods.…”
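The regression-based pipeline in the quotation above predicts geometry relative to reference boxes and then decodes it into final results. The following is a minimal sketch of that encode/decode step. It assumes the widely used Faster R-CNN-style offset parameterization (dx, dy, dw, dh) for axis-aligned boxes. The cited text detectors regress a variety of geometries, including rotated boxes, quadrilaterals, and curves, so this illustrates the idea rather than reproducing any one method's encoding. The helper names `encode_box` and `decode_box` are hypothetical.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]     # (x1, y1, x2, y2)
Deltas = Tuple[float, float, float, float]  # (dx, dy, dw, dh)


def _center_size(box: Box) -> Tuple[float, float, float, float]:
    """Convert corner coordinates to (center_x, center_y, width, height)."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return x1 + 0.5 * w, y1 + 0.5 * h, w, h


def encode_box(anchor: Box, target: Box) -> Deltas:
    """Regression target: center offsets scaled by anchor size, log size ratios."""
    ax, ay, aw, ah = _center_size(anchor)
    tx, ty, tw, th = _center_size(target)
    return (tx - ax) / aw, (ty - ay) / ah, math.log(tw / aw), math.log(th / ah)


def decode_box(anchor: Box, deltas: Deltas) -> Box:
    """Invert encode_box: apply predicted offsets to a reference (anchor) box."""
    ax, ay, aw, ah = _center_size(anchor)
    dx, dy, dw, dh = deltas
    cx, cy = ax + dx * aw, ay + dy * ah          # shift the center
    w, h = aw * math.exp(dw), ah * math.exp(dh)  # rescale width/height
    return cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h
```

Encoding a target against an anchor and decoding it back recovers the target. This is the property training relies on: the network learns small, scale-normalized offsets instead of absolute pixel coordinates.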

Section: Related Work (mentioning)

confidence: 99%

Mask is All You Need: Rethinking Mask R-CNN for Dense and Arbitrary-Shaped Scene Text Detection

Qin

Zhou

Guo

et al. 2021

Preprint

Self Cite


Due to its great success in object detection and instance segmentation, Mask R-CNN has attracted considerable attention and is widely adopted as a strong baseline for arbitrary-shaped scene text detection and spotting. However, two issues remain to be settled. The first is the dense-text case, which is easily overlooked but of real practical importance. A single proposal may contain multiple instances, which makes it difficult for the mask head to distinguish between them and degrades performance. In this work, we argue that the performance degradation results from a learning-confusion issue in the mask head. We propose to use an MLP decoder instead of the "deconv-conv" decoder in the mask head, which alleviates the issue and significantly improves robustness. We also propose instance-aware mask learning, in which the mask head learns to predict the shape of the whole instance rather than classify each pixel as text or non-text. With instance-aware mask learning, the mask branch can learn separated and compact masks. The second issue is that, due to large variations in scale and aspect ratio, the RPN needs complicated anchor settings, which are hard to maintain and transfer across datasets. To settle this issue, we propose an adaptive label assignment in which all instances, especially those with extreme aspect ratios, are guaranteed to be associated with enough anchors. Equipped with these components, the proposed method, named MAYOR, achieves state-of-the-art performance on five benchmarks: DAST1500, MSRA-TD500, ICDAR2015, CTW1500, and Total-Text.
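The adaptive label assignment in the abstract above targets instances that a fixed IoU threshold leaves with too few anchors, such as long, thin text lines. The sketch below is a generic illustration under stated assumptions, not MAYOR's published rule. Anchors are first assigned by a fixed IoU threshold. Then any ground truth with fewer than `min_k` positives claims its highest-IoU anchors that are still unassigned. The function names `iou` and `assign_anchors`, and the defaults `pos_thr=0.7` and `min_k=3`, are hypothetical choices.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def assign_anchors(anchors: List[Box], gts: List[Box],
                   pos_thr: float = 0.7, min_k: int = 3) -> Dict[int, int]:
    """Return {anchor index: ground-truth index} for positive anchors.

    Hypothetical two-step rule (illustrative, not MAYOR's exact method):
    1. Fixed threshold: an anchor is positive for its best-matching ground
       truth if that IoU is at least pos_thr.
    2. Fallback: a ground truth with fewer than min_k positives claims its
       highest-IoU anchors that are still unassigned (only if IoU > 0).
    """
    if not gts:
        return {}
    ious = [[iou(a, g) for g in gts] for a in anchors]
    assignment: Dict[int, int] = {}

    # Step 1: standard fixed-threshold assignment.
    for i, row in enumerate(ious):
        best = max(range(len(gts)), key=lambda k: row[k])
        if row[best] >= pos_thr:
            assignment[i] = best

    # Step 2: top up under-served ground truths (e.g. extreme aspect ratios).
    for j in range(len(gts)):
        have = sum(1 for g in assignment.values() if g == j)
        ranked = sorted(range(len(anchors)), key=lambda i: ious[i][j], reverse=True)
        for i in ranked:
            if have >= min_k or ious[i][j] <= 0.0:
                break
            if i not in assignment:
                assignment[i] = j
                have += 1
    return assignment
```

Consider a 16-by-16 square anchor on a 100-by-4 text line. The two overlap with an IoU of only about 0.11, so threshold-only assignment gives that line no positives. The fallback step still gives it up to `min_k` anchors.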


“…Mask TextSpotter [11] is the first end-to-end trainable arbitrary-shaped scene text spotter with a detection module based on Mask R-CNN. Qin et al [15] reduce the requirement of pixel-level annotations with weakly-supervised learning. Chen et al [2] propose a self-training framework with unannotated videos based on Mask R-CNN.…”

Section: Related Work (mentioning)

confidence: 99%

Which and Where to Focus: A Simple yet Accurate Framework for Arbitrary-Shaped Nearby Text Detection in Scene Images

Guo

Zhou

Qin

et al. 2021

Preprint

Self Cite


Scene text detection has drawn close attention from researchers. Although many methods have been proposed for horizontal and oriented text, previous methods may not perform well on arbitrary-shaped text such as curved text. In particular, a confusion problem arises when text instances are close to each other. In this paper, we propose a simple yet effective method for accurately detecting arbitrary-shaped nearby scene text. First, a One-to-Many Training Scheme (OMTS) is designed to eliminate confusion and enable proposals to learn more appropriate ground truths when text instances are close together. Second, we propose a Proposal Feature Attention Module (PFAM) to exploit more effective features for each proposal, which better adapts to arbitrary-shaped text instances. Finally, we propose a baseline that is based on Faster R-CNN and directly outputs the curve representation. Equipped with PFAM and OMTS, the detector achieves state-of-the-art or competitive performance on several challenging benchmarks.
