Review on Text Detection Methods on Scene Images

Brisinello, Matteo; Grbić, Ratko; Vranješ, Mario; Vranješ, Denis

doi:10.1109/elmar.2019.8918680

Cited by 6 publications

(2 citation statements)

References 37 publications

“…Scene text detection and recognition have been an active research topic in computer vision over the past few decades. Comprehensive surveys and detailed analyses have been conducted [27, 28, 29]. Traditional natural scene text detection methods rely heavily on handcrafted features to distinguish between text and non-text components in natural scene images, including methods employing sliding window (SW) and connected component (CC) techniques [1, 2, 3, 4].…”

Section: Related Work (mentioning)

confidence: 99%

R-YOLO: A Real-Time Text Detector for Natural Scenes with Arbitrary Rotation

Wang, Zheng, Zhang, et al. 2021

Sensors

Accurate and efficient text detection in natural scenes is a fundamental yet challenging task in computer vision, especially when dealing with arbitrarily-oriented texts. Most contemporary text detection methods are designed to identify horizontal or approximately horizontal text, which cannot satisfy practical detection requirements for various real-world images such as image streams or videos. To address this lacuna, we propose a novel method called Rotational You Only Look Once (R-YOLO), a robust real-time convolutional neural network (CNN) model to detect arbitrarily-oriented texts in natural image scenes. First, a rotated anchor box with angle information is used as the text bounding box over various orientations. Second, features of various scales are extracted from the input image to determine the probability, confidence, and inclined bounding boxes of the text. Finally, Rotational Distance Intersection over Union Non-Maximum Suppression is used to eliminate redundancy and acquire detection results with the highest accuracy. Experiments on benchmark comparison are conducted upon four popular datasets, i.e., ICDAR2015, ICDAR2013, MSRA-TD500, and ICDAR2017-MLT. The results indicate that the proposed R-YOLO method significantly outperforms state-of-the-art methods in terms of detection efficiency while maintaining high accuracy; for example, the proposed R-YOLO method achieves an F-measure of 82.3% at 62.5 fps with 720 p resolution on the ICDAR2015 dataset.

“…The survey paper can also provide readers with a clear idea of what has been done in the past and further show them clear directions and new applications for the future researcher. It is worth noting that there are good survey papers, for example, Dadiya and Goswami (2019); Pooja and Dhir (2016); Sharma et al (2012); Ye and Doermann (2015), Brisinello et al (2019); and Yin et al (2016), which include old models. Several methods have been proposed in 2019, 2020, and 2021 for addressing different issues of text spotting but there is no survey paper to provide a summary of the recent research papers (Cheikhrouhou et al, 2021; Khalil et al, 2021; Li et al, 2021; Mokayed et al, 2021).…”

Section: Motivation For Text Mining In Natural Scene and Video Images (mentioning)

confidence: 99%

Mining text from natural scene and video images: A survey

Shivakumara, Alaei, Pal 2021

WIREs Data Min & Knowl

In computer terminology, mining is considered as extracting meaningful information or knowledge from a large amount of data/information using computers. The meaningful information can be extracted from normal text, and images obtained from different resources, such as natural scene images, video, and documents by deriving semantics from text and content of the images. Although there are many pieces of work on text/data mining and several survey/review papers are published in the literature, to the best of our knowledge there is no survey paper on mining textual information from the natural scene, video, and document images considering word spotting techniques. In this article, we, therefore, provide a comprehensive review of both the non‐spotting and spotting based mining techniques. The mining approaches are categorized as feature, learning and hybrid‐based methods to analyze the strengths and limitations of the models of each category. In addition, it also discusses the usefulness of the methods according to different situations and applications. Furthermore, based on the review of different mining approaches, this article identifies the limitations of the existing methods and suggests new applications and future directions to continue the research in multiple directions. We believe such a review article will be useful to the researchers to quickly become familiar with the state‐of‐the‐art information and progresses made toward mining textual information from natural scene and video images. This article is categorized under: Algorithmic Development > Text Mining
