Multilingual Artificial Text Extraction and Script Identification from Video Images

Jamıl, Akhtar; Batool, Azra; Malik, Zumra; Mirza, Ali; Siddiqi, Imran

doi:10.14569/ijacsa.2016.070469

Cited by 10 publications

(7 citation statements)

References 51 publications

(63 reference statements)

“…Script recognition has been studied by researchers for text in video images as well as printed and handwritten documents [50,51]. Recognition of script in video text is naturally much more challenging as opposed to printed or handwritten documents due to low resolution of text and in some cases complex backgrounds [52,53]. From simple methods based on template matching [54] to sophisticated structural [55] and statistical [56] features, a number of techniques have been reported in the literature.…”

Section: Script Recognition (mentioning)

confidence: 99%

Detection and recognition of cursive text from video frames

Mirza, Zeshan, Atif, et al. 2020. J Image Video Proc.

Self Cite

Textual content appearing in videos represents an interesting index for semantic retrieval of videos (from archives), generation of alerts (live streams), as well as high-level applications like opinion mining and content summarization. The key components of such systems are detection and recognition of textual content, which are also the subject of our study. This paper presents a comprehensive framework for detection and recognition of textual content in video frames. More specifically, we target cursive scripts, taking Urdu text as a case study. Detection of textual regions in video frames is carried out by fine-tuning deep neural network based object detectors for the specific case of text detection. The script of the detected textual content is identified using convolutional neural networks (CNNs), while for recognition we propose UrduNet, a combination of CNNs and long short-term memory (LSTM) networks. A benchmark dataset containing cursive text with more than 13,000 video frames is also developed. A comprehensive series of experiments is carried out, reporting an F-measure of 88.3% for detection and a recognition rate of 87%.

“…Much of the current research on Urdu recognition is performed on the cleaned and segmented artificially generated Urdu Nastaliq text such as Urdu Printed Text Images (UPTI) [24], custom extracted [15], generated text with clear background [25], video tickers [26] or handwritten Urdu text [27] as opposed to extracting from outdoor or real-world images with complex background. This work is a step in that direction that integrates synthetic Urdu-text in natural outdoor images.…”

Section: Introduction (mentioning)

confidence: 99%

“…While for recognition of Urdu characters from outdoor images there are few custom datasets [11], [15], [25] and for recognition of printed characters words there is a famous dataset UPTI [24], which recently has been updated and has been presented with name UPTI2.0 [38] because the performance on UPTI has reached near saturation [33], [35]. There also exist CLE-18000 [32], [39] which contains near 18K ligatures (compound characters).…”

Section: Introduction (mentioning)

confidence: 99%

“…It can be seen that the English Language has the most available datasets [40]- [43] with the text style of a horizontal, oriented, and curved text. Also, English datasets with the multilingual text [7], [8], [15], [44] cover other languages than Urdu text. The number of images in English datasets varies from 500 to more than 60K.…”

Section: Introduction (mentioning)

confidence: 99%

“…The number of images in English datasets varies from 500 to more than 60K. Although researchers have mentioned Urdu text datasets [6], [15], [16], [37] for detection and recognition, only one dataset [37] is publicly available for text detection in video frame images. Also, scholars [45], [46] have done bilingual recognition on MSRA-TD [43] and other datasets.…”

Section: Introduction (mentioning)

confidence: 99%

Urdu-Text Detection and Recognition in Natural Scene Images Using Deep Learning

Arafat, Iqbal. 2020. IEEE Access

Urdu text is a cursive script and belongs to a non-Latin family of other cursive scripts like Arabic, Chinese, and Hindi. Urdu text poses a challenge for detection/localization in natural scene images, and consequently for recognition of individual ligatures in scene images. In this paper, a methodology is proposed that covers detection, orientation prediction, and recognition of Urdu ligatures in outdoor images. As a first step, a custom FasterRCNN algorithm is used in conjunction with well-known CNNs (Squeezenet, Googlenet, Resnet18, and Resnet50) for detection and localization in images of size 320 × 240 pixels. For ligature orientation prediction, a custom Regression Residual Neural Network (RRNN) is trained and tested on datasets containing randomly oriented ligatures. Recognition of ligatures is done using a Two Stream Deep Neural Network (TSDNN). In our experiments, five sets of datasets, containing 4.2K and 51K Urdu-text-embedded synthetic images, were generated using the CLE annotation text to evaluate the tasks of detection, orientation prediction, and recognition of ligatures. These synthetic images contain 132 and 1600 unique ligatures, corresponding to 4.2K and 51K images respectively, with 32 variations of each ligature (4 backgrounds and 8 font-color variations). Also, 1094 real-world images containing more than 12k Urdu characters were used for TSDNN's evaluation. Finally, all four detectors were evaluated and compared on their ability to detect/localize Urdu text using average precision (AP). The FasterRCNN based on Resnet50 features was found to be the best detector, with an AP of .98, while the Squeezenet, Googlenet, and Resnet18 based detectors had testing APs of .65, .88, and .87 respectively. RRNN achieved an accuracy of 79% and 99% for 4k and 51K images respectively. Similarly, for character classification in ligatures, TSDNN attained a partial sequence recognition rate of 94.90% and 95.20% for 4k and 51K images respectively, and a partial sequence recognition rate of 76.60% was attained for real-world images.

INDEX TERMS: BLSTM, deep neural network, FasterRCNN, image classification, Nastalique, optical character recognition (OCR), regression residual neural network (RRNN), synthetic Urdu text, text detection, two stream deep neural network (TSDNN).