Unconstrained scene text and video text recognition for Arabic script

Jain, Mohit; Mathew, Minesh; Jawahar, C. V.

doi:10.1109/asar.2017.8067754

Cited by 41 publications

(30 citation statements)

References 36 publications

“…The objective of this task is to predict precise word level bounding boxes and the corresponding script class for each word. Existing text recognition algorithms [18,19,39] are language-dependent which makes script identification a prerequisite task for the other methods. In Section 4.3, we experimentally demonstrated that script identification is not required for multi-language text recognition.…”

Section: Joint Multi-language Text Localization and Script Identifica (mentioning)

confidence: 99%

E2E-MLT - An Unconstrained End-to-End Method for Multi-language Scene Text

Bušta

Patel

Matas

2019

Lecture Notes in Computer Science

An end-to-end trainable (fully differentiable) method for multilanguage scene text localization and recognition is proposed. The approach is based on a single fully convolutional network (FCN) with shared layers for both tasks. E2E-MLT is the first published multi-language OCR for scene text. While trained in multi-language setup, E2E-MLT demonstrates competitive performance when compared to other methods trained for English scene text alone. The experiments show that obtaining accurate multi-language multi-script annotations is a challenging problem.

“…The technique is evaluated on two datasets ACTiV [115] and the ALIF [116,117] and reports high recognition rates. A similar work is reported in [118] where a combination of CNN and LSTM is employed to recognize Arabic text in video frames. Another deep learning-based solution is presented in [119] where Lu et al compare the performance of different pre-trained ConvNets for detection and recognition of caption text.…”

Section: Text Recognition (mentioning)

confidence: 92%

Detection and recognition of cursive text from video frames

Mirza

Zeshan

Atif

et al. 2020

J Image Video Proc.

Textual content appearing in videos represents an interesting index for semantic retrieval of videos (from archives), generation of alerts (live streams), as well as high level applications like opinion mining and content summarization. The key components of such systems require detection and recognition of textual content, which also form the subject of our study. This paper presents a comprehensive framework for detection and recognition of textual content in video frames. More specifically, we target cursive scripts, taking Urdu text as a case study. Detection of textual regions in video frames is carried out by fine-tuning deep neural network based object detectors for the specific case of text detection. The script of the detected textual content is identified using convolutional neural networks (CNNs), while for recognition, we propose UrduNet, a combination of CNNs and long short-term memory (LSTM) networks. A benchmark dataset containing cursive text with more than 13,000 video frames is also developed. A comprehensive series of experiments is carried out, reporting an F-measure of 88.3% for detection and a recognition rate of 87%.

“…In recent years, several novel works for cursive text detection and recognition in video images have been developed [51]- [54], while a limited work is presented for cursive text recognition in natural scenes [55]- [57]. Ahmed et al [55], modified the maximally stable extremal region method to extract the scale-invariant features and passed to the multi-dimensional long short term memory (MDLSTM) classifier.…”

Section: B Cursive Text Recognition In Video and Natural Scene Images (mentioning)

confidence: 99%

“…Jain et al [51] used a hybrid CNN-RNN network to recognize Arabic text in videos and natural images. To train the network, they created a large-scale synthetic dataset.…”

Section: IEEE Access (mentioning)

confidence: 99%

Cursive Character Recognition in Natural Scene Images Using a Multilevel Convolutional Neural Network Fusion

2020

The accuracy of current natural scene text recognition algorithms is limited by the poor performance of character recognition methods for these images. The complex backgrounds, variations in the writing, text size, orientations, low resolution and multi-language text make recognition of text in natural images a complex and challenging task. Conventional machine learning and deep learning-based methods have been developed that have achieved satisfactory results, but character recognition for cursive text such as Arabic and Urdu scripts in natural images is still an open research problem. The characters in the cursive text are connected and are difficult to segment for recognition. Variations in the shape of a character due to its different positions within a word make the recognition task more challenging than non-cursive text. Optical character recognition (OCR) techniques proposed for Arabic and Urdu scanned documents perform very poorly when applied to character recognition in natural images. In this paper, we propose a multiscale feature aggregation (MSFA) and a multi-level feature fusion (MLFF) network architecture to recognize isolated Urdu characters in natural images. The network first aggregates multi-scale features of the convolutional layers by up-sampling and addition operations and then combines them with the high-level features. Finally, the outputs of the MSFA and MLFF networks are fused together to create more robust and powerful features. A comprehensive dataset of segmented Urdu characters is developed for the evaluation of the proposed network models. Synthetic text on the patches of images with real natural scene backgrounds is generated to increase the samples of infrequently used characters. The proposed model is evaluated on the Chars74K and ICDAR03 datasets. To validate the proposed model on the new Urdu character image dataset, we compare its performance with the histogram of oriented gradients (HoG) method. 
The experimental results show that the aggregation of multi-scale and multilevel features and their fusion is more effective, and outperforms other methods on the Urdu character image and Chars74K datasets.

Index terms: cursive text recognition, natural scene Urdu character recognition, multi-scale feature aggregation, multi-level feature fusion, convolutional neural network (CNN)
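
The abstract above describes aggregating multi-scale convolutional features "by up-sampling and addition operations". As a rough illustration only, that idea can be sketched in NumPy as below. This is not the authors' MSFA network: the function names, the nearest-neighbour up-sampling, the equal channel counts, and the toy feature maps are all assumptions made for the example.

```python
import numpy as np

def upsample_nearest(fmap: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour up-sampling of a (C, H, W) feature map by an integer factor."""
    return fmap.repeat(factor, axis=1).repeat(factor, axis=2)

def aggregate_multiscale(fmaps: list[np.ndarray]) -> np.ndarray:
    """Up-sample every map to the resolution of the first (finest) map and add them.

    Assumes all maps share the same channel count and that the spatial sizes
    differ by integer factors; a real network would typically learn the
    channel matching and up-sampling instead.
    """
    target_h = fmaps[0].shape[1]
    total = np.zeros_like(fmaps[0], dtype=float)
    for fmap in fmaps:
        factor = target_h // fmap.shape[1]
        total += upsample_nearest(fmap, factor)
    return total

# Hypothetical feature maps from three convolutional stages (channels, height, width).
fine = np.ones((4, 8, 8))
mid = np.full((4, 4, 4), 2.0)
coarse = np.full((4, 2, 2), 3.0)

aggregated = aggregate_multiscale([fine, mid, coarse])
print(aggregated.shape)
```

The aggregated map keeps the finest spatial resolution, so it could in principle be combined with higher-level features in a later fusion step, as the abstract outlines.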
