2015
DOI: 10.1117/12.2075930
A comparison of 1D and 2D LSTM architectures for the recognition of handwritten Arabic

Abstract: In this paper, we present an Arabic handwriting recognition method based on recurrent neural networks. We use the Long Short-Term Memory (LSTM) architecture, which has proven successful in various printed and handwritten OCR tasks. Applications of LSTM to handwriting recognition typically employ the two-dimensional architecture to deal with variations along both the vertical and horizontal axes. However, we show that with a simple pre-processing step that normalizes the position and baseline of letters, we can make use …
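The pre-processing step the abstract refers to shifts the ink in each column of a text-line image so that letter position and baseline are consistent before a 1D scan. A minimal sketch of one way to do this (function name and centering rule are illustrative assumptions, not the authors' exact method):

```python
import numpy as np

def normalize_baseline(line_img, out_height=48):
    """Shift each column of a grayscale text-line image so the ink's
    vertical center of mass sits at mid-frame (hypothetical sketch of
    position/baseline normalization, not the paper's implementation)."""
    h, w = line_img.shape
    out = np.zeros((out_height, w), dtype=line_img.dtype)
    rows = np.arange(h)
    for x in range(w):
        col = line_img[:, x]
        mass = col.sum()
        if mass == 0:           # empty column: nothing to shift
            continue
        center = int(round((rows * col).sum() / mass))
        shift = out_height // 2 - center
        # copy only the part of the column that lands inside the frame
        src_lo = max(0, -shift)
        src_hi = min(h, out_height - shift)
        out[src_lo + shift: src_hi + shift, x] = col[src_lo:src_hi]
    return out
```

After this normalization, vertical variation is largely removed, which is what lets a purely horizontal (1D) sequence model cope with handwriting that would otherwise need a 2D architecture.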

Cited by 23 publications (14 citation statements)
References 26 publications
“…Although recognition methods using MDLSTM could learn both horizontal and vertical representations of a document image and reduce recognition errors caused by distortion, the training process is quite time-consuming. M. Yousefi in [5] confirmed that if text line samples are properly preprocessed, a 1D LSTM can outperform MDLSTM in handwritten Arabic recognition. They achieved state-of-the-art performance on the IFN/ENIT dataset [6]; moreover, training is fast compared with MDLSTM.…”
Section: Introduction
confidence: 63%
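The 1D architecture these citing papers contrast with MDLSTM scans a normalized text line as a left-to-right sequence of column feature vectors. A self-contained numpy sketch of that scan, with a standard LSTM cell (all names and the stacked-gate layout are illustrative assumptions, not the paper's code):

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM step on a column feature vector x.
    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) bias.
    Gates are stacked as [input, forget, candidate, output]."""
    H = h.shape[0]
    z = W @ x + U @ h + b
    i = 1.0 / (1.0 + np.exp(-z[:H]))        # input gate
    f = 1.0 / (1.0 + np.exp(-z[H:2 * H]))   # forget gate
    g = np.tanh(z[2 * H:3 * H])             # candidate cell state
    o = 1.0 / (1.0 + np.exp(-z[3 * H:]))    # output gate
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

def run_1d_lstm(columns, W, U, b, H):
    """Scan a (width, D) sequence of column features left to right,
    returning the hidden state at every column."""
    h = np.zeros(H)
    c = np.zeros(H)
    outputs = []
    for x in columns:
        h, c = lstm_step(x, h, c, W, U, b)
        outputs.append(h)
    return np.stack(outputs)
```

Because the model only recurs along the horizontal axis, each column is processed once, which is one reason the cited statements report much faster training than with the four-directional recurrences of MDLSTM.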
“…They used the OpenHaRT dataset and implemented an n-gram language model smoothed with the modified Kneser-Ney method. Yousefi et al. (2015) performed an experiment similar to that of Chherawala et al. (2013). However, in this experiment they showed that the LSTM, which was faster to train and converge than MDLSTM, also achieved better results on the same IFN/ENIT dataset with the same handcrafted features, namely CCV, RM, MB, and LGH.…”
Section: Arabic Text Recognition with Deep Learning
confidence: 88%
“…For example, Breuel et al [14] combined a standard 1-D LSTM network architecture with a text line normalization method for performing OCR of printed Latin and Fraktur scripts. In a similar manner, by normalizing the positions and baselines of letters, Yousefi et al [16] achieved superior performance and faster convergence with a 1-D LSTM network over a 2-D variant for Arabic handwriting recognition.…”
Section: Related Work
confidence: 99%