Training deep-learning-based handwritten text recognition systems requires large amounts of data, namely text images and their corresponding annotations. One way to address this need is to use data augmentation to increase the amount of training data. Data augmentation techniques based on Generative Adversarial Networks (GANs) are popular in the literature, especially for image-related tasks. However, specific challenges must be addressed to use GANs effectively for data augmentation in text recognition. Text data is inherently imbalanced in how frequently different characters appear, both within individual training samples and across the training data as a whole. GANs trained on such an imbalanced dataset produce augmented data that represents the minority characters poorly. In this paper, we present an adaptive GAN-based data augmentation technique that addresses the class imbalance arising in text recognition problems. Using experimental evaluations on two publicly available datasets for handwritten Arabic text recognition, we show that GANs trained with the presented technique are effective at dealing with the class imbalance problem, generating augmented data that is balanced in terms of character frequencies. Text recognition systems trained on the balanced augmented data achieve higher recognition accuracy than systems trained using standard techniques.
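The character-frequency imbalance described above can be made concrete with a small measurement. The sketch below is illustrative only and is not the authors' method: it counts how often each character occurs across a set of transcriptions and reports the ratio of the most to the least frequent character (1.0 would mean perfectly balanced). The function names and the toy transcriptions are hypothetical, and Latin letters stand in for Arabic characters.

```python
from collections import Counter


def char_frequencies(transcriptions):
    """Count how often each non-whitespace character occurs across all transcriptions."""
    counts = Counter()
    for text in transcriptions:
        counts.update(ch for ch in text if not ch.isspace())
    return counts


def imbalance_ratio(counts):
    """Ratio of the most frequent to the least frequent character count (1.0 = balanced)."""
    return max(counts.values()) / min(counts.values())


# Hypothetical toy transcriptions; 'c' and 'd' play the role of minority characters.
train = ["abba", "abc", "aab", "bad"]
counts = char_frequencies(train)
print(counts.most_common())
print(imbalance_ratio(counts))
```

A generator trained on data like this sees 'a' six times as often as 'c' or 'd', which is the kind of skew the abstract says carries over into the augmented data.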
Recognition of cursive handwritten Arabic text is a difficult problem because of context-sensitive character shapes, non-uniform spacing between and within words, diverse placement of dots and diacritics, and very low inter-class variation among character classes. In this paper, we review and investigate different deep learning architectures and modeling choices for Arabic handwriting recognition. We further address the problem that imbalanced datasets pose for deep learning systems. To this end, we present a novel adaptive data augmentation algorithm that promotes class diversity. The algorithm assigns a weight to each word in the database lexicon, calculated from the average probability of the classes (characters) that make up the word. Experimental results on the IFN/ENIT and AHDB databases show that the presented approach achieves state-of-the-art results.
INDEX TERMS Arabic handwriting recognition (AHR), deep learning neural network (DLNN), convolutional neural networks (CNN), connectionist temporal classification (CTC), recurrent neural network (RNN), IFN/ENIT database, long short-term memory (LSTM), bi-directional long short-term memory (BLSTM), word beam search (WBS).
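The lexicon-weighting step can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's exact algorithm: class probabilities are estimated from character counts over the training words, a word's score is the mean probability of its characters, and (ASSUMPTION) the weight is the inverse of that score, normalized to sum to 1, so that words containing rare characters are selected for augmentation more often. The abstract does not specify the mapping from average probability to weight. All function names and the toy lexicon are hypothetical.

```python
import random
from collections import Counter


def class_probabilities(lexicon_counts):
    """Estimate p(c) for each character class.

    lexicon_counts maps each lexicon word to its number of training samples.
    """
    char_counts = Counter()
    for word, n in lexicon_counts.items():
        for ch in word:
            char_counts[ch] += n
    total = sum(char_counts.values())
    return {ch: cnt / total for ch, cnt in char_counts.items()}


def word_weights(lexicon_counts):
    """Weight each word from the average probability of its character classes.

    ASSUMPTION: weight = 1 / mean class probability, normalized to sum to 1.
    Words are assumed to be non-empty strings.
    """
    probs = class_probabilities(lexicon_counts)
    raw = {}
    for word in lexicon_counts:
        avg_p = sum(probs[ch] for ch in word) / len(word)
        raw[word] = 1.0 / avg_p
    z = sum(raw.values())
    return {w: r / z for w, r in raw.items()}


def sample_words_for_augmentation(lexicon_counts, k, seed=0):
    """Draw k words (with replacement) to synthesize, proportional to their weights."""
    rng = random.Random(seed)
    weights = word_weights(lexicon_counts)
    words = list(weights)
    return rng.choices(words, weights=[weights[w] for w in words], k=k)


# Hypothetical lexicon: "cd" is rare and contains the rare characters 'c' and 'd'.
lexicon = {"ab": 10, "ba": 10, "cd": 1}
print(word_weights(lexicon))
print(sample_words_for_augmentation(lexicon, k=10))
```

With inverse weighting, "cd" receives most of the sampling mass, so a generator conditioned on the sampled words would produce proportionally more images of the minority characters. A smoother mapping (for example, a tempered inverse) would reduce how aggressively rare words are oversampled.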