We present a CNN-BiLSTM system for offline English handwriting recognition, with extensive evaluations on the public IAM dataset, including the effects of model size, data augmentation and the lexicon. Our best model, a CNN-BiLSTM network with a CTC layer, achieves 3.59% CER and 9.44% WER. We propose test-time augmentation, in which rotation and shear transformations are applied to the input image, to improve recognition of difficult cases, and find that it reduces the word error rate by 2.5 percentage points. We also conduct an error analysis of our method on the IAM dataset, showing hard cases of handwriting images and exploring samples with erroneous labels. We release our source code into the public domain to foster further research and encourage scientific reproducibility.
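With a CTC layer, the network emits a probability distribution over the characters plus a special blank symbol at every time step. The abstract does not say which decoder the authors used. The sketch below therefore shows only the simplest standard option, greedy (best-path) CTC decoding: take the most likely label per frame, collapse consecutive repeats, then drop blanks. It assumes the blank is at index 0 and that the output is a plain list of per-frame probabilities. `ctc_best_path` is a hypothetical helper name, not part of the authors' code.

```python
from itertools import groupby
from typing import List, Sequence


def ctc_best_path(frame_probs: Sequence[Sequence[float]],
                  labels: List[str],
                  blank: int = 0) -> str:
    """Greedy CTC decoding (assumption: labels[blank] is the blank symbol)."""
    # Most likely label index at each time step.
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    # Merge runs of the same label; a blank between two equal labels
    # keeps them separate, which is how CTC encodes doubled letters.
    collapsed = [k for k, _ in groupby(best)]
    # Remove blanks and map the remaining indices to characters.
    return "".join(labels[k] for k in collapsed if k != blank)


# Per-frame probabilities over ["<blank>", "a", "b"]; argmax path is a a - a b b.
frames = [
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.6, 0.3],
    [0.1, 0.2, 0.7],
    [0.3, 0.1, 0.6],
]
print(ctc_best_path(frames, ["<blank>", "a", "b"]))  # aab
```

Best-path decoding ignores any lexicon or language model. Lexicon-aware decoders such as beam search variants trade extra computation for the ability to prefer valid words.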
With the ever-increasing speed of digitization, a large collection of Ottoman documents is now accessible to researchers and the general public. However, most users interested in these documents cannot read them unless they are transcribed into the modern Turkish script, which uses an extended version of the Latin alphabet. Manually transcribing such a massive number of documents is beyond the capacity of human experts. As a solution, we propose an automatic recognition system for printed Ottoman documents that transcribes Ottoman texts directly into the modern Turkish script. We evaluated three decoding strategies, including the Word Beam Search decoder, which allows a recognition lexicon and n-gram statistics to be used during decoding. The system achieves a 2.25% character error rate and a 6.42% word error rate on a test set of 1.4K samples when the test set transcriptions are used as the recognition lexicon. With a large, general-purpose lexicon of the Ottoman era (260K words, 77% test coverage), the character error rate is 3.68% and the word error rate is 16.61%.
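Character and word error rates are normally computed from the Levenshtein edit distance between the reference transcription and the recognizer output. The distance is taken over characters for CER and over whitespace-separated tokens for WER. The sketch below uses one common convention: total edits divided by total reference length over the whole test set, reported as a percentage. This is an assumption. The abstract does not specify how the authors normalized, tokenized or handled punctuation, and `edit_distance` and `error_rates` are hypothetical helper names.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,             # delete r
                          curr[j - 1] + 1,         # insert h
                          prev[j - 1] + (r != h))  # substitute (or match)
        prev = curr
    return prev[-1]


def error_rates(refs: List[str], hyps: List[str]) -> Tuple[float, float]:
    """Corpus-level (CER %, WER %); assumes whitespace tokenization for words."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    char_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    char_total = sum(len(r) for r in refs)
    word_edits = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    word_total = sum(len(r.split()) for r in refs)
    return 100 * char_edits / char_total, 100 * word_edits / word_total


cer, wer = error_rates(["the cat"], ["the bat"])
print(f"CER {cer:.2f}%  WER {wer:.2f}%")  # CER 14.29%  WER 50.00%
```

The example shows why WER is typically much higher than CER. A single wrong character counts as one edit out of seven characters, but it also makes one of only two words wrong.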