This paper proposes a shared-hidden-layer deep convolutional neural network (SHL-CNN) for image character recognition. In SHL-CNN, the hidden layers are shared across characters from different languages, performing a universal feature extraction process that aims to learn character traits common to different languages, such as strokes, while the final softmax layer is made language dependent and is trained on characters from the target language only. This paper is the first attempt to introduce the SHL-CNN framework to image character recognition. Under the SHL-CNN framework, we discuss several issues, including the architecture and training of the network, from which a suitable SHL-CNN model for image character recognition is empirically determined. The effectiveness of the learned SHL-CNN is verified on both English and Chinese image character recognition tasks, showing that the SHL-CNN reduces recognition errors by 16-30% relative to conventional CNN models trained on characters of a single language, and by 35.7% relative to state-of-the-art methods. In addition, the learned shared hidden layers are also useful for unseen image character recognition tasks.
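To make the architecture concrete, the following is a minimal sketch (not the authors' code) of the SHL-CNN idea: convolutional hidden layers shared across languages, with one language-dependent softmax head per language. The layer sizes, the 48x48 input resolution, and the class counts (62 for English, 3755 for Chinese) are illustrative assumptions, not values from the paper.

```python
# Sketch of a shared-hidden-layer CNN: shared feature extractor + per-language heads.
import torch
import torch.nn as nn

class SHLCNN(nn.Module):
    def __init__(self, num_classes_per_language):
        super().__init__()
        # Shared hidden layers: a universal extractor of stroke-like features.
        self.shared = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(64 * 12 * 12, 512), nn.ReLU(),
        )
        # Language-dependent softmax layers, each trained on one language's characters.
        self.heads = nn.ModuleDict({
            lang: nn.Linear(512, n) for lang, n in num_classes_per_language.items()
        })

    def forward(self, x, language):
        return self.heads[language](self.shared(x))

model = SHLCNN({"english": 62, "chinese": 3755})
logits = model(torch.randn(8, 1, 48, 48), language="chinese")  # shape: (8, 3755)
```

During training, batches from either language update the shared layers, while only the corresponding head receives gradients, which is how the common representation is learned jointly.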
Abstract. Inspired by the success of deep neural network (DNN) models in solving challenging visual problems, this paper studies the task of Chinese Image Character Recognition (ChnICR) by leveraging a DNN model and a huge number of machine-simulated training samples. To generate the samples, clean machine-born Chinese characters are extracted and augmented with common variations of image characters, such as changes in size, font, boldness, and shift, as well as complex backgrounds, producing in total over 28 million character images and covering the vast majority of occurrences of Chinese characters in real-life images. Based on these samples, a DNN training procedure is employed to learn an appropriate Chinese character recognizer, where the width and depth of the DNN and the volume of samples are empirically discussed. In parallel, a holistic Chinese image text recognition system is developed. Encouraging experimental results on text from 13 TV channels demonstrate the effectiveness of the learned recognizer, with significant performance gains observed over the baseline system.
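The sample-generation step can be illustrated with a small sketch along the lines described above: render a clean glyph, then apply size, shift, and boldness variations on a cluttered background. The font path, parameter ranges, and the use of Pillow are illustrative assumptions, not details from the paper.

```python
# Sketch of machine-simulated character sample generation (assumed parameters).
import random
from PIL import Image, ImageDraw, ImageFont

def synth_sample(char, font_path="simsun.ttf", canvas=48):
    size = random.randint(28, 44)                          # size variation
    font = ImageFont.truetype(font_path, size)
    # Complex background: grayscale noise instead of a clean page.
    img = Image.effect_noise((canvas, canvas), sigma=random.uniform(10, 60))
    draw = ImageDraw.Draw(img)
    dx, dy = random.randint(-4, 4), random.randint(-4, 4)  # shift variation
    x, y = (canvas - size) // 2 + dx, (canvas - size) // 2 + dy
    stroke = random.choice([0, 1])                         # boldness variation
    draw.text((x, y), char, fill=0, font=font, stroke_width=stroke, stroke_fill=0)
    return img

samples = [synth_sample("永") for _ in range(10)]  # a small batch of simulated samples
```

Repeating such perturbations over the full character set and many fonts is what lets the corpus grow to tens of millions of images without manual labeling.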