Improving handwritten Chinese text recognition using neural network language models and convolutional neural network shape models

Wu, Yichao; Yin, Fei; Liu, Cheng‐Lin

doi:10.1016/j.patcog.2016.12.026

Cited by 159 publications (66 citation statements)

References 60 publications


“…In Table 3, we provide a comparison between the ACE loss and previous methods. The proposed ACE loss function achieves higher performance than previous methods, including MDLSTM-based models [34,47], an HMM-based model [10], and over-segmentation methods [27,44,45,48], both with and without a language model (LM). Compared with scene text recognition, handwritten Chinese text recognition poses unique challenges, such as a large character set (7357 classes) and the character-touching problem.…”

Section: Results (mentioning)

confidence: 95%

“…For 1D prediction problems, the topmost feature maps of the network are collapsed across the vertical dimension to generate a 1D prediction [5], because characters in the original images are generally distributed sequentially. Typical examples are regular scene text recognition [38,54], online/offline handwritten text recognition [12,34,48], and speech recognition [14,2]. For 2D prediction problems, characters in the input image are distributed in a specific spatial structure.…”

Section: Introduction (mentioning)

confidence: 99%
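The snippet above contrasts two ways of turning a CNN feature map into a prediction sequence. A minimal pure-Python sketch, assuming a (C, H, W) layout and average pooling for the vertical collapse (the cited works may use max pooling or a learned projection instead); the function names are illustrative only:

```python
from typing import List

# Feature map indexed as fmap[c][h][w]: C channels, H rows, W columns.
FeatureMap = List[List[List[float]]]


def collapse_vertical(fmap: FeatureMap) -> List[List[float]]:
    """1D case: average over the height axis, giving W frames of C features."""
    channels, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [
        [sum(fmap[c][h][w] for h in range(height)) / height for c in range(channels)]
        for w in range(width)
    ]


def flatten_2d(fmap: FeatureMap) -> List[List[float]]:
    """2D case: keep every (h, w) position, read in row-major order (H*W frames)."""
    channels, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [
        [fmap[c][h][w] for c in range(channels)]
        for h in range(height)
        for w in range(width)
    ]


# Toy 2-channel, 2 x 3 feature map (hypothetical values).
example = [
    [[1, 2, 3], [3, 4, 5]],  # channel 0
    [[0, 0, 0], [2, 2, 2]],  # channel 1
]
print(collapse_vertical(example))  # [[2.0, 1.0], [3.0, 1.0], [4.0, 1.0]]
print(len(flatten_2d(example)))    # 6
```

Collapsing keeps one frame per column, which suits left-to-right text lines; flattening keeps all H*W positions, at the cost of discarding any notion of reading order.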


Aggregation Cross-Entropy for Sequence Recognition

Xie, Huang, Zhu, et al. 2019

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

109


In this paper, we propose a novel method, aggregation cross-entropy (ACE), for sequence recognition from a brand-new perspective. The ACE loss function achieves performance competitive with CTC and the attention mechanism, with a much quicker implementation (it involves only four fundamental formulas), faster inference/back-propagation (approximately O(1) in parallel), lower storage requirements (no parameters and negligible runtime memory), and convenient employment (by replacing CTC with ACE). Furthermore, the proposed ACE loss function has two noteworthy properties: (1) it can be applied directly to 2D prediction by flattening the 2D prediction into a 1D prediction as the input, and (2) it requires only the characters and their counts in the sequence annotation for supervision, which allows it to advance beyond sequence recognition, e.g., to counting problems. The code is publicly available at https://github.
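The abstract summarizes ACE without stating its formulas. The sketch below is our reading of the idea suggested by the name and by property (2): sum each class's probability over all frames, compare that aggregate with the normalized character counts through a cross-entropy, and let a blank class absorb frames that carry no character. The blank handling, the normalization by T, the epsilon, and the name `ace_loss` are assumptions; the paper's four formulas may differ in detail.

```python
import math
from collections import Counter
from typing import Sequence

BLANK = 0  # assumed index of the blank class


def ace_loss(probs: Sequence[Sequence[float]], label: Sequence[int]) -> float:
    """Aggregation cross-entropy for one sample (illustrative sketch).

    probs: T frames, each a probability distribution over K classes
           (BLANK included); a 2D H x W prediction is flattened to T = H * W.
    label: non-blank character indices; only their counts are used.
    """
    T, K = len(probs), len(probs[0])
    if len(label) > T:
        raise ValueError("annotation has more characters than prediction frames")
    # Aggregate: accumulated probability of each class over all T frames.
    y = [sum(frame[k] for frame in probs) for k in range(K)]
    # Target counts; the blank class absorbs the remaining T - |label| frames.
    counts = Counter(label)
    counts[BLANK] = T - len(label)
    # Normalize both by T and take the cross-entropy of prediction vs. target.
    eps = 1e-10  # guards log(0) for classes the model never predicts
    return -sum(
        (counts[k] / T) * math.log(y[k] / T + eps)
        for k in range(K)
        if counts[k] > 0
    )


# T = 3 frames, K = 3 classes (0 = blank); the annotation contains classes 1 and 2.
good = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
bad = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
print(ace_loss(good, [1, 2]) < ace_loss(bad, [1, 2]))    # True
print(ace_loss(good, [1, 2]) == ace_loss(good, [2, 1]))  # True: order is ignored
```

Because only aggregated counts enter the loss, the same function accepts a flattened 2D prediction unchanged, which matches property (1) in the abstract.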


“…Although the confusion among the 7360 classes is higher, Table IX shows an overall comparison of our proposed method with other state-of-the-art methods, without and with a language model, on the ICDAR 2013 competition set. We list the state-of-the-art over-segmentation methods heterogeneous CNN [7] and CNNs-RNNLM [8], and the segmentation-free methods SMDLSTM-CTC [15] and CNN-ACE [16], in Table IX for comparison. With the same vocabulary-size configuration (4 more garbage classes are adopted in our HMM system), the proposed WCNN-PHMM yielded the best performance whether or not a language model was employed.…”

Section: Visualization Analysis for Writer Code (mentioning)

confidence: 99%

“…In general, research efforts on offline HCTR can be divided into two categories: over-segmentation-based approaches and segmentation-free approaches. The former approaches [5], [6], [7], [8] often build several modules, including character over-segmentation, character classification, and linguistic and geometric context modeling, and then combine them to calculate the score for path search. The recent work in [8], with a neural network language model, adopted three different CNN models to replace the conventional character classifier, segmentation model, and geometric model, achieving the best performance among over-segmentation-based methods on the ICDAR 2013 competition dataset [9].…”

Section: Introduction (mentioning)

confidence: 99%
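The over-segmentation pipeline described above ends in a path search that combines classifier, geometric, and linguistic scores. A minimal sketch, assuming a weighted sum of log scores, a bigram language model, and hypothetical scoring callbacks in place of the CNN and neural language models used in the cited works (which also tune the combination weights):

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical scoring callbacks standing in for the real models:
CharScores = Callable[[int, int], Dict[str, float]]  # segment -> {char: log p}
GeoScore = Callable[[int, int], float]               # segment -> log p
BigramLM = Callable[[str, str], float]               # (prev, char) -> log p
Back = Optional[Tuple[int, str]]


def path_search(n_cuts: int, char_scores: CharScores, geo_score: GeoScore,
                lm: BigramLM, weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
                max_span: int = 3) -> Tuple[str, float]:
    """Best path through an over-segmentation lattice (illustrative sketch).

    Cut points are 0..n_cuts; a candidate character covers primitive
    segments [i, j) with j - i <= max_span. Path score = weighted sum of
    classifier, geometric and bigram language-model log scores.
    """
    w_cls, w_geo, w_lm = weights
    # best[j][c] = (score, back-pointer) of the best partial path ending at
    # cut j whose last character is c; the state keeps c for the bigram LM.
    best: List[Dict[str, Tuple[float, Back]]] = [{} for _ in range(n_cuts + 1)]
    best[0]["<s>"] = (0.0, None)
    for j in range(1, n_cuts + 1):
        for i in range(max(0, j - max_span), j):
            if not best[i]:
                continue
            geo = geo_score(i, j)
            for char, cls in char_scores(i, j).items():
                for prev, (score, _) in best[i].items():
                    total = score + w_cls * cls + w_geo * geo + w_lm * lm(prev, char)
                    if char not in best[j] or total > best[j][char][0]:
                        best[j][char] = (total, (i, prev))
    if not best[n_cuts]:
        raise ValueError("no complete path through the lattice")
    # Backtrack from the highest-scoring final state.
    char = max(best[n_cuts], key=lambda c: best[n_cuts][c][0])
    final_score = best[n_cuts][char][0]
    pos, out = n_cuts, []
    while pos > 0:
        out.append(char)
        pos, char = best[pos][char][1]
    return "".join(reversed(out)), final_score


# Toy lattice (hypothetical scores): two primitive segments, where [0,1)
# looks like "日", [1,2) like "月", and the merged [0,2) like "明".
CLS = {(0, 1): {"日": math.log(0.9)}, (1, 2): {"月": math.log(0.9)},
       (0, 2): {"明": math.log(0.8)}}
GEO = {(0, 1): math.log(0.5), (1, 2): math.log(0.5), (0, 2): math.log(0.9)}
LM = {("<s>", "明"): math.log(0.1), ("<s>", "日"): math.log(0.05),
      ("日", "月"): math.log(0.01)}

text, score = path_search(
    2,
    lambda i, j: CLS.get((i, j), {}),
    lambda i, j: GEO.get((i, j), math.log(1e-3)),
    lambda p, c: LM.get((p, c), math.log(1e-4)),
)
print(text)  # 明
```

In the toy lattice the classifier alone slightly prefers splitting into two characters, but the geometric and language-model terms favor the merged character, which is the kind of trade-off the combined path score is meant to resolve.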

Writer-aware CNN for parsimonious HMM-based offline handwritten Chinese text recognition

Wang, 2020

Pattern Recognition


Recently, the hybrid convolutional neural network hidden Markov model (CNN-HMM) has been introduced for offline handwritten Chinese text recognition (HCTR) and has achieved state-of-the-art performance. However, modeling each character of the large Chinese vocabulary with a uniform and fixed number of hidden states incurs high memory and computational costs and makes the tens of thousands of HMM state classes easily confused. Another key issue of CNN-HMM for HCTR is the diversity of writing styles, which leads to model strain and a significant performance decline for specific writers. To address these issues, we propose a writer-aware CNN based on a parsimonious HMM (WCNN-PHMM). First, PHMM is designed using a data-driven state-tying algorithm to greatly reduce the total number of HMM states; this not only yields a compact CNN through state sharing among identical or similar radicals of different Chinese characters, but also improves recognition accuracy owing to more accurate modeling of the tied states and lower confusion among them. Second, WCNN integrates each convolutional layer with an adaptive layer fed by a writer-dependent vector, namely, the writer code, to extract the recognition-irrelevant variability in writer information and thereby improve recognition performance. The parameters of the writer-adaptive layers are jointly optimized with the other network parameters in the training stage, while a multiple-pass decoding strategy is adopted to learn the writer code and generate recognition results. Validated on the ICDAR 2013 competition set of the CASIA-HWDB database, the more compact WCNN-PHMM with a 7360-class vocabulary achieves a relative character error rate (CER) reduction of 16.6% over the conventional CNN-HMM without language modeling. By adopting a powerful hybrid language model (an N-gram language model and a recurrent neural network language model), the CER of WCNN-PHMM is reduced to 3.17%.
Moreover, the state-tying results of PHMM explicitly show the information sharing among similar characters and the reduced confusion among tied state classes. Finally, we visualize the learned writer codes and demonstrate their strong relationship with the writing styles of different writers. To the best of our knowledge, WCNN-PHMM yields the best results on the ICDAR 2013 competition set, demonstrating its power when enlarging the size of the character vocabulary.
Index Terms: Offline handwritten Chinese text recognition, writer-aware CNN, parsimonious HMM, state tying, adaptation, hybrid language model.
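The abstract reports both a relative CER reduction (16.6%) and an absolute CER (3.17%). A small sketch of the usual definitions, assuming CER is the Levenshtein edit distance between the recognized and reference text divided by the number of reference characters (the ICDAR 2013 accuracy measures are likewise built from substitution, deletion, and insertion counts, though alignment details may differ). The example strings and the 10.0% baseline are hypothetical:

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion: ref character missing from hyp
                cur[j - 1] + 1,          # insertion: extra character in hyp
                prev[j - 1] + (r != h),  # substitution (or match at no cost)
            )
        prev = cur
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance normalized by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Error reduction expressed as a fraction of the baseline error rate."""
    return (baseline - improved) / baseline


print(edit_distance("手写文字识别", "手写文子识"))  # 2 (one substitution, one deletion)
print(round(relative_reduction(10.0, 8.34), 3))    # 0.166
```

The second line illustrates why "relative" matters: a 16.6% relative reduction from a hypothetical 10.0% baseline corresponds to only 1.66 percentage points in absolute CER.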


“…Unlike existing methods that usually employ generic (category-level) human detectors, our approach aims to assign each moving person a specific tracker to reduce ambiguities in complex scenes. Additionally, modern advances in deep feature representation learning [1,2,3] for object appearance have created new opportunities for MPT methods, which partially motivates us to learn instance-level object representations with deep neural networks. Therefore, we develop a multibranch neural network (MBN) that dynamically learns instance-level representations of tracked persons at a low cost, which facilitates robust online data association for multiple-target tracking and thus gives rise to our INstance-Aware Representation Learning and Association (INARLA) framework.…”

Section: Introduction (mentioning)

confidence: 99%

Instance-aware representation learning and association for online multi-person tracking

Wang et al. 2019

Pattern Recognition


Multi-Person Tracking (MPT) is often addressed within the detection-to-association paradigm. In such approaches, human detections are first extracted in every frame, and person trajectories are then recovered by a data association procedure (usually offline). However, their performance usually degrades in the presence of detection errors, mutual interactions, and occlusions. In this paper, we present a deep-learning-based MPT approach that learns instance-aware representations of tracked persons and robustly infers their states online. Specifically, we design a multibranch neural network (MBN) that predicts the classification confidences and locations of all targets, taking a batch of candidate regions as input. In our MBN architecture, each branch (instance-subnet) corresponds to an individual to be tracked, and new branches can be dynamically created to handle newly appearing persons. Then, based on the output of the MBN, we construct a joint association matrix that represents meaningful states of tracked persons (e.g., being tracked or disappearing from the scene) and solve it with the efficient Hungarian algorithm. Moreover, we allow the instance-subnets to be updated during tracking by mining hard examples online, accounting for variations in person appearance over time. We comprehensively evaluate our framework on a popular MPT benchmark, demonstrating its excellent performance compared with recent online MPT methods.
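The association step ends by solving an assignment problem with the Hungarian algorithm. A minimal, dependency-free sketch of the classic potentials-based Hungarian solver (O(n^2 m)) for a tracks x detections cost matrix; deriving costs as 1 minus an association confidence is a hypothetical choice, and the paper's joint association matrix also encodes states such as disappearing, which this sketch does not model. In production code, SciPy's `scipy.optimize.linear_sum_assignment` solves the same minimum-cost assignment problem.

```python
import math
from typing import List, Tuple


def hungarian(cost: List[List[float]]) -> List[Tuple[int, int]]:
    """Minimum-cost assignment for an n x m cost matrix with n <= m.

    Returns sorted (row, col) pairs; every row gets exactly one column.
    """
    n = len(cost)
    m = len(cost[0]) if n else 0
    if n > m:
        raise ValueError("transpose the matrix so that rows <= columns")
    u = [0.0] * (n + 1)  # row potentials (1-indexed; index 0 is a sentinel)
    v = [0.0] * (m + 1)  # column potentials
    p = [0] * (m + 1)    # p[j] = row matched to column j (0 = unmatched)
    way = [0] * (m + 1)  # previous column on the alternating path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [math.inf] * (m + 1)
        used = [False] * (m + 1)
        # Grow an alternating tree until an unmatched column is reached.
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], math.inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    reduced = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if reduced < minv[j]:
                        minv[j], way[j] = reduced, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            # Shift potentials so at least one new zero reduced cost appears.
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Augment: flip matches along the path back to the sentinel column.
        while True:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
            if j0 == 0:
                break
    return sorted((p[j] - 1, j - 1) for j in range(1, m + 1) if p[j] != 0)


# Hypothetical costs (e.g. 1 - association confidence): 2 tracks x 3 detections.
costs = [[0.1, 0.9, 0.8],
         [0.7, 0.2, 0.9]]
print(hungarian(costs))  # [(0, 0), (1, 1)]
```

Detections left unmatched (column 2 above) would be candidates for newly appearing persons, which in the described framework would trigger the creation of a new branch.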
