We present two algorithms that extend existing HMM parameter adaptation algorithms (MAP and MLLR) by also adapting the HMM structure. The improvement comes from combining MAP and MLLR with a structure optimization procedure. Our algorithms are semi-supervised: to adapt a given HMM to new data, they require a small amount of labeled data for parameter adaptation and a moderate amount of unlabeled data to estimate the criteria used for HMM structure optimization. Structure optimization is based on state splitting and state merging operations and proceeds so as to optimize either the likelihood or a heuristic criterion. We apply our algorithms to the recognition of printed characters by adapting the HMM character models of a polyfont printed text recognizer to new fonts. Our experiments involve a total of 1,120,000 real and 3,100,000 synthetic character images and concern a set of 89 HMMs. A comparison of our results with those of state-of-the-art adaptation algorithms (MAP and MLLR) shows a significant increase in character recognition accuracy.
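As background for the parameter-adaptation step mentioned above, the following is a minimal sketch of the standard MAP update for a single Gaussian mean, in which the adapted mean interpolates between the prior mean and the occupancy-weighted data average. It is not the authors' implementation: the function name `map_adapt_mean`, the prior weight `tau` and its default value are assumptions made for illustration, and the structure optimization (state splitting and merging) is not shown.

```python
def map_adapt_mean(prior_mean, frames, posteriors, tau=10.0):
    """Standard MAP update of one Gaussian mean vector (illustrative sketch).

    prior_mean : mean of the existing (e.g. polyfont) model, list of floats
    frames     : adaptation feature vectors, list of lists of floats
    posteriors : state occupation probability of each frame for this Gaussian
    tau        : prior weight (hypothetical default); larger values keep the
                 result closer to prior_mean
    """
    dim = len(prior_mean)
    occupancy = sum(posteriors)

    # Accumulate the occupancy-weighted sum of the adaptation frames.
    weighted_sum = [0.0] * dim
    for gamma, frame in zip(posteriors, frames):
        for d in range(dim):
            weighted_sum[d] += gamma * frame[d]

    # Interpolate between the prior mean and the data statistics.
    return [
        (tau * prior_mean[d] + weighted_sum[d]) / (tau + occupancy)
        for d in range(dim)
    ]


if __name__ == "__main__":
    adapted = map_adapt_mean(
        prior_mean=[0.0, 0.0],
        frames=[[2.0, 4.0], [2.0, 4.0]],
        posteriors=[1.0, 1.0],
        tau=2.0,
    )
    print(adapted)
```

With little adaptation data the occupancy is small and the result stays near the prior mean; as more frames are assigned to the Gaussian, the result moves toward the data average. This is why MAP suits the small labeled set described above.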
We build a polyfont OCR recognizer using hidden Markov models (HMMs) of characters trained on a dataset of various fonts. We compare this system with monofont recognizers and show that its performance decreases when it is used to recognize unseen fonts. To close this performance gap, we adapt the model parameters of the polyfont recognizer to a new dataset of unseen fonts using four different adaptation algorithms. Our experiments show that the adapted system is far more accurate than the initial system, although it does not reach the accuracy of a monofont recognizer.