Tesseract OCR Engine is one of the most efficient open source OCR engines currently available. Recently, Tesseract OCR 3.01 is capable of recognizing Hindi language but still it needs some enhancement to improve the performance. The Hindi language recognition accuracy is quite low even for the printed text, as the conjunct character combinations of Hindi Language are not easily separable due to partial overlapping. The proposed approach solves this problem, so that Devanagari conjunct characters can easily be segmented and recognized using Tesseract OCR Engine. This paper presents a complete methodology to improve The Hindi Language Recognition accuracy. This paper also presents comparison with other Devanagari OCR engines available on the basis of recognition accuracy, processing time, font variations and database size.
General TermsPattern Recognition
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.