This paper presents new segmentation and recognition algorithms for Myanmar script in offline printed images. Zone segmentation considers horizontal and vertical zones; it is applied to segment letters according to their roles, such as primary or peripheral characters. In doing so, statistical and structural features of the segmented characters are explored and exploited in the recognition process. A hidden Markov model is used to recognize primary characters, while a Kohonen self-organizing map is used for peripheral characters. The characters recognized by each model are then combined and finally recognized by a k-nearest neighbors algorithm with the help of a lexicon composed of all common Myanmar characters. Our OCR system for Myanmar characters was tested on a dataset containing approximately 7560 compound characters. The results show that our system achieves significantly better results in both segmentation and recognition than other contemporary Myanmar OCR approaches.
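The final lexicon-assisted step can be illustrated with a minimal sketch. The abstract does not state the features or the distance measure used by the k-nearest neighbors stage, so everything here is an assumption for illustration: component labels such as `"ka"` and `"aa"` are hypothetical stand-ins for recognized primary and peripheral characters, the lexicon is a toy list, and edit distance over label sequences is a swapped-in similarity measure rather than the authors' method.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of component labels."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute or match
        prev = curr
    return prev[-1]


def k_nearest(candidate, lexicon, k=1):
    """Return the k lexicon entries closest to the candidate, nearest first.

    Ties are broken by the entry itself so the result is deterministic.
    """
    scored = sorted((edit_distance(candidate, entry), entry) for entry in lexicon)
    return scored[:k]


# Hypothetical lexicon: each entry is a compound character written as a
# tuple of component labels (primary character first, then peripherals).
lexicon = [("ka", "aa"), ("ka", "i"), ("ma", "aa"), ("ka", "ya", "aa")]

# Combined output of the two recognizers, with one component misread.
candidate = ("ga", "ya", "aa")
nearest = k_nearest(candidate, lexicon, k=1)
```

In this toy setup the misread candidate is mapped back to the lexicon entry that differs from it by a single component, which is the kind of correction a lexicon lookup is meant to provide.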
Optical Character Recognition (OCR) is a technology widely adopted for automatic conversion of hardcopy text to editable text. The language dependence of the technology makes it far less developed for less popular languages such as Myanmar. In addition, the uniqueness and complexity of the Myanmar writing system, such as touching and complex characters, continue to pose serious challenges to OCR researchers. In this paper, we propose a new technique for developing a Myanmar OCR system. Our technique implements skew angle detection and skew correction, noisy border correction, extra page elimination, and line segmentation on scanned images of Myanmar text. The performance of the proposed method is tested with 430 documents comprising printed and handwritten Myanmar text of various fonts and sizes, with multi-column layouts, tables, stamps or photos, and background effects. Our method gives an accuracy of 100% for line segmentation and 99.92% for skew angle detection and skew correction. The ability of our method to effectively perform global and local skew angle detection, skew correction, and line segmentation on different handwritten and digital text images of the Myanmar character set with high accuracy confirms the robustness of the technique, its reliability, and its suitability for application to many other related languages.
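Line segmentation of a skew-corrected page is commonly done with a horizontal projection profile, sketched below. The abstract does not say which method the authors use, so this is a generic illustration under stated assumptions: the page is already binarized (1 for ink, 0 for background) and skew-free, and `segment_lines` and `min_ink` are hypothetical names chosen for this example.

```python
def segment_lines(binary, min_ink=1):
    """Split a binarized page into text lines by horizontal projection.

    binary  : list of rows, each a list of 0/1 pixels (1 = ink).
    min_ink : minimum ink pixels for a row to count as part of a line.
    Returns a list of (top, bottom) row ranges, bottom exclusive.
    """
    profile = [sum(row) for row in binary]  # ink count per row
    lines = []
    start = None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                        # a text line begins
        elif ink < min_ink and start is not None:
            lines.append((start, y))         # a blank row ends the line
            start = None
    if start is not None:                    # line touching the bottom edge
        lines.append((start, len(profile)))
    return lines


# Toy 12x20 page with two blocks of ink separated by blank rows.
page = [[0] * 20 for _ in range(12)]
for y in range(1, 4):
    page[y][2:18] = [1] * 16
for y in range(6, 10):
    page[y][3:15] = [1] * 12

lines = segment_lines(page)
```

A plain projection profile relies on blank rows between lines, so it only works once skew has been removed; that dependency is one reason skew detection and correction precede line segmentation in pipelines of this kind.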