Abstract-A bilingual concept lexicon is valuable for Information Extraction (IE), Machine Translation (MT), Word Sense Disambiguation (WSD) and similar tasks. The Myanmar-English Bilingual WordNet-like Lexicon (MEBWL) is developed to fulfill the requirements of Language Acquisition (LA). However, building such a lexicon is challenging because it is costly and time-consuming. To overcome this challenge, this paper integrates linguistic resources, including a Myanmar-English dictionary, an English-Myanmar dictionary and WordNet, to construct a Myanmar-English WordNet-like lexicon by acquiring lexical and conceptual knowledge from WordNet and Myanmar<->English Machine Readable Dictionaries (MRDs). The system comprises three phases: the MRD extraction phase, the link analysis phase and the WordNet construction phase. The first phase converts data from multiple resources with different formats into a common format, joins and aligns the scattered data for smooth access, and groups the data according to part of speech (POS). The link analysis phase analyzes, classifies and generates candidate translation links. In the construction phase, MEBWL is constructed from the verified translation links and WordNet. In addition, a morphological processor is designed to support mapping inflected Myanmar words to English words.
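As a rough illustration of the link analysis phase, the sketch below generates candidate translation links from two dictionaries and classifies each link by the direction in which it is attested. The abstract does not state the classification criteria, so the three classes used here, the function name `classify_links`, and the romanized sample entries are all assumptions made for illustration only.

```python
def classify_links(my_en, en_my):
    """Generate candidate (Myanmar, English) links and label each one.

    my_en: Myanmar headword -> list of English translations
    en_my: English headword -> list of Myanmar translations
    The labels are an assumed scheme, not the one used in the paper.
    """
    # Links read off the Myanmar-English dictionary.
    forward = {(m, e) for m, translations in my_en.items() for e in translations}
    # Links read off the English-Myanmar dictionary, stored in the same order.
    backward = {(m, e) for e, translations in en_my.items() for m in translations}

    links = {}
    for pair in forward | backward:
        if pair in forward and pair in backward:
            links[pair] = "bidirectional"
        elif pair in forward:
            links[pair] = "my_to_en_only"
        else:
            links[pair] = "en_to_my_only"
    return links


# Hypothetical romanized entries, used only to exercise the function.
my_en = {"yay": ["water"], "sar": ["letter", "eat"]}
en_my = {"water": ["yay"], "letter": ["sar"], "food": ["sar"]}
links = classify_links(my_en, en_my)
```

In such a scheme, links attested in both directions would be the most natural candidates for automatic verification, while one-directional links would need further checking before being attached to WordNet.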
This paper proposes a unified approach to Myanmar word analysis using Finite State Automata (FSA), a rule-based heuristic approach and a statistical approach. Myanmar text has no inter-word spaces, which makes tokenization difficult; we therefore use FSA to recognize words. Segmentation is a major problem because there is no delimiter, and segmentation errors cause subsequent failures in later NLP processes. Segmentation is also an essential preprocessing task for Natural Language Processing applications such as Machine Translation and Information Retrieval. In this system, the rule-based heuristic approach and the statistical approach are used with a corpus-based dictionary. Evaluation results showed that the method is effective for the Myanmar language.
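To make dictionary-driven segmentation of delimiter-free text concrete, here is a minimal sketch of greedy longest-match segmentation. This is a common rule-based heuristic swapped in for illustration; it is not the FSA or statistical method of the paper, and the function name `segment`, the `max_len` parameter, and the Latin placeholder lexicon are assumptions. A real system would operate on Myanmar syllables rather than single characters.

```python
def segment(text, lexicon, max_len=None):
    """Split text into tokens by repeatedly taking the longest lexicon match.

    Characters that start no lexicon entry are emitted as single-character
    tokens so that the function always makes progress.
    """
    if max_len is None:
        # Longest entry bounds how far ahead we need to look.
        max_len = max(map(len, lexicon), default=1)

    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No entry matched at position i: fall back to one character.
            tokens.append(text[i])
            i += 1
    return tokens


# Placeholder lexicon with overlapping entries to show the longest-match rule.
lexicon = {"ab", "abc", "d", "de"}
known = segment("abcde", lexicon)    # ['abc', 'de']
unknown = segment("abxd", lexicon)   # ['ab', 'x', 'd']
```

Greedy matching can choose the wrong boundary when a long entry hides two shorter words, which is one reason a statistical component might be combined with dictionary rules.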