The morphological variations of highly inflected languages that appear in a text impede the progress of computer processing and root word determination tasks while extracting an abstract. As a remedy to this difficulty, a lemmatization algorithm is developed, and its effectiveness is evaluated for Word Sense Disambiguation (WSD). Having observed its usefulness, lemmatizer is considered for developing Natural Language Processing tools for languages rich in morphological variations. Among various Indian highly inflected languages, Assamese, spoken by over 14 million people in the North-Eastern region of India, is also one of them. In this present work, after a detailed study on the possible transformations through which surface words are created from lemmas, we have designed an Assamese lemmatizer in such a manner that suitable reverse transformations can be employed on a surface word to derive the co-relative (similar) lemma back. And it has been observed that the lemmatizer is competent to deal with inflectional and derivational morphology in Assamese, and the same was evaluated on various Assamese articles extracted from the Assamese Corpus consisting of 50,000 surface words (excluding proper nouns), and the result that it yielded with 82% accuracy was quite encouraging and satisfying, as Assamese is a low-level language and no research work has been done in the Assamese language regarding the lemmatization of words. Considering the result obtained, the lemmatizer is then evaluated for Assamese WSD. For this purpose, 10 highly polysemous Assamese words are taken into account for sense disambiguation. We have also regarded varied WSD systems and observed that such systems enhance the effectiveness of all the WSD systems, which is statistically significant.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.