Abstract-Part-of-speech (POS) tagging is the process of automatically assigning a lexical category to each word according to its context and definition. Each word of a sentence in the corpus is marked as belonging to a particular part of speech, such as noun, verb, adjective, or adverb. POS tagging serves as a first step in natural language processing applications such as information extraction, parsing, and word sense disambiguation. This paper presents a survey of part-of-speech taggers used for Indian languages. The main problem of tagging is to find a proper way to tag each word with the correct part of speech. Very little work has been done on POS tagging for Indian languages, mainly because of their morphological richness. In this paper, various techniques used for the development of POS taggers are discussed.
Abstract-Part-of-speech tagging is the task of tagging every word in a text with a particular part of speech, e.g. verb, adverb, etc. POS tagging is the first important step in the processing pipeline of NLP applications. This paper reports a survey on POS tagging for various languages. The techniques used for POS tagging are also described. Because of complex structural effects, a number of problems occur when tagging sentences written in different languages. Researchers in this field have done a great deal of work for various languages using techniques such as HMM (Hidden Markov Model), SVM (Support Vector Machine), and ME (Maximum Entropy).

Keywords-Natural Language Processing, Part of Speech Processing, Tagset, Indian Languages

I. INTRODUCTION

Natural language processing (NLP) is the process that enables interaction between human and machine. It is a component of computer science, linguistics, and artificial intelligence. Building an NLP application is a difficult task because human speech is not always specific. The main objective of NLP is to develop a system that can understand text and translate between one human language and another. Work in the area of part-of-speech (POS) tagging began in the early 1960s. POS tagging is an important tool for NLP. It is also one of the simplest statistical models used in many NLP applications. POS tagging is an initial step of information extraction, summarization, retrieval, machine translation, and speech conversion [2].

POS tagging is the process of assigning the best grammatical tag, such as verb, noun, pronoun, adjective, adverb, conjunction, or preposition, to each word of a text. Some unknown words exist in every language, so it is a very difficult task to assign the appropriate POS tag to each word in a sentence [3]. Most of the work done for Indian languages has used either rule-based approaches or empirical POS tagging approaches.
However, the rule-based approach requires proper knowledge of the language and hand-written rules. Because of the morphological richness of Indian languages, researchers found it very hard to write proper linguistic rules, and in many cases the results were not good. Most natural language processing work has been done for Hindi, Tamil, Malayalam, and Marathi, and several part-of-speech taggers have been applied to these languages. After this, researchers moved to stochastic approaches. Stochastic methods require large corpora to be effective, yet many successful POS taggers were developed and used in various natural language processing tasks for Indian languages. After morphological richness, the main issue for Indian languages is ambiguity. Assigning the correct POS tag to a word that appears in different contexts is a very time-consuming process. For this reason, POS tagging remains a challenging problem in the field of NLP [1].
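The contrast between context-free lookup and a stochastic tagger can be illustrated with a minimal sketch of a bigram HMM tagger decoded with the Viterbi algorithm. This is not the method of any surveyed tagger. The three-sentence English corpus, the tag names, the add-one smoothing, and the function names `train_hmm`, `log_prob`, and `viterbi` are all assumptions chosen for illustration; a real tagger for an Indian language would need a large annotated corpus and a language-specific tagset.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus of (word, tag) pairs; illustrative only.
TRAIN = [
    [("the", "DET"), ("book", "NOUN"), ("is", "VERB"), ("good", "ADJ")],
    [("please", "ADV"), ("book", "VERB"), ("the", "DET"), ("ticket", "NOUN")],
    [("the", "DET"), ("ticket", "NOUN"), ("is", "VERB"), ("good", "ADJ")],
]

def train_hmm(sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag]
    emit = defaultdict(Counter)   # emit[tag][word]
    for sent in sentences:
        prev = "<s>"              # sentence-start pseudo-tag
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    tags = sorted(emit)
    vocab = {w for counts in emit.values() for w in counts}
    return trans, emit, tags, vocab

def log_prob(counts, key, n_outcomes, alpha=1.0):
    """Add-alpha smoothed log probability of `key` under `counts`."""
    total = sum(counts.values())
    return math.log((counts[key] + alpha) / (total + alpha * n_outcomes))

def viterbi(words, trans, emit, tags, vocab):
    """Return the most probable tag sequence for `words`."""
    if not words:
        return []
    n_tags = len(tags)
    n_words = len(vocab) + 1      # one extra slot for unknown words
    # best[i][tag] = (log score of best path ending in tag, previous tag)
    best = [{}]
    for t in tags:
        score = (log_prob(trans["<s>"], t, n_tags)
                 + log_prob(emit[t], words[0], n_words))
        best[0][t] = (score, None)
    for i in range(1, len(words)):
        best.append({})
        for t in tags:
            e = log_prob(emit[t], words[i], n_words)
            best[i][t] = max(
                (best[i - 1][p][0] + log_prob(trans[p], t, n_tags) + e, p)
                for p in tags
            )
    # Follow back-pointers from the best final tag.
    last = max(tags, key=lambda t: best[-1][t][0])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    return path[::-1]

if __name__ == "__main__":
    model = train_hmm(TRAIN)
    for text in ("the book is good", "please book the ticket"):
        words = text.split()
        print(list(zip(words, viterbi(words, *model))))
```

On this toy corpus the ambiguous word "book" is tagged NOUN after "the" and VERB after "please", because the transition probabilities favor a noun following a determiner and a verb following an adverb. The same mechanism explains why stochastic taggers depend on corpus size: every probability here is estimated from counts, and sparse counts make the smoothing term dominate.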