Proceedings of the ACL-IJCNLP 2009 Conference Short Papers on - ACL-IJCNLP '09 2009
DOI: 10.3115/1667583.1667595

Part of speech tagger for Assamese text

Abstract: Assamese is a morphologically rich, agglutinative and relatively free word order Indic language. Although spoken by nearly 30 million people, very little computational linguistic work has been done for this language. In this paper, we present our work on part of speech (POS) tagging for Assamese using the well-known Hidden Markov Model. Since no well-defined suitable tagset was available, we develop a tagset of 172 tags in consultation with experts in linguistics. For successful tagging, we examine relevant li…

Citations: cited by 36 publications (7 citation statements)
References: 6 publications
“…So a morpho-syntactic approach gives better results in comparison to [6], [10]. Another important observation from this experiment is that though Assamese is relatively free word order, some parts of speech do not occur in the initial or final positions of the sentence.…”
Section: Discussion (mentioning)
confidence: 77%
“…In literature, numerous researches for resource constrained and agglutinative languages have proposed a two stage method for dealing with POS tagging challenges. The first stage involves the use of any or more of the above models for performing full morphosyntactic tagging, while the second one is for identifying morphologically-inflected word tag pairs with the help of morphological analysis [8,24,26,28]. For example, in developing POS tagger for Assamese Text, an agglutinative Indic language, [26] use HMM and simple morphological analysis to determine probable tags for previously unseen words.…”
Section: Part-of-speech Tagger and Tagging Techniques (mentioning)
confidence: 99%
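
The combination described in the statement above (an HMM tagger plus a simple morphological step that proposes probable tags for unseen words) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy English tokens, the three-tag tagset, the add-one smoothing, and the `suffix_tags` rule table are all assumptions made purely for illustration.

```python
from collections import defaultdict
import math

def train_hmm(tagged_sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans = defaultdict(lambda: defaultdict(int))
    emit = defaultdict(lambda: defaultdict(int))
    for sent in tagged_sentences:
        prev = "<s>"  # sentence-start pseudo-tag
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    return trans, emit

def log_prob(counts, key, n_outcomes, alpha=1.0):
    """Add-alpha smoothed log probability (smoothing choice is an assumption)."""
    total = sum(counts.values())
    return math.log((counts.get(key, 0) + alpha) / (total + alpha * n_outcomes))

def viterbi(words, trans, emit, tags, vocab, suffix_tags):
    """Return the most probable tag sequence for `words`.

    For a word not in `vocab`, the candidate tags are narrowed by the
    hypothetical suffix table `suffix_tags` (suffix -> allowed tags).
    """
    n_tags, n_words = len(tags), len(vocab) + 1  # +1 slot for unseen words

    def candidates(word):
        if word in vocab:
            return tags
        for suffix, allowed in suffix_tags.items():
            if word.endswith(suffix):
                return allowed
        return tags  # no rule matched: fall back to every tag

    best = {"<s>": (0.0, [])}  # tag -> (log score, best path ending in tag)
    for word in words:
        new_best = {}
        for tag in candidates(word):
            e = log_prob(emit[tag], word, n_words)
            score, path = max(
                ((s + log_prob(trans[p], tag, n_tags), pth)
                 for p, (s, pth) in best.items()),
                key=lambda x: x[0],
            )
            new_best[tag] = (score + e, path + [tag])
        best = new_best
    return max(best.values(), key=lambda x: x[0])[1]

# Toy training data (illustrative only, not from the paper).
train = [
    [("the", "DET"), ("dog", "N"), ("runs", "V")],
    [("the", "DET"), ("cat", "N"), ("sleeps", "V")],
    [("a", "DET"), ("dog", "N"), ("sleeps", "V")],
]
tags = ["DET", "N", "V"]
vocab = {w for sent in train for w, _ in sent}
suffix_tags = {"s": ["V"]}  # hypothetical morphological rule

trans, emit = train_hmm(train)
tagged = viterbi(["the", "cat", "jumps"], trans, emit, tags, vocab, suffix_tags)
print(tagged)
```

Here "jumps" never appears in the training data, so the suffix table restricts its candidates before the Viterbi search scores them. A real system for a language with a 172-tag tagset would need a far larger annotated corpus and a proper morphological analyzer.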
“…Following the tokenization process, Part-Of-Speech tagging or the POS tagging technique will be used and the role of each word in each sentence or in other words, all verbs, nouns, adjectives, adverbs and other relevant elements in each sentence will be recognized. There are several approaches for building a POS tagger, but supervised and unsupervised tagging are the most common approaches [20]. Both of these tagging approaches have three sub-types.…”
Section: F Lexical and Syntactic Analysis (mentioning)
confidence: 99%
“…These are (1) rule based, (2) stochastic based and (3) neural network based. Hidden Markov model (HMM) is the most common stochastic tagging technique [20]. This technique was used to build the POS tagger.…”
Section: F Lexical and Syntactic Analysis (mentioning)
confidence: 99%