2012
DOI: 10.5120/4625-6867

Improving Unsupervised Stemming by using Partial Lemmatization Coupled with Data-based Heuristics for Hindi

Abstract: Stemming and lemmatization are two important natural language processing techniques, widely used in Information Retrieval (IR) for query processing and in Machine Translation (MT) for reducing data sparseness. Both reduce the inflectional forms of a word, and sometimes its derivationally related forms, to a common base form. Most existing stemming and lemmatization work relies either on language-dependent rules, which require supervision by a language expert, or on some probabilistic approach that ne…
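The difference between stemming and lemmatization described in the abstract can be made concrete with a minimal Python sketch. It is not the method proposed in the paper (the abstract above is truncated before the method is described); the suffix list `SUFFIXES` and the lookup table `LEMMAS` are hypothetical toy data. A suffix-stripping stemmer maps कमरा, कमरे and कमरों (forms of "room") to a shared stem कमर, which need not be a dictionary word, while a dictionary lemmatizer maps them to the citation form कमरा.

```python
# Hypothetical toy suffix list for illustration only; not the paper's rule set.
SUFFIXES = ["ों", "ें", "ा", "े", "ी"]

# Hypothetical lemma dictionary: inflected form -> dictionary (citation) form.
LEMMAS = {"कमरा": "कमरा", "कमरे": "कमरा", "कमरों": "कमरा"}


def stem(word: str) -> str:
    """Strip the longest matching suffix, always leaving a non-empty stem."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)]
    return word


def lemmatize(word: str) -> str:
    """Return the dictionary form if the word is known, else the word unchanged."""
    return LEMMAS.get(word, word)


if __name__ == "__main__":
    for w in ["कमरा", "कमरे", "कमरों", "किताबें"]:
        print(w, stem(w), lemmatize(w))
```

The stem कमर also happens to be a separate Hindi word ("waist"), a small example of how aggressive suffix stripping can conflate unrelated words, which is one motivation for combining stemming with lemmatization.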

Cited by 9 publications (4 citation statements)
References 7 publications

“…From Table 1, it can be noted that all the language processing models outperformed the baseline algorithm. This is expected as the baseline algorithm returns results based on the search query only, without taking any further processing into consideration such as stemming or lemmatization [20,29].…”
Section: Results (mentioning; confidence: 99%)
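The quoted observation, that a baseline matching only the raw query terms is outperformed once stemming is applied, can be illustrated with a toy Boolean term matcher. This is a hypothetical sketch, not the retrieval setup of the citing study: the documents in `DOCS`, the `search` function and the suffix list are all assumptions made for illustration.

```python
# Hypothetical toy suffix list (not from the paper), longest suffixes first.
SUFFIXES = sorted(["ों", "ें", "ा", "े", "ी"], key=len, reverse=True)


def stem(word: str) -> str:
    """Strip the longest matching suffix, always leaving a non-empty stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)]
    return word


def search(query: str, docs: list[str], use_stemming: bool) -> list[int]:
    """Return indices of documents sharing at least one normalized term with the query."""
    normalize = stem if use_stemming else (lambda w: w)
    query_terms = {normalize(t) for t in query.split()}
    return [
        i
        for i, doc in enumerate(docs)
        if query_terms & {normalize(t) for t in doc.split()}
    ]


# Hypothetical toy collection of whitespace-tokenized Hindi sentences.
DOCS = ["यह कमरा साफ है", "दो कमरे खाली हैं", "किताबें मेज पर हैं"]

if __name__ == "__main__":
    print(search("कमरों", DOCS, use_stemming=False))
    print(search("कमरों", DOCS, use_stemming=True))
```

Without stemming, the query कमरों matches no document, because no document contains that exact surface form; with stemming, it matches the two documents containing कमरा and कमरे.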
“…For instance, some studies found stemming used with clustering algorithms to be beneficial in English texts [26], and also other languages [27,28]. Gupta et al [29] combined stemming with partial lemmatization for Hindi language with results indicating significant improvements [over] other traditional approaches. Another study compared stemming and lemmatization in clustering Finnish text documents, with results indicating the use of lemmatization to be better than stemming [30].…”
Section: Lemmatization (mentioning; confidence: 99%)
“…The handcrafted rule based stemming approach is easy if [the] developer has proper linguistic knowledge[;] on the [other] hand[,] the lemmatization rule can be easily produced without linguistic knowledge provided the given training data is correct [45]. Further, a comparison of lemmatization and stemming was performed in the information retrieval of documents using clustering and the result depicts that lemmatization gives best performance as compared to stemming [46].…”
Section: Different Preprocessing Techniques Provide Different Classif... (mentioning; confidence: 99%)
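The quoted claim that lemmatization rules can be produced from training data without linguistic knowledge can be illustrated by learning suffix-replacement rules from (word, lemma) pairs. This is one common data-driven technique, swapped in here purely for illustration; it is not necessarily the approach of [45] or of the paper above, and the training pairs `PAIRS` and all function names are hypothetical.

```python
from collections import Counter, defaultdict


def suffix_rule(word: str, lemma: str) -> tuple[str, str]:
    """Drop the longest common prefix; return (suffix to strip, suffix to append)."""
    i = 0
    while i < min(len(word), len(lemma)) and word[i] == lemma[i]:
        i += 1
    return word[i:], lemma[i:]


def learn_rules(pairs: list[tuple[str, str]]) -> dict[str, str]:
    """Map each observed stripped suffix to its most frequent replacement."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for word, lemma in pairs:
        strip, append = suffix_rule(word, lemma)
        counts[strip][append] += 1
    return {strip: c.most_common(1)[0][0] for strip, c in counts.items()}


def lemmatize(word: str, rules: dict[str, str]) -> str:
    """Apply the rule whose stripped suffix is the longest match for the word."""
    for strip in sorted(rules, key=len, reverse=True):
        if word.endswith(strip) and len(word) > len(strip):
            return word[: len(word) - len(strip)] + rules[strip]
    return word


# Hypothetical training pairs: (inflected form, lemma).
PAIRS = [
    ("कमरे", "कमरा"),
    ("कमरों", "कमरा"),
    ("बच्चे", "बच्चा"),
    ("बच्चों", "बच्चा"),
    ("किताबें", "किताब"),
]

if __name__ == "__main__":
    rules = learn_rules(PAIRS)
    print(rules)
    for w in ["कुत्ते", "रातें", "घर"]:
        print(w, lemmatize(w, rules))
```

Each learned rule records which suffix to strip and which to append, and the longest matching stripped suffix wins at lookup time, so an unseen form such as कुत्ते is mapped to कुत्ता. As the quote notes, the rules are only as good as the training pairs: a single mislabeled pair can introduce a wrong replacement.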
“…Additionally, they also found that the performance of information retrieval was better when the maximum length of lemmas is used. In 2012, Gupta et al [12] combined stemming and partial lemmatization and tested their model on the Hindi language. Their model yielded significant improvements compared to the traditional approaches.…”
Section: Introduction (mentioning; confidence: 99%)