2021
DOI: 10.1007/978-981-15-8443-5_28

Implementation of Stemmer and Lemmatizer for a Low-Resource Language—Kannada

Cited by 3 publications
(3 citation statements)
References 4 publications
“…Ferilli [17] presented a term-document frequency approach that automatically detects stopwords from a small amount of the corpora and stated that it was the most effective approach which outperformed the classic term frequency [18,2,19] and the normalized inverse document frequency of Lo et al [20]. Trishala and Mamatha [21] proposed a rule-based Kannada stemmer relying on an unsupervised approach using k-means algorithm, and Thangarasu and Inbarani [22] presented an analogy removal stemmer that automatically stem Tamil words from the text corpora. The state-of-art of query spelling corrections explained by Chang et al [23] are approaches applicable for low- and high-resource languages.…”
Section: Related Work (mentioning)
confidence: 99%
“…Document classification is the task of assigning the category $c_i$ for the given document $d_j$, where […] The figure 1 depicts the generic block diagram of Kannada document classification process. The raw Kannada text documents comprise of some unwanted texts and removal of these is essential for better computation [25]. Stopwords removal, lemmatization, stemming, tokenization, and transliteration are few pre-processing techniques which aide in reducing the dimensionality and complexity of text processing.…”
Section: Generic Architecture (mentioning)
confidence: 99%
“…At the preprocessing stage, lemmatization and stemming are two basic modules used for the normalization of text. In [25] authors presented Unsupervised Stemmer and Rule-Based Lemmatizer for Kannada documents. Experimentation is carried out by building a dataset of 17,825 root words with the help of Kannada dictionary.…”
Section: ( ) Log (1) (mentioning)
confidence: 99%