Computational Linguistics in the Netherlands 2001 2002
DOI: 10.1163/9789004334038_010

Accurate Stemming of Dutch for Text Classification

Abstract: This paper investigates the use of stemming for classification of Dutch (email) texts. We introduce a stemmer, which combines dictionary lookup (implemented efficiently as a finite state automaton) with a rule-based backup strategy and show that it outperforms the Dutch Porter stemmer in terms of accuracy, while not being substantially slower. For text classification, the most important property of a stemmer is the number of words it (correctly) reduces to the same stem. Here the dictionary-based system also ou…
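The abstract describes a two-stage design: look the word up in a dictionary compiled into a finite-state automaton, and fall back to rules when the word is not found. A minimal Python sketch of that control flow follows. It is not the authors' implementation. The trie-shaped automaton (`TrieFSA`), the handful of dictionary entries, and the suffix list in `SUFFIX_RULES` are hypothetical stand-ins chosen for illustration; the paper's dictionary and rules are not reproduced here. The sketch also counts conflation classes (how many distinct stems a word list collapses to), since the abstract names the number of words reduced to the same stem as the property that matters most for classification.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional, Set


class TrieFSA:
    """Deterministic acyclic automaton (a trie) mapping full word forms to stems."""

    def __init__(self) -> None:
        self.transitions: List[Dict[str, int]] = [{}]  # state 0 is the start state
        self.final: Dict[int, str] = {}  # accepting state -> stem stored for that word

    def add(self, word: str, stem: str) -> None:
        state = 0
        for ch in word:
            nxt = self.transitions[state].get(ch)
            if nxt is None:  # create a new state for an unseen transition
                nxt = len(self.transitions)
                self.transitions.append({})
                self.transitions[state][ch] = nxt
            state = nxt
        self.final[state] = stem

    def lookup(self, word: str) -> Optional[str]:
        state = 0
        for ch in word:
            nxt = self.transitions[state].get(ch)
            if nxt is None:
                return None  # word not in the dictionary
            state = nxt
        return self.final.get(state)  # None if word is only a prefix of an entry


# Hypothetical backup rules (NOT the paper's rule set), longest suffixes first.
SUFFIX_RULES = ("heden", "heid", "ingen", "ing", "en", "e", "s")
MIN_STEM = 3  # hypothetical: never leave fewer than 3 characters


def rule_based_stem(word: str) -> str:
    for suffix in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            stem = word[: -len(suffix)]
            # Undouble a final double consonant, e.g. "katt" -> "kat".
            if stem[-1] == stem[-2] and stem[-1] not in "aeiou":
                stem = stem[:-1]
            return stem
    return word


def hybrid_stem(word: str, fsa: TrieFSA) -> str:
    w = word.lower()
    stem = fsa.lookup(w)  # 1) dictionary lookup via the automaton
    return stem if stem is not None else rule_based_stem(w)  # 2) rule fallback


def conflation_classes(words: Iterable[str], stem_fn: Callable[[str], str]) -> Dict[str, Set[str]]:
    """Group words by the stem they are reduced to."""
    classes: Dict[str, Set[str]] = defaultdict(set)
    for w in words:
        classes[stem_fn(w)].add(w)
    return dict(classes)


if __name__ == "__main__":
    fsa = TrieFSA()
    # Hypothetical dictionary entries, e.g. irregular forms the rules cannot handle.
    for form in ("liep", "liepen", "lopen", "loopt", "gelopen"):
        fsa.add(form, "loop")

    words = ["liep", "lopen", "loopt", "katten", "kat", "boeken", "boek"]
    print(len(conflation_classes(words, lambda w: hybrid_stem(w, fsa))))  # 3
    print(len(conflation_classes(words, rule_based_stem)))  # 5
```

With the dictionary in place, *liep*, *lopen* and *loopt* fall into one class; the toy rules alone split them into three (*liep*, *lop*, *loopt*), which is the kind of difference in conflation the abstract is concerned with.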

Cited by 13 publications (5 citation statements); references 9 publications.
“…If the word found to be illogical then it substitutes the suffix with the other words [4]. In the Dutch stemmer it uses a suffix stripping algorithm and dictionary lookup rule based methods [5]. In the Nepali Stemming it uses a morphological analyzer which determines the given inflected word. In this it also tells about the Dawson stemming algorithm, krowertz algorithm [6]. Lightweight Stemmer for Bengali also exists. In which it just strips the affix from the word without doing the complete morphological analysis.…”
Section: Related Work (mentioning, confidence: 99%)
“…The rule-based approach is a traditional method for stemming/lemmatisation (i.e. affix stripping) (Porter 1980; Gaustad and Bouma, 2002) and entails the use of language-specific rules to identify the base-forms (i.e. lemmas) of word forms.…”
Section: Lemmatisation (mentioning, confidence: 99%)
“…For example, Gaustad and Bouma (2002) report results from experiments on Dutch email and news text classification using simple suffix stripping and a dictionary-based stemming. Neither method improved classification accuracy in their experiments.…”
Section: Stemming (mentioning, confidence: 99%)
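In the experiments cited above, the stemmer is a preprocessing step for bag-of-words classification. The hypothetical Python sketch below shows only that mechanism: two Dutch sentences using different inflections of *lopen* and *boek* share more features once tokens are mapped to stems, which raises their cosine similarity. The stem table `STEMS` is a toy stand-in, not either stemmer from the paper, and higher similarity between two documents says nothing by itself about classification accuracy; the cited experiments found that neither stemming method improved it.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict


def bag_of_stems(text: str, stem: Callable[[str], str]) -> Counter:
    """Tokenize on word characters, lowercase, and count the resulting stems."""
    return Counter(stem(tok) for tok in re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical toy stem table standing in for a real stemmer.
STEMS: Dict[str, str] = {"liep": "loop", "lopen": "loop", "boeken": "boek"}


def table_stem(tok: str) -> str:
    return STEMS.get(tok, tok)


def no_stem(tok: str) -> str:
    return tok


doc1 = "Hij liep naar de boeken."
doc2 = "Wij lopen naar het boek."

print(round(cosine(bag_of_stems(doc1, no_stem), bag_of_stems(doc2, no_stem)), 2))  # 0.2
print(round(cosine(bag_of_stems(doc1, table_stem), bag_of_stems(doc2, table_stem)), 2))  # 0.6
```

Without stemming the two sentences share only *naar*; with the toy table they also share *loop* and *boek*. Passing the stemmer in as a function keeps the feature extractor independent of whichever stemming strategy is being compared.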