2004
DOI: 10.1075/ijcl.9.1.04kno

The notion of a “lemma”

Abstract: The notion of a lemma is so familiar in corpus linguistics that it scarcely needs a formal definition. When a wordlist or a text is lemmatised, the process is apparently transparent, so that any observer can understand how the lemma relates to the original set or string of words. We shall argue in this paper that, on the contrary, the concept of lemma is not well defined, and is in need of a clear formal definition. The lemma is a fundamental concept in the processing of texts in at least some languages, a point…

Cited by 25 publications (10 citation statements)
References 2 publications
“…It is therefore also common to annotate words with other sorts of labels, instead of attempting an explicit respelling. Lemmas are often used in this respect (Van der Voort van der Kleij, 2005): a lemma is a normalized label, which unambiguously links words to the same entry (headword) in a lexical resource, such as a dictionary, if they only differ in inflection or spelling (Knowles and Mohd Don, 2004).…”
Section: Related Research (mentioning)
confidence: 99%
“…Each token in the corpora has been annotated with a normalized dictionary headform or lemma in a present-day spelling. Broadly speaking, lemmatization allows us to abstract over individual instances of tokens which only differ in inflection or orthography (Knowles and Mohd Don, 2004). For historical corpora, this has interesting applications in the context of database searching, (diachronic) topic modelling or stylometry.…”
Section: Data Sets (mentioning)
confidence: 99%
“…Lemmatization is the task of mapping a token to its corresponding dictionary head-form to allow downstream applications to abstract away from orthographic and inflectional variation (Knowles and Mohd Don, 2004). While lemmatization is considered to be solved for analytic and resource-rich languages such as English, it remains an open challenge for morphologically complex (e.g.…”
Section: Introduction (mentioning)
confidence: 99%
“…Knowles and Mohd Don 2004). According to lexicographical recommendations, all forms which naturally come to mind to users when searching a dictionary should function as headwords.…”
Section: Citation-forms In Swahili Dictionaries (mentioning)
confidence: 99%