IFIP International Federation for Information Processing
DOI: 10.1007/978-0-387-44641-7_45

Educating Lia: The Development of a Linguistically Accurate Memory-Based Lemmatiser for Afrikaans

Abstract: This paper describes the development of Lia, a memory-based lemmatiser for Afrikaans. It opens with a brief overview of Afrikaans lemmatisation, noting that lemmatisation is treated in this paper as a simplified form of morphological analysis. This overview is followed by an introduction to memory-based learning, the machine learning technique used to develop the Afrikaans lemmatiser. The deployment of Lia is then discussed with specific emphas…

Cited by 2 publications (4 citation statements)
References 6 publications
“…The lemmatization models were trained using the training data from the NCHLT Text Project. For Afrikaans, the data annotated during the development of Lia [19] (see Section 2) was also used. The neural lemmatization model is context sensitive if context is provided, as it is in the NCHLT corpora.…”
Section: Sequence Translation (citation type: mentioning)
confidence: 99%
“…The comparison metrics for compound analysis are accuracy at word-level and F1-score at the compound boundary level. For the lemmatization task, the baseline for Afrikaans is Lia (see Section 2) [19] and precision, recall, F1 and accuracy are reported. Tokens consisting of punctuation or numbers were excluded from evaluation, and for systems trained using the Lia data, all tokens were lowercased before prediction, since Lia does not predict capitalization.…”
Section: Sequence Translation (citation type: mentioning)
confidence: 99%
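
The evaluation setup quoted above (excluding punctuation and number tokens, lowercasing before prediction, reporting accuracy) can be illustrated with a minimal sketch. This is not the citing authors' code: the function names `is_evaluable` and `lemma_accuracy` and the `predict` callable are hypothetical, and lowercasing the gold lemmas as well as the input tokens is an assumption made here so that the comparison is case-insensitive.

```python
import string
from typing import Callable, Sequence


def is_evaluable(token: str) -> bool:
    """Return False for tokens made up only of punctuation and/or digits.

    Hypothetical helper: the quoted statement says such tokens were
    excluded, but does not specify how they were detected.
    """
    return not all(ch in string.punctuation or ch.isdigit() for ch in token)


def lemma_accuracy(
    tokens: Sequence[str],
    gold_lemmas: Sequence[str],
    predict: Callable[[str], str],
) -> float:
    """Token-level lemmatisation accuracy over the evaluable tokens.

    Tokens are lowercased before prediction, as in the quoted statement.
    Assumption: gold lemmas are lowercased too, so case never counts
    as an error.
    """
    kept = [
        (tok.lower(), gold.lower())
        for tok, gold in zip(tokens, gold_lemmas)
        if is_evaluable(tok)
    ]
    if not kept:
        return 0.0
    correct = sum(1 for tok, gold in kept if predict(tok) == gold)
    return correct / len(kept)
```

Any lemmatiser exposed as a function from a lowercased token to a lemma can be passed as `predict`; precision, recall and F1, which the statement also mentions, are left out of this sketch.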