2008
DOI: 10.4102/lit.v29i1.99

Die ontwikkeling van ’n woordafbreker en kompositumanaliseerder vir Afrikaans

English title: The development of a hyphenator and compound analyser for Afrikaans

Abstract: The development of two core technologies for Afrikaans, viz. a hyphenator and a compound analyser, is described in this article. As no annotated Afrikaans data existed prior to this project to serve as training data for a machine learning classifier, the core technologies in question are first developed using a rule-based approach. The rule-based hyphenator and compound analyser are evaluated and the hyphenator obtains an f-score of 90.84%, whil…

Cited by 5 publications (9 citation statements)
“…The input is a sequence of characters and the target output is a sequence of characters interpolated with compound boundaries (+) and valence morpheme boundaries (_) as seen in Table 3. This format is similar to that described in [22], with the exception that the sequence is space separated and is appended with an end-of-sequence (.) marker, which preliminary experiments indicated improved the accuracy of sequence length prediction.…”
Section: Sequence Translation (mentioning)
confidence: 99%
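The format described in the statement above can be illustrated with a minimal Python sketch. It is not the citing authors' code: the function names, the example annotation `staat_s+president`, and the choice to append the end-of-sequence marker to both the input and the output sequence are assumptions made only for illustration.

```python
from typing import Tuple

# Boundary symbols named in the citation statement.
BOUNDARY_MARKERS = {"+", "_"}   # "+" compound boundary, "_" valence morpheme boundary
END_OF_SEQUENCE = "."           # end-of-sequence marker


def to_sequence_pair(annotated: str) -> Tuple[str, str]:
    """Turn an annotated word into a (source, target) pair.

    Both sequences are space separated. Appending the end-of-sequence
    marker to the source as well as the target is an assumption.
    """
    # Source: the plain characters, with boundary symbols stripped.
    source_chars = [ch for ch in annotated if ch not in BOUNDARY_MARKERS]
    # Target: the same characters with boundary symbols kept in place.
    target_chars = list(annotated)
    source = " ".join(source_chars + [END_OF_SEQUENCE])
    target = " ".join(target_chars + [END_OF_SEQUENCE])
    return source, target


def from_target_sequence(target: str) -> str:
    """Recover the annotated word from a space-separated target sequence."""
    tokens = target.split()
    if tokens and tokens[-1] == END_OF_SEQUENCE:
        tokens = tokens[:-1]  # drop the end-of-sequence marker
    return "".join(tokens)


# Hypothetical annotation, not taken from the dataset.
source, target = to_sequence_pair("staat_s+president")
print(source)
print(target)
```

Keeping every character as its own token means the model only has to learn where to insert the two boundary symbols; `from_target_sequence` reverses the encoding so predictions can be compared with gold annotations.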
“…For compound analysis, results are compared with those reported in [22] as a baseline, which were achieved using a memory-based learner. They are additionally compared with those achieved by CatBoost, a gradient-boosted decision tree (GBDT) implementation [68], which was found to be the best-performing machine learning algorithm of those evaluated by the authors (machine learning methods tested included kNN's (IBk), non-boosted decision trees, Random Forest, SVM's, One Rule, and Naïve Bayes).…”
Section: Sequence Translation (mentioning)
confidence: 99%
“…The manual is based on the annotation guidelines that were developed during the CKarma project (CTexT, 2005;Pilon et al, 2008). These initial guidelines only apply to Afrikaans, and was hence extended to handle Dutch compounds as well as more complicated cases not foreseen in the original CKarma guidelines.…”
Section: Dataset Development (mentioning)
confidence: 99%