The development of a hyphenator and compound analyser for Afrikaans
This article describes the development of two core technologies for Afrikaans, viz. a hyphenator and a compound analyser. As no annotated Afrikaans data existed prior to this project to serve as training data for a machine learning classifier, the core technologies were first developed using a rule-based approach. When evaluated, the rule-based hyphenator obtained an f-score of 90,84%, while the compound analyser reached an f-score of only 78,20%. Since these results were somewhat disappointing and insufficient for practical implementation, a machine learning technique (memory-based learning) was used instead. Training data for each of the two core technologies was then developed using "TurboAnnotate", an interface designed to improve the accuracy and speed of manual annotation. The machine-learned hyphenator, trained on 39 943 words, reaches an f-score of 98,11%, while the compound analyser reaches an f-score of 90,57% after being trained on 77 589 annotated words. It is concluded that machine learning (specifically memory-based learning) seems an appropriate approach for developing core technologies for Afrikaans.
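To make the memory-based idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes that hyphenation is framed as classifying each gap between two letters as break or no break, that each gap is described by a fixed window of surrounding characters, and that classification is a nearest-neighbour lookup using a simple overlap count. The function names, the window size, and the three training words are all hypothetical choices made for illustration.

```python
from collections import Counter


def gap_contexts(word, size=2):
    """Yield (gap_index, context) for every gap between two letters.

    The context is `size` characters to the left and right of the gap,
    padded with '_' at the word edges (window size is an assumption).
    """
    padded = "_" * size + word + "_" * size
    for i in range(1, len(word)):
        j = i + size  # position of the gap inside the padded string
        yield i, padded[j - size:j + size]


def instances(hyphenated, size=2):
    """Turn an annotated word such as 'wa-ter' into (context, is_break) pairs."""
    word = hyphenated.replace("-", "")
    breaks, pos = set(), 0
    for ch in hyphenated:
        if ch == "-":
            breaks.add(pos)
        else:
            pos += 1
    return [(ctx, i in breaks) for i, ctx in gap_contexts(word, size)]


def build_memory(annotated_words, size=2):
    """Memory-based learning stores all training instances as they are."""
    memory = []
    for w in annotated_words:
        memory.extend(instances(w, size))
    return memory


def overlap(a, b):
    """Number of window positions on which two contexts agree."""
    return sum(x == y for x, y in zip(a, b))


def classify(context, memory, k=1):
    """Majority label among the k stored instances most similar to context."""
    ranked = sorted(memory, key=lambda inst: overlap(context, inst[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def hyphenate(word, memory, size=2, k=1):
    """Insert '-' at every gap that the stored instances classify as a break."""
    break_at = {i for i, ctx in gap_contexts(word, size) if classify(ctx, memory, k)}
    return "".join(("-" if i in break_at else "") + ch for i, ch in enumerate(word))


# Hypothetical toy training set; the article reports 39 943 annotated words.
memory = build_memory(["wa-ter", "ta-fel", "boe-ke"])
print(hyphenate("water", memory))
print(hyphenate("kater", memory))
```

With so few stored instances the sketch only generalises to words that closely resemble the training words. The f-scores reported above depend on the much larger annotated training sets.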
In a pair of articles (this article and Breed & Van Huyssteen 2014) we investigate how two periphrastic constructions, namely the VKOP besig om te V construction and the VKOP aan die V construction, are used to express progressive meaning in Afrikaans. The discussion is based on a corpus study in which three Afrikaans periphrastic progressive constructions (i.e. the two constructions mentioned, as well as the VPOS en V construction; see Breed 2012 and Breed & Brisard 2015) were compared with one another. Three equivalent constructions are found in Dutch, and the Afrikaans constructions are therefore also compared, where relevant, with their Dutch counterparts. Breed (2012) explains the grammaticalisation process and shows that the aan die/aan het constructions have a locative origin, while the origin of the besig/bezig constructions is lexically motivated. This article investigates the frequency of the two constructions and shows that there are considerable differences between the distributions of the Afrikaans and Dutch constructions. The verbs with which each construction prototypically combines are studied, and it is found that the Afrikaans besig construction can combine with a large number of verb types, while the aan die progressive construction is more specialised.