South African languages (and indigenous African languages in general) lag behind other languages in the availability of linguistic resources. Efforts to improve or fast-track the development of linguistic resources are required to bridge this ever-increasing gap. In this paper we emphasise the advantages of technology transfer between two languages as a way to advance an existing linguistic technology or resource. We illustrate these advantages by showing how an existing lemmatiser for Setswana can be improved by applying a methodology that was first used to develop a lemmatiser for Afrikaans.
This paper describes the development of a memory-based lemmatiser for Afrikaans called Lia. The paper begins with a brief overview of Afrikaans lemmatisation, which, within the context of this paper, is treated as a simplified form of morphological analysis. This overview is followed by an introduction to memory-based learning, the machine learning technique used to develop the Afrikaans lemmatiser. The deployment of Lia is then discussed, with specific emphasis on the format of the training and test data. An evaluation of the lemmatiser shows that Lia achieves a linguistic accuracy of over 90%. The paper concludes with some ideas for future work that could improve the linguistic accuracy of the Afrikaans lemmatiser.
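To make the idea of memory-based lemmatisation concrete, the following is a minimal k-nearest-neighbour sketch. It is not Lia's actual data format or implementation. The encoding is assumed for illustration: each word is represented by its last five letters, and its class label is a hypothetical "delete n trailing characters, then append a string" rule. Distance is an overlap metric whose hand-set position weights favour word-final letters, standing in for the learned feature weights a real memory-based learner would use. The training pairs are a handful of ordinary Afrikaans plural and diminutive forms chosen for illustration; they are not data from the paper.

```python
from collections import Counter

WINDOW = 5  # hypothetical: number of word-final letters used as features
# Hypothetical hand-set weights: later (word-final) positions count more.
WEIGHTS = tuple(range(1, WINDOW + 1))

# Illustrative Afrikaans (inflected form, lemma) pairs, not data from the paper.
TRAINING = [
    ("boeke", "boek"), ("honde", "hond"), ("kinders", "kind"),
    ("tafels", "tafel"), ("huisie", "huis"), ("hondjie", "hond"),
    ("skepe", "skip"),
]


def features(word: str) -> tuple[str, ...]:
    """Right-aligned last WINDOW letters, left-padded with '_'."""
    return tuple(("_" * WINDOW + word)[-WINDOW:])


def rule(word: str, lemma: str) -> tuple[int, str]:
    """Class label: (number of trailing chars to delete, string to append)."""
    i = 0
    while i < min(len(word), len(lemma)) and word[i] == lemma[i]:
        i += 1
    return (len(word) - i, lemma[i:])


def apply_rule(word: str, r: tuple[int, str]) -> str:
    """Turn a word into its lemma by applying a (delete, append) rule."""
    strip, append = r
    return word[: len(word) - strip] + append


def weighted_overlap(a: tuple[str, ...], b: tuple[str, ...]) -> int:
    """Sum of the weights of the feature positions where a and b differ."""
    return sum(w for w, x, y in zip(WEIGHTS, a, b) if x != y)


class MemoryBasedLemmatiser:
    """Stores every training instance; classifies by nearest neighbours."""

    def __init__(self, k: int = 1) -> None:
        self.k = k
        self.memory: list[tuple[tuple[str, ...], tuple[int, str]]] = []

    def train(self, pairs: list[tuple[str, str]]) -> None:
        # "Training" is just storing instances: no abstraction is built.
        for word, lemma in pairs:
            self.memory.append((features(word), rule(word, lemma)))

    def lemmatise(self, word: str) -> str:
        if not self.memory:
            raise ValueError("lemmatiser has not been trained")
        query = features(word)
        # sorted() is stable, so equal distances keep training order.
        ranked = sorted(self.memory,
                        key=lambda inst: weighted_overlap(query, inst[0]))
        votes = Counter(r for _, r in ranked[: self.k])
        best_rule, _ = votes.most_common(1)[0]
        return apply_rule(word, best_rule)


if __name__ == "__main__":
    lem = MemoryBasedLemmatiser(k=1)
    lem.train(TRAINING)
    for w in ["stoele", "muisie", "katjie"]:
        print(w, "->", lem.lemmatise(w))
```

With these toy pairs, the unseen forms "stoele", "muisie" and "katjie" are lemmatised to "stoel", "muis" and "kat". Each one borrows the rule of its closest stored neighbour ("boeke", "huisie" and "hondjie" respectively). Encoding the class as an edit rule rather than as the lemma itself is what lets a single stored instance generalise to words that were never seen in training.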