South Africa has eleven official languages. However, not all have received similar amounts of attention. In particular, for many of the languages, only a limited number of digital language resources (data sets and computational tools) exist. This scarcity hinders (computational) research in the fields of humanities and social sciences for these languages. Additionally, using existing computational linguistics tools in a practical setting requires expert knowledge on the usage of these tools. In South Africa, only a small number of people currently have this expertise, further limiting the type of research that relies on computational linguistic tools. The South African Centre for Digital Language Resources (SADiLaR) aims to enable and enhance research in the area of language technology by focusing on the development, management, and distribution of digital language resources for all South African languages. Additionally, it aims to build research capacity, specifically in the field of digital humanities. This requires several challenges to be resolved that we cluster under resources, training, and community building. SADiLaR hosts a repository of existing digital language resources and supports the development of new resources. Additionally, it provides training on the use of these resources, specifically for (but not limited to) researchers in the fields of humanities and social sciences. Through this training, SADiLaR tries to build a community of practice to boost information sharing in the area of digital humanities.
This article gives a perspective on Sesotho lexicography and a critical analysis of the macrostructures and microstructures of three selected Sesotho dictionaries. The monolingual paper dictionary Sethantšo sa Sesotho, the bilingual paper dictionary Southern Sotho-English Dictionary and the Sesotho online Bukantswe v.3 are evaluated. Their virtues and shortcomings as reference works will be viewed against dictionaries of high lexicographic achievement in order to establish to what extent they fulfil the most basic requirements of macrostructures and microstructures. The inconsistencies addressed in this article reflect the need for Sesotho lexicographers to use corpora in dictionary compilation in order to enhance the quality of entries on both microstructural and macrostructural levels. It will be argued that much more research and description of lexicographic issues is required to bring Sesotho lexicography on a par with its sister languages, Sepedi and Setswana, and with good dictionaries for major languages of the world. After decades in existence, currently available Sesotho dictionaries are in dire need of revision, and new dictionaries aimed at specific target users should be compiled.
This article gives an overview of the digital language resources available for Sesotho, an official language of South Africa. The South African Centre for Digital Language Resources (SADiLaR) repository is used as a reference, as it is the official host of various language resources for South African languages. A total of 18 written resources are identified from the repository, and a further 16 spoken resources are identified. Finally, a total of 45 applications and modules are identified. Findings indicate that the majority of applications and modules available for Sesotho are in fact general resources aimed at all eleven official South African languages. Furthermore, the available resources indicate an inclination towards the development of entry-level, basic language resources and an absence of middle and higher resources with functionalities such as semantic analyses for written resources and prosody prediction for spoken resources. The study is hindered by the dearth of resource-specific evaluations and related research, and exacerbated by the absence of some of the resources from the repository.
In this article, the existence, use and importance of repositories are explored. An introduction to language resources (LRs) is given, as well as a discussion of two platforms for the distribution of language resources, namely the repository of the South African Centre for Digital Language Resources (SADiLaR) and Lanfrica, a site that links resources. Types of repositories, such as institutional and language resource repositories, are distinguished and compared. Language preservation is proposed as an important aspect which can be strengthened by the presence and use of repositories. The view expressed in this article is that the availability of language resources and repositories is pivotal for the development, preservation and advancement of languages. Having a host site that links available resources and a repository where resources can be uploaded is a positive attribute of the mentioned online platforms. However, as will be discussed, the fact that information is available online is no guarantee that the resources are or will be used by researchers or other interested persons, especially if they are not aware of their existence. The article concludes with suggestions for future work, for example measuring the influence of inaccurate metadata of language resources on linguistic research.