Abstract: In this article it is shown how a corpus-based dictionary grammar may be compiled, that is, a mini-grammar fully based on corpus data and specifically written for use in, and integrated with, a dictionary. Such an effort is, to the best of our knowledge, a world first. We exemplify our approach with a Northern Sotho mini-grammar, to be included in a Northern Sotho-English dictionary.

Keywords: LEXICOGRAPHY, DICTIONARY, CORPUS, FREQUENCY, MIDDLE MATTER, DICTIONARY GRAMMAR, NORTHERN SOTHO (SESOTHO SA LEBOA)

Samenvatting (summary, translated from Dutch): Compiling a corpus-based dictionary grammar: an example for Northern Sotho. This article demonstrates how a corpus-based dictionary grammar can be compiled, that is, a mini-grammar that draws all its data directly from a corpus, was written specifically for use in a dictionary, and is fully integrated with it. Such an attempt is, as far as we know, a world first. We illustrate our approach with a mini-grammar of Northern Sotho, intended for use in a Northern Sotho-English dictionary.

Using corpora beyond a dictionary's central section(s)

It is now widely accepted that the use of electronic corpora has become indispensable in modern dictionary making, on a variety of levels. But on just how many levels? The macrostructural and microstructural levels immediately spring to mind, and most attention in the scientific literature has indeed gone to the corpus-based selection of lemma signs on the one hand, and the corpus-based construction of the articles attached to those lemma signs on the other. Any self-respecting dictionary, however, contains much more than 'just' the central text. Good dictionaries also comprise extra matter, invariably distributed across front, middle and back matter sections.
If one is serious about corpus-based lexicography, then the extra matter should also be rooted in corpus data. One can go a long way by making sure there is a one-to-one correlation between the central (corpus-based) section(s) and the extra matter (cf. below), but during practical dictionary making this quickly proves insufficient. In this article the focus will be on the creation of a corpus-based dictionary grammar, exemplified for Northern Sotho. The core principles of corpus-based lexicography will be briefly reviewed in order to set the stage, but that review is merely incidental and the reader is referred to Sinclair (1987) and Corréard (2002) for what remain to this day the best collections on the topic.

Corpus-based lexicography in a nutshell

In corpus-based lexicography, the main arbiter during the creation of the (initial) macrostructure is the list of frequencies attached to the lemmatised list of inclusion candidates. Clearly, there are as many lemmatisation policies as there are dictionary teams compiling dictionaries, but it remains comm...
Abstract: This research article presents an in-depth investigation of the lexicographic treatment of the demonstrative copulative (DC) in Sesotho sa Leboa. This one case study serves as an example to illustrate the so-called 'paradigmatic lemmatisation' of closed-class words in the African languages. The need for such an approach follows from a discussion, in Sections 1 and 2 respectively, of the present and missing directions in African-language metalexicography. A theoretical conspectus of the DC in Sesotho sa Leboa is then offered in Section 3, while Section 4 examines the treatment of the DC in the four existing desktop dictionaries for this language. The outcomes of the latter two sections are then used in Section 5, which analyses the problems of and options for a sound lexicographic treatment of the DC in bilingual and monolingual dictionaries. The next two sections review the practical implementation of the DC lemmatisation suggestions in PyaSsaL, i.e. the Pukuntšutlhaloši ya Sesotho sa Leboa 'Explanatory Sesotho sa Leboa Dictionary', with Section 6 focussing on the hardcopy and Section 7 on the online version. In the process, the very first fully monolingual African-language dictionary on the Internet is introduced. Section 8, finally, concludes briefly.