2008
DOI: 10.4102/lit.v29i1.103

On the development of a tagset for Northern Sotho with special reference to the issue of standardisation

Abstract: Working with corpora in the South African Bantu languages has up till now been limited to the utilisation of raw corpora. Such corpora, however, have limited functionality. Thus the next logical step in any NLP application is the development of software for automatic tagging of electronic texts. The development of a tagset is one of the first steps in corpus annotation. The authors of this article argue that the design of a tagset cannot be isolated from the purpose of the tagset, or from the place of the tags…

Cited by 9 publications (6 citation statements)
References 7 publications
“…3. The parts of speech utilized in this paper were described in Taljard, Faaß, Heid and Prinsloo (2008); they are assigned to orthographic tokens. 4.…”
Section: Notes (mentioning)
confidence: 99%
“…Other Bantu tag-set (SWATWOL -Arvi Hurskainen, 2004 [2], Northern Sotho tag-set -Taljard E. et al, 2008 [3]; Gertrud Faaß et al, 2009 [4], among other) are language specific and consequently, unsuitable for this purpose. The Multilingual Morpho-syntactic Specifications (Erjavec, 2004 [5]; 2009 [6]; 2010 [7]) are not exhaustive leaving out a lot of essential morphosyntactic properties of Luganda which is important in the grammar analysis.…”
Section: Introduction (mentioning)
confidence: 99%
“…However, we have endeavoured to compare SCTL with tag-sets of other inflectional languages, namely, a positional tag-set -Russian positional tag-set, Jirka Hana and Feldman Anna, 2010 [14] -and two atomic tag-sets for Bantu languages which uses a two level tagging process (that is, Swahili Tag-set of SWATWOL -Arvi Hurskainen, 2004 [2] and Northern Sotho tag-set -Taljard E. et al, 2008 [3]; Gertrud Faaß et al, 2009 [4]). Table 17 shows a qualitative comparison between these tag-sets articulating the general differences between them.…”
Section: Introduction (mentioning)
confidence: 99%
“…Therefore, an infinitive constellation is first and foremost to be defined as a verbal phrase no matter if it occurs in a nominal or a verbal role, because it is headed by a verb. In Northern Sotho, so far, no morphological analyser is available that would be capable of identifying linguistic words [Kotzé (2008) describes the respective challenges] therefore, Taljard, Faaß, Heid and Prinsloo (2008) designed a set of 'parts of speech' 5 which describes orthographic tokens independently of their word status; all parser rules to describe infinitive constellations in this article are based on these units and on the head principle.…”
Section: Overview (mentioning)
confidence: 99%
“…The parts of speech utilized in this paper were described in Taljard, Faaß, Heid and Prinsloo (2008); they are assigned to orthographic tokens.…”
Section: Notes (mentioning)
confidence: 99%