2017
DOI: 10.4102/lit.v38i1.1351

Strategies for building wordnets for under-resourced languages: The case of African languages

Abstract: The African Wordnet Project (AWN) aims at building wordnets for five African languages: Setswana, isiXhosa, isiZulu, Sesotho sa Leboa (also referred to as Sepedi or Northern Sotho) and Tshivenda. Currently, the so-called expand model, based on the structure of the English Princeton WordNet (PWN), is used to continually develop the African Wordnets manually. This is labour-intensive work that needs to be performed by linguistic experts, guided by several considerations such as the level of lexicalisation of a…

Cited by 30 publications (5 citation statements)
References 12 publications
“…The use of such a simplification of WordNet's semantic relations significantly reduces the amount of time necessary to semantically classify each word, as only a direct correspondence to the relevant WordNet synset would be necessary for a lexical item in the target language to be considered classified, with first-pass hypernymy and hyponymy relationships constructed indirectly by populating synsets. Using this method, manual classification of dictionary items can provide a basic semantic ontology of the target language at a rate of 400-500 word types daily per annotator, compared with a rate of ~1000 synsets per year reported by Bosch and Griesel during their creation of full WordNets for low-resource South African Bantu languages (Bosch and Griesel, 2017). This skeletal form of WordNet also provides the benefit of requiring substantially less linguistic background knowledge to effectively use, reducing the need for lengthy annotator training sessions.…”
Section: Fundamentals Of Wordnet (mentioning)
confidence: 99%
“…Although initially developed for English, the WordNet approach for semantic classification has since become a staple in modern lexicography, with WordNets of varying size and complexity existing for many prominent global and national majority languages, such as German with GermaNet (Hamp and Feldweg, 1997; Hinrich and Hinrichs, 2010), Finnish with FinnWordNet (Lindén and Niemi, 2014), and Korean with KorLex (Aesun Yoon et al., 2009), among dozens of others. However, while semantic classifications such as these have become relatively commonplace among prominent majority languages in the developed world, they remain a rarity among underdocumented or otherwise poorly resourced languages (Bosch and Griesel, 2017). Using existing, conventional lexical resources, we provide here a holistic comparison between a manual method in semantic classification using a WordNet-based ontology and an automatic computational method via vector semantics, with respect to the efficacy and precision of both methods.…”
Section: Introduction (mentioning)
confidence: 99%
“…Expansion approach - existing synsets from a reference WN are used as a guide to create corresponding synsets in a new WN, by gathering applicable words that represent the meaning of the synset. This approach has been shown to be suitable for under-resourced languages [13].…”
Section: Wordnets (mentioning)
confidence: 99%
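
The expansion approach described in the statement above can be illustrated with a minimal sketch. It is not the AWN project's tooling or workflow: the `Synset` class, the `expand` function, the synset ids and the `tgt_*` words are all hypothetical placeholders, and the bilingual lexicon is an assumed input. The sketch only shows the core idea that target-language synsets reuse the reference wordnet's identifiers and hypernym links, and are populated by gathering words that express the same meaning.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Synset:
    synset_id: str                  # hypothetical id, shared across languages
    lemmas: List[str] = field(default_factory=list)
    hypernym: Optional[str] = None  # id of the parent synset, if any


def expand(reference: Dict[str, Synset],
           lexicon: Dict[str, List[str]]) -> Dict[str, Synset]:
    """Create target-language synsets that mirror the reference structure."""
    target: Dict[str, Synset] = {}
    for sid, ref in reference.items():
        # Gather every target word listed for any lemma of the reference synset.
        words: List[str] = []
        for lemma in ref.lemmas:
            for word in lexicon.get(lemma, []):
                if word not in words:
                    words.append(word)
        # Leave the synset out when no equivalent word is known.
        if words:
            target[sid] = Synset(sid, words, ref.hypernym)
    return target


# Toy data: placeholders, not real wordnet ids or real translations.
reference = {
    "n1": Synset("n1", ["animal", "beast"]),
    "n2": Synset("n2", ["dog", "domestic dog"], hypernym="n1"),
    "n3": Synset("n3", ["snowmobile"]),
}
lexicon = {
    "animal": ["tgt_animal"],
    "beast": ["tgt_animal", "tgt_beast"],
    "dog": ["tgt_dog"],
}

target = expand(reference, lexicon)
for sid, syn in target.items():
    print(sid, syn.lemmas, syn.hypernym)
```

In this toy run the synset for "snowmobile" is skipped because the assumed lexicon has no entry for it. In practice the candidate words would still need to be reviewed by a linguist, which is the labour-intensive step the abstract refers to.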
“…We highlight a number of WordNets developed based on expansion approach that are relevant to the development of a WordNet for the language of Kenya used in this work. The WordNets include: EuroWordNet developed by linking several European languages to English WordNet [14]; Persian WordNet [15]; Finnish WordNet [15]; Polish WordNet [16] and African WordNet (AWN) [13] created by aligning several languages spoken in Southern Africa. A number of tools were used in development of AWN including DEBVisDic 6 editor tools for linguists building AWN.…”
Section: Wordnets (mentioning)
confidence: 99%