Proceedings of LAW VIII - The 8th Linguistic Annotation Workshop 2014
DOI: 10.3115/v1/w14-4914

Part-of-speech Tagset and Corpus Development for Igbo, an African Language

Abstract: This project aims to develop linguistic resources to support computational NLP research on the Igbo language. The starting point is a new part-of-speech tagging scheme based on the EAGLES tagset guidelines, adapted to incorporate additional language-internal features. The tags are currently being used in a part-of-speech annotation task to develop a POS-tagged Igbo corpus. The proposed tagset has 59 tags.

Citations: cited by 11 publications (9 citation statements)
References: 4 publications
“…Igbo Tagset tags description and usage. See [21], [22], [23] for the full description of Igbo tagset.…”
Section: Discussion (mentioning)
confidence: 99%
“…New World Translation (NWT) Bible [17] provides an ideal test case for our study because of the existence of publically accessible texts of English and their translations in electronic format available in Igbo that is already POS annotated in [23]. The former allows us to use existing English POS tagger on the English texts and transfer POS tags via alignment and projection onto Igbo texts; the latter allows us to evaluate the tagged corpus on sizeable human-annotated tags.…”
Section: Data (mentioning)
confidence: 99%
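
The statement above mentions transferring POS tags from English to Igbo "via alignment and projection" without giving details. The following is only a minimal sketch of the general idea, not the cited authors' method: the function name `project_tags`, the index-pair alignment format, and the majority-vote rule are all assumptions made for illustration, and the tags and alignment in the usage lines are made-up placeholders.

```python
from collections import Counter


def project_tags(source_tags, alignment, target_len, default=None):
    """Project source-side POS tags onto target tokens (hypothetical helper).

    source_tags: list of tags, one per source (e.g. English) token.
    alignment:   list of (source_index, target_index) pairs; assumed to come
                 from some external word aligner.
    target_len:  number of tokens in the target (e.g. Igbo) sentence.
    Target tokens with no aligned source token receive `default`.
    """
    votes = [Counter() for _ in range(target_len)]
    for src_idx, tgt_idx in alignment:
        votes[tgt_idx][source_tags[src_idx]] += 1
    # Majority vote per target token (an assumption, not from the source).
    return [v.most_common(1)[0][0] if v else default for v in votes]


# Placeholder data: four source tokens, four target tokens, three links.
source_tags = ["PRON", "VERB", "DET", "NOUN"]
alignment = [(0, 0), (1, 1), (3, 2)]
projected = project_tags(source_tags, alignment, target_len=4)
print(projected)  # ['PRON', 'VERB', 'NOUN', None]
```

Unaligned target tokens are left untagged here; a real pipeline would need some policy for them, and would also need to map between the English tagset and the Igbo tagset.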
“…The preprocessing task relied substantially on the approaches used by Onyenwe et al (2014). Observed language based patterns were preserved.…”
Section: Preprocessing (mentioning)
confidence: 99%
“…The 28 consonant characters are "b, ch, d, f, g, gb, gh, gw, h, j, k, kw, kp, l, m, n, nw, ny, ṅ, p, r, s, sh, t, v, w, y, z" and 8 vowels characters are "a, e, i, ị, o, ọ, u, ụ". There are nine consonants characters that are digraphs: "ch, gb, gh, gw, kp, kw, nw, ny, sh" [6]. It uses a Roman Script and it is a tonal language with two distinct tones, high and low.…”
Section: Igbo Language (mentioning)
confidence: 99%
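
Because nine of the consonants listed in the statement above are digraphs, splitting an Igbo word into letters is not the same as splitting it into characters. The sketch below illustrates one simple way to do this, assuming a greedy left-to-right match that always prefers a digraph; the function name `split_graphemes` is hypothetical, and tone-marked vowels are not handled.

```python
import unicodedata

# Letters exactly as listed in the quoted statement.
_CONSONANTS = ["b", "ch", "d", "f", "g", "gb", "gh", "gw", "h", "j", "k",
               "kw", "kp", "l", "m", "n", "nw", "ny", "ṅ", "p", "r", "s",
               "sh", "t", "v", "w", "y", "z"]
_VOWELS = ["a", "e", "i", "ị", "o", "ọ", "u", "ụ"]

# Normalize to NFC so dotted letters are single code points.
CONSONANTS = [unicodedata.normalize("NFC", c) for c in _CONSONANTS]
VOWELS = [unicodedata.normalize("NFC", v) for v in _VOWELS]
DIGRAPHS = {c for c in CONSONANTS if len(c) == 2}


def split_graphemes(word):
    """Split a word into Igbo letters, treating digraphs as one unit.

    Greedy matching is an assumption: any two adjacent characters that
    form a listed digraph are always read as that digraph.
    """
    text = unicodedata.normalize("NFC", word.lower())
    letters = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in DIGRAPHS:
            letters.append(pair)
            i += 2
        else:
            # Single letters and any unknown characters pass through as-is.
            letters.append(text[i])
            i += 1
    return letters


print(len(CONSONANTS), len(VOWELS), len(DIGRAPHS))  # 28 8 9
print(split_graphemes("Igbo"))  # ['i', 'gb', 'o']
```

The counts printed by the first line match the figures in the quoted statement (28 consonants, 8 vowels, 9 digraphs).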