Advances in Information Technology (IT) have helped bring the three major Nigerian languages into text-based applications such as text mining, information retrieval and natural language processing. This paper focuses on the Igbo language, in which compounding is a common type of word formation and which has a large vocabulary of compound words. Collocation, word ordering and compounding play a major role in Igbo. The ambiguity involved in handling these compound words has made the representation of Igbo text documents very difficult, because it cannot be addressed by the most common and standard approach, the Bag-of-Words (BOW) model of text representation, which ignores word order and the relations between words. This is a cause for concern and calls for an improved model that captures these features. This paper presents an analysis of Igbo text documents that considers the language's compounding nature, and describes their representation with the word-based n-gram model to properly prepare them for any text-based application. The results show that bigram and trigram text representation models provide more semantic information and also address the issues of compounding, word ordering and collocation, which are the major language peculiarities of Igbo. They are likely to give better performance when used in any Igbo text-based system.
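The contrast between BOW and word-based n-grams can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names `bag_of_words` and `word_ngrams` are hypothetical, tokenization is assumed to be plain whitespace splitting, and the two-word Igbo sequence is an illustrative input chosen here rather than an example taken from the paper.

```python
from collections import Counter


def bag_of_words(tokens):
    """Unordered word counts: all information about word order is discarded."""
    return Counter(tokens)


def word_ngrams(tokens, n):
    """Ordered word-level n-grams, returned as a list of tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Illustrative input (an assumption, not data from the paper):
# the same two words in two different orders.
doc_a = "ụlọ akwụkwọ".split()
doc_b = "akwụkwọ ụlọ".split()

# BOW cannot tell the two documents apart.
print(bag_of_words(doc_a) == bag_of_words(doc_b))

# Bigrams keep the adjacent pair together, so the order is preserved.
print(word_ngrams(doc_a, 2))
print(word_ngrams(doc_b, 2))
```

Because each bigram or trigram keeps adjacent words together as a single feature, a compound that spans several tokens survives as one unit instead of being split into unrelated counts.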
This study is based on the premise that it is possible to train computers to predict the language of a word (textual or audio) by learning from its character n‐gram pattern, without recourse to the language's dictionary. With the growth of multilingual collections and a need for automatic means of cleaning textual datasets, this paper presents a strategy for language identification of individual words in a body of texts. This strategy is suitable for resource‐scarce languages that do not have large electronic datasets that are required for machine learning and natural language processing studies and whose dictionaries may not be available. In this study, we focused on three African languages, namely Hausa, Igbo, and Yoruba. A training corpus in each of these languages was used to obtain the probabilities of character trigrams in the language. Given that English is a common language that is often mixed with these resource‐scarce languages in texts, we also obtained the probabilities of trigrams in an English training corpus. These probabilities were then used in identifying the language of each word in test corpora containing bilingual texts. Our strategy achieved average precision, recall and F1 values of about 97%, 91% and 94% respectively.
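A rough sketch of word-level language identification from character trigram probabilities is given below. Several details are assumptions made for illustration, since the abstract does not specify them: the boundary padding symbols, the add-one smoothing, the sum-of-log-probabilities scoring rule, and all function names. The two training corpora are tiny synthetic placeholders (`toy_lang` is not a real language sample), standing in for the Hausa, Igbo, Yoruba and English corpora used in the study.

```python
import math
from collections import Counter


def char_trigrams(word):
    """Character trigrams of a word, padded so word boundaries are visible."""
    padded = f"^^{word.lower()}$"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def train(corpus):
    """Count the character trigrams of every word in a training corpus."""
    counts = Counter(t for w in corpus.split() for t in char_trigrams(w))
    return counts, sum(counts.values())


def log_score(word, model, vocab_size):
    """Sum of add-one smoothed log-probabilities of the word's trigrams."""
    counts, total = model
    return sum(
        math.log((counts[t] + 1) / (total + vocab_size))
        for t in char_trigrams(word)
    )


def identify(word, models):
    """Return the language whose trigram model scores the word highest."""
    seen = set().union(*(counts.keys() for counts, _ in models.values()))
    vocab_size = len(seen) + 1  # one extra slot for unseen trigrams
    return max(models, key=lambda lang: log_score(word, models[lang], vocab_size))


# Synthetic placeholder corpora (assumptions, not data from the study).
models = {
    "english": train("the cat sat on the mat and the dog ran"),
    "toy_lang": train("kpa gba kpo gbo kpara gbara"),
}

for w in ["the", "kpara"]:
    print(w, identify(w, models))
```

Each word is classified independently, which is what lets a scheme of this kind label individual words in mixed-language text without consulting a dictionary.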
This article engages the conflict between song melody and speech tone in Yorùbá Christian hymns translated between the late 19th and early 20th centuries. In their effort to make early Yorùbá Christian converts sing Christian hymns in church, European missionaries translated English hymns into Yorùbá and sang them to the original European hymn tunes. Yorùbá, being a tone language, requires a significant level of correlation between song melody and speech tone for the words to retain their original meaning when sung. The tripartite constraint of aligning melody, metre and meaning posed a major problem for the hymn translators. Having given priority to melody and metre, the translators tended to compromise on meaning, thereby producing Yorùbá hymns that sound interesting melodically and fit the metre, but whose words are hardly meaningful when sung. This study used samples from Iwe Orin Mimo, the Yorùbá translation of a range of hymns from Hymnal Companion, Hymns Ancient and Modern, and some other hymn books popularly used by the Church Missionary Society (CMS). The work presents a graphical illustration of the disparity between the hymn tunes and the speech tone of the Yorùbá language. It also highlights the efforts of Indigenous composers to correct the perceived error by re-composing the first stanza of selected hymns and writing further stanzas that align with the theme of that first stanza. The inappropriately translated Yorùbá hymn books have remained strong institutions within the church and have therefore continued to promote the use of the translated hymns in the Yorùbá church.