Diacritic restoration is a necessity in the processing of languages with Latin-based scripts that use characters outside the basic Latin alphabet used by English. Yorùbá is one such language, marking an underdot (dot below) on three characters and tone marks on all seven vowels and two syllabic nasals. The problem of restoring underdotted characters has been fairly well addressed using characters as linguistic units for restoration. However, the existing character-based and word-based approaches have not been able to sufficiently address the restoration of tone marks in Yorùbá. In this study, we address tone-mark restoration as a subset of diacritic restoration. We propose using syllables derived from words as linguistic tokens for tone-mark restoration. In our experimental setup, we used Yorùbá text collected from various sources, with a total of 250,336 words. On syllabification, these words yielded 464,274 syllables. The syllables were divided into training and testing data in different proportions, ranging from 99% for training and 1% for testing to 70% for training and 30% for testing. The aim of evaluating the different proportions was to determine how the ratio of training to test data affected the variation in the results. We applied memory-based learning to train the models. We also set up a similar experiment using characters as tokens in order to compare performance. The results showed that, by using syllables, we were able to increase the word-level accuracy to 96.23% (an average of almost 15% over using characters). We also found that using 75% of the data for training and the remaining 25% for testing gives results with the least variation in a ten-fold cross-validation test. Hybridizing this syllable-based method with other methods, such as lexicon lookup, is likely to improve on the current result.
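The data preparation and word-level scoring that this abstract implies can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names `strip_tone_marks` and `word_accuracy` are hypothetical, the sketch assumes that tone is written only with the combining grave and acute accents, and it does not implement syllabification or memory-based learning. It only shows how tone marks might be removed while underdots are kept, and how word-level accuracy could be computed against a reference.

```python
import unicodedata

# Assumption: only grave (U+0300) and acute (U+0301) are treated as tone marks.
# The combining dot below (U+0323) is deliberately left in place.
TONE_MARKS = {"\u0300", "\u0301"}


def strip_tone_marks(text: str) -> str:
    """Remove tone marks but keep underdots, returning NFC-normalized text."""
    # NFD splits each letter into a base character plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    # NFC recomposes base + dot below into a single code point where one exists.
    return unicodedata.normalize("NFC", kept)


def word_accuracy(predicted: list[str], reference: list[str]) -> float:
    """Fraction of words whose restored form exactly matches the reference."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        return 0.0
    correct = sum(
        unicodedata.normalize("NFC", p) == unicodedata.normalize("NFC", r)
        for p, r in zip(predicted, reference)
    )
    return correct / len(reference)
```

Under these assumptions, stripping the tone marks from "Yorùbá" gives "Yoruba", which could serve as model input with the original word as the target. Comparing after NFC normalization avoids counting two visually identical words as different merely because their code-point sequences differ.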
A diacritic is a mark placed near or through a character to alter its original phonetic or orthographic value. Many languages around the world use diacritics in their orthography, whatever writing system the orthography is based on. In many languages, diacritics are omitted either by convention or as a matter of convenience. For readers who are not familiar with the text domain, the absence of diacritics is known to cause mild to serious readability and comprehension problems. For natural language processing systems, however, the absence of diacritics causes near-intractable problems. This situation has led to extensive research on diacritization. Several techniques have been applied to diacritic restoration (or diacritization), but the existing surveys of techniques have been restricted to a few languages and hence have left gaps for practitioners to fill. Our survey examines diacritization from the angle of the resources deployed and the various formulations employed. It concludes by recommending that (a) any proposed technique for diacritization should consider the features of the language and the purpose served by its diacritics, and (b) evaluation metrics should be defined more rigorously so that the performance of models can be compared easily.
Rapid industrialization has contributed immensely to the discharge of untreated heavy metals into receiving water bodies. Predicting the quantity of heavy metals in industrial wastewater before treatment is essential so that this quantity can be removed precisely. This article formulates, simulates, and evaluates a predictive model that mimics the electrochemical treatment of lead and cadmium ions present in paint-industry wastewater using an artificial neural network. The predictive model was formulated using the Fuzzy Logic toolbox in MATLAB, and the simulation was run in the same environment. The model was evaluated by comparing the predicted quantities of lead and cadmium ions with the results of laboratory experiments. The article concludes that the developed model demonstrated very high accuracy in predicting the percentage of lead and cadmium ions present in paint wastewater.