Multitask learning (MTL) helps improve the performance of related tasks when the training dataset is limited and sparse, especially for low-resource languages. Amharic is a low-resource language and suffers from training data scarcity, sparsity, and unevenness. Consequently, speech recognizers based on its fundamental acoustic units perform worse than those of technologically favored languages. This paper presents the results of our contributions on the use of various hybrid acoustic modeling units for the Amharic language. Deep neural network (DNN) models based on the fundamental acoustic units, namely syllable, phone, and rounded-phone units, have been developed. Various hybrid acoustic units have been investigated by jointly training the fundamental acoustic units via the MTL technique. These hybrid units and the fundamental units are discussed and compared. The experimental results demonstrate that all the fundamental-unit-based DNN models outperform the Gaussian mixture models (GMM), with relative performance improvements of 14.14% to 23.31%. All the hybrid units outperform the fundamental acoustic units, with relative performance improvements of 1.33% to 4.27%. The syllable and phone units exhibit higher performance under sufficient and limited training datasets, respectively, whereas the hybrid units are useful under both conditions and outperform the fundamental units. Overall, our results show that the DNN is an effective acoustic modeling technique for the Amharic language. The context-dependent (CD) syllable is the more suitable unit when a sufficient training corpus is available and recognizer accuracy is prioritized. The CD phone is the superior unit when the available training dataset is limited, realizing the highest accuracy and fast recognition speed. The hybrid acoustic units perform best under both sufficient and limited training datasets and achieve the highest accuracy.
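The core idea of jointly training acoustic units via MTL can be sketched as a network with shared hidden layers and one task-specific output head per acoustic unit type. The following is a minimal numpy illustration, assuming hypothetical dimensions (40-dim features, 60 phone classes, 200 syllable classes); these numbers and the single shared layer are illustrative, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Illustrative dimensions, not taken from the paper.
FEAT, HID, N_PHONE, N_SYL = 40, 128, 60, 200

W_shared = rng.normal(0, 0.1, (FEAT, HID))   # layers shared by both tasks
W_phone = rng.normal(0, 0.1, (HID, N_PHONE)) # phone-unit output head
W_syl = rng.normal(0, 0.1, (HID, N_SYL))     # syllable-unit output head

def mtl_forward(frames):
    """One shared representation feeds two task-specific softmax heads."""
    h = relu(frames @ W_shared)
    return softmax(h @ W_phone), softmax(h @ W_syl)

frames = rng.normal(size=(5, FEAT))  # 5 acoustic feature frames
p_phone, p_syl = mtl_forward(frames)
```

During training, the losses of both heads would be combined so that the shared layers benefit from the supervision of every fundamental unit type, which is what lets the hybrid units outperform any single unit.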
INDEX TERMS Acoustic modeling units, Amharic, hybrid acoustic modeling units, low-resource language, multitask learning.
Text summarization is the process of producing a concise version of text (a summary) from one or more information sources. If the generated summary preserves the meaning of the original text, it helps users make fast and effective decisions. However, how much of the source text's meaning is preserved is difficult to evaluate. The most commonly used automatic evaluation metrics, such as Recall-Oriented Understudy for Gisting Evaluation (ROUGE), rely strictly on overlapping n-gram units between reference and candidate summaries, which makes them unsuitable for measuring the quality of abstractive summaries. Another major challenge in evaluating text summarization systems is the lack of consistent, ideal reference summaries. Studies show that human summarizers produce variable reference summaries of the same source, which can significantly affect the scores of automatic evaluation metrics. Humans are biased toward particular perspectives while producing a summary; even the same person may produce substantially different summaries of the same source at different times. This paper proposes a word-embedding-based automatic text summarization and evaluation framework that can determine the salient top-n sentences of a source text as a reference summary and evaluate the quality of system summaries against it. Extensive experimental results demonstrate that the proposed framework is effective and outperforms several baseline methods, with regard to both text summarization systems and automatic evaluation metrics, when tested on a publicly available dataset.
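One common way to realize an embedding-based extractive step like the one described is to embed each sentence (e.g. as the mean of its word vectors), rank sentences by cosine similarity to the document centroid, and keep the top n. This is a minimal sketch under those assumptions; the toy random "word vectors" stand in for real pretrained embeddings, and the ranking rule is illustrative rather than the paper's exact method:

```python
import numpy as np

# Toy stand-ins for pretrained word embeddings (hypothetical, 8-dim).
rng = np.random.default_rng(1)
vocab = {w: rng.normal(size=8) for w in
         "the cat sat mat dog ran fast".split()}

def sent_vec(sentence):
    """Sentence embedding as the mean of its known word vectors."""
    vecs = [vocab[w] for w in sentence.split() if w in vocab]
    return np.mean(vecs, axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_n_sentences(sentences, n):
    """Rank sentences by similarity to the document centroid, keep top n."""
    centroid = np.mean([sent_vec(s) for s in sentences], axis=0)
    ranked = sorted(sentences, key=lambda s: cosine(sent_vec(s), centroid),
                    reverse=True)
    return ranked[:n]

def embedding_score(candidate, reference):
    """Evaluate a candidate summary by embedding similarity, not n-grams."""
    return cosine(sent_vec(candidate), sent_vec(reference))

summary = top_n_sentences(
    ["the cat sat", "the dog ran fast", "the cat sat mat"], 2)
```

Because `embedding_score` compares meanings in vector space rather than surface n-grams, it can credit an abstractive summary that paraphrases the reference, which is exactly where ROUGE falls short.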
Automatic speech recognition (ASR) is vital for very low-resource languages in mitigating the risk of extinction. Chaha is one such low-resource language; it suffers from resource insufficiency, and some of its phonological, morphological, and orthographic features challenge development efforts in the area of ASR. Considering these challenges, this study is the first endeavor to analyze the characteristics of the language, prepare a speech corpus, and develop ASR systems for it. A small 3-hour read-speech corpus was prepared and transcribed. Different basic-phone and rounded-phone unit-based speech recognizers were explored using multilingual deep neural network (DNN) modeling methods. The experimental results demonstrated that all the basic-phone and rounded-phone unit-based multilingual models outperformed the corresponding unilingual models, with relative performance improvements of 5.47% to 19.87% and 5.74% to 16.77%, respectively. The rounded-phone unit-based multilingual models outperformed the equivalent basic-phone unit-based models with relative performance improvements of 0.95% to 4.98%. Overall, we found that multilingual DNN modeling methods are highly effective for developing Chaha speech recognizers. Both the basic and rounded phone acoustic units are suitable for building Chaha ASR systems. However, the rounded-phone unit-based models achieve higher performance and faster recognition speed than the corresponding basic-phone unit-based models. Hence, the rounded phone units are the most suitable acoustic units for developing Chaha ASR systems.
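The distinction between basic and rounded phone units can be illustrated by how a labialized consonant is tokenized: as a consonant followed by a /w/ glide (basic units) or as one merged rounded unit. The sketch below is a hypothetical illustration of that merging rule, not the paper's actual phone inventory or mapping:

```python
def to_rounded_units(phones):
    """Merge each consonant + 'w' pair into a single rounded-phone unit
    (e.g. ['k', 'w'] becomes ['kw']); vowels and lone phones pass through."""
    out, i = [], 0
    while i < len(phones):
        if i + 1 < len(phones) and phones[i + 1] == "w" and phones[i] != "w":
            out.append(phones[i] + "w")  # one rounded unit instead of two
            i += 2
        else:
            out.append(phones[i])
            i += 1
    return out

units = to_rounded_units(["k", "w", "a", "t"])  # → ["kw", "a", "t"]
```

Merging in this way shortens the unit sequence per utterance, which is consistent with the abstract's observation that rounded-phone models decode faster than basic-phone models.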