Large Vocabulary Read Speech Corpora for Four Ethiopian Languages: Amharic, Tigrigna, Oromo, and Wolaytta

Abate, Solomon Teferra; Tachbelie, Martha Yifiru; Melese, Michael; Abera, Hafte; Gebreselassie, Tewodros; Mulugeta, Wondwossen; Assabie, Yaregal; Beyene, Million Meshesha; Atinafu, Solomon; Seyoum, Binyam Ephrem

doi:10.18653/v1/2020.winlp-1.5

Cited by 8 publications

(6 citation statements)

References 7 publications

“…If the homogeneity assumption is proven, the researcher can proceed to the advanced data analysis stage. In this homogeneity test, IBM SPSS Statistics 22 is used (Abate et al, 2020).…”

Section: Homogeneity Test (mentioning)

confidence: 99%

The Effectiveness of Animation Media on The Language Skills of Class V Students

Sidabutar, Manihuruk, Adelberth

2024

IJMABER

This research aims to see the effectiveness of animation media on the language skills of class V students at SDN 091537 Pematangsiantar. This research was carried out in class V at SD Negeri 091537 Hutabayu, which is located at Hutabayu Village, Hutabayuraja District, Simalungun Regency, in the odd semester of the 2023/2024 academic year. The population in this study was all fifth grade students at SD Negeri 091537 Hutabayu, a total of 25 students. Data analysis techniques in quantitative research use statistics, namely parametric statistics, and the data analyzed is in the form of a ratio scale or interval scale. Data is taken from a normally distributed population. This research discusses whether audio-visual media is effective in improving students' speaking abilities. An experimental method was used, with an initial test (pretest) and a final test (posttest). The paired-samples t-test gives a t-value of 14.713 with a significance level of 0.000. Because the significance probability (0.000) is much smaller than 0.05 and t-count = 14.713 > t-table = 2.05183, H0 is rejected and H1 is accepted. This shows that animation media is effective for the language skills of class V students at SDN 091537 Pematangsiantar.

“…The most widely spoken language in Ethiopia is Afaan Oromo, which has a 33.8% speaker followed by a 29.3% speaker for Amharic [19]. Afaan Oromo belongs to the Cushitic language family group of the Afroasiatic language family, while Amharic is a member of the Semitic language family group [20]. After Arabic and Hausa, Afaan Oromo is the third most frequently used Afro-Asiatic language in the world [21].…”

Section: Introduction (mentioning)

confidence: 99%

Bilingual Hate Speech Detection on Social Media : Amharic and Afaan Oromo

Ababu, Woldeyohannis, Getaneh

2023

Preprint

Due to significant increases in internet penetration and the development of smartphone technology during the preceding couple of decades, many people have started using social media as a communication platform. Social media has grown to be one of the most significant components, with several benefits. However, technology also poses a number of threats, challenges, and barriers, such as hate speech, disinformation, and fake news. Hate speech detection is one of the many ways social media platforms can be accused of not doing enough to thwart hate speech on their platform. People in bilingual and multinational societies commonly employ a code mix in both spoken and written communication. Among these, Amharic and Afaan Oromo language speakers frequently mix the two languages when conversing and posting on social media. The majority of previous studies concentrated on identifying either technologically favoured languages or monolingual hate speech in Ethiopian languages; however, the availability of bilingual communication in social media hampers the propagation of hate speech via social media. In this work, bilingual hate speech detection for the Amharic and Afaan Oromo languages was conducted using four different deep learning classifiers (CNN, BiLSTM, CNN-BiLSTM, and BiGRU) and three feature extraction techniques (Keras word embedding, word2vec, and FastText). According to the experiment, BiLSTM with FastText feature extraction outperforms the other algorithms, achieving 78.05% accuracy for bilingual Amharic and Afaan Oromo hate speech detection. The FastText feature extraction overcomes the problem of out-of-vocabulary (OOV) words. Furthermore, we are working towards including other linguistic features of the languages to detect hate speech and making the resource available to facilitate further research in the area of bilingual hate speech detection for other under-resourced Ethiopian languages.

“…In the literature, different studies have been carried out to solve the problem of a large vocabulary. First of all, the creation of a corpus with a large vocabulary was studied and ASR systems with a large vocabulary were developed [15,16]. However, a balanced Turkish dataset of spontaneous conversations and conversations in different fields is not currently available.…”

Section: Introduction (mentioning)

confidence: 99%

Development of Test Corpus With Large Vocabulary for Turkish Speech Recognition System and a New Test Procedure

Oyucu

2022

Adıyaman Üniversitesi Mühendislik Bilimleri Dergisi

The most fundamental problem in automatic speech recognition systems is not the development of a domain-specific automatic speech recognition system, but the development of an automatic speech recognition system with a large vocabulary. Developed automatic speech recognition systems should be tested with a large-vocabulary test dataset. For this reason, an automatic speech recognition test corpus was prepared within the scope of the study. The prepared test corpus includes conversations from 20 different areas and text files of these conversations. The test procedure presented in the study was also tested on Turkish automatic speech recognition systems with a large vocabulary. The observed word error rates ranged between 14% and 21%. The prepared large-vocabulary test corpus and test procedure can guide future studies in revealing the success of automatic speech recognition systems more clearly.
