2021
DOI: 10.3390/info12020052
|View full text |Cite
|
Sign up to set email alerts
|

Text Classification Based on Convolutional Neural Networks and Word Embedding for Low-Resource Languages: Tigrinya

Abstract: This article studies convolutional neural networks for Tigrinya (also referred to as Tigrigna), which is a family of Semitic languages spoken in Eritrea and northern Ethiopia. Tigrinya is a “low-resource” language and is notable in terms of the absence of comprehensive and free data. Furthermore, it is characterized as one of the most semantically and syntactically complex languages in the world, similar to other Semitic languages. To the best of our knowledge, no previous research has been conducted on the st… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
1
1
1
1

Citation Types

0
41
0

Year Published

2021
2021
2024
2024

Publication Types

Select...
5
4

Relationship

0
9

Authors

Journals

citations
Cited by 77 publications
(41 citation statements)
references
References 26 publications
0
41
0
Order By: Relevance
“…Basically, the general framework of the proposed model consists of preprocessing, feature extraction, feature selection, and classification stages. These stages are suggested in all models that classify using text analysis [41]. The stages of the proposed methodology are shown in Figure 1.…”
Section: Methodsmentioning
confidence: 99%
See 1 more Smart Citation
“…Basically, the general framework of the proposed model consists of preprocessing, feature extraction, feature selection, and classification stages. These stages are suggested in all models that classify using text analysis [41]. The stages of the proposed methodology are shown in Figure 1.…”
Section: Methodsmentioning
confidence: 99%
“…Statistical methods ignore the semantic relationships between words, and they have high computational requirements due to their direct processing of word frequencies. However, they are highly suitable for comparing the performance of classification models [41]. Since the proposed methodology focuses on the performance of different machine learning algorithms, we have chosen the statistical methods described below for our study.…”
Section: Feature Extractionmentioning
confidence: 99%
“…The authors in [ 12 ] proposed text classification based on CNN and word embedding for low-resource languages (Tigrinya). They used manually annotated data sets of 30,000 Tigrinya news texts from various sources, which are divided into six categories: sport, agriculture, politics, religion, education, and health.…”
Section: Related Workmentioning
confidence: 99%
“…Similar to other neural networks, CNN operates in the same way the brain's visual cortex recognizes and processes things and learns to classify them [63]. CNN has also been applied to speech recognition [64][65][66] and natural language processing (NLP) [67,68].…”
Section: Convolutional Neural Network (Cnn)mentioning
confidence: 99%