2019
DOI: 10.1109/access.2019.2903331

A Study of the Effects of Stemming Strategies on Arabic Document Classification

Abstract: Stemming is one of the most effective techniques, which has been adopted in many applications, such as machine learning, machine translation, document classification (DC), information retrieval, and natural language processing. The stemming technique is meant to be applied during the classification of documents to reduce the high dimensionality of the feature space, which, in turn, raises the functioning of the classification system, particularly with extreme modulated language, for instance, Arabic language. …
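
To make the dimensionality argument in the abstract concrete, the following minimal sketch shows how stripping affixes collapses several surface forms into one feature. It is not one of the stemmers evaluated in the paper: the affix lists, the minimum stem length, and the names `light_stem` and `vocabulary` are assumptions chosen only for illustration.

```python
from typing import Iterable, Set

# Hypothetical affix lists for illustration only; real Arabic stemmers
# use larger, carefully ordered lists and extra rules.
PREFIXES = ("وال", "بال", "ال")   # longest prefixes first
SUFFIXES = ("ات", "ون", "ين", "ة")
MIN_STEM_LEN = 3                  # assumed guard against over-stripping


def light_stem(word: str) -> str:
    """Strip at most one prefix and one suffix from a token."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM_LEN:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            word = word[:-len(suffix)]
            break
    return word


def vocabulary(tokens: Iterable[str], stem: bool = False) -> Set[str]:
    """Return the set of distinct features, optionally after stemming."""
    return {light_stem(t) if stem else t for t in tokens}


# Six surface forms built on two base words ("book" and "teacher").
tokens = ["الكتاب", "كتاب", "والكتاب", "بالكتاب", "المعلمون", "معلم"]
raw_vocab = vocabulary(tokens)
stemmed_vocab = vocabulary(tokens, stem=True)
print(len(raw_vocab), len(stemmed_vocab))
```

In this toy example each distinct surface form would otherwise be its own dimension in the feature space; after stemming, forms that share a stem share one dimension.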

Cited by 64 publications (43 citation statements)
References 34 publications

“…In addition to evaluating the accuracy of these classifiers correctly according to Sahih al-Bukhari book, which Muslims view as one of two most trusted books of hadith along with Sahih Muslim, https://www.islamicfinder.org/hadith/bukhari/ These hadiths have been collected and classified manually, therefore there is an urgency to reclassify them automatically using automated learning techniques and compared the results of the automatic classification to manual classification. Hadiths classification process consists of four main stages: the first one is text pre-processing [27]. Then the term weighting is used, known as text representation, term weighting was applied in this study using two approaches, the first is the Boolean algebra and the second is term frequency-inverse document frequency (TF-IDF).…”
Section: Introduction (mentioning)
confidence: 99%
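
The two term-weighting approaches named in the statement above can be sketched as follows. This is an illustrative assumption, not the citing study's implementation: the TF-IDF variant used here (relative term frequency times `log(N / df)`) is only one of several common definitions, and the tokens and the function names `boolean_weights` and `tfidf_weights` are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def boolean_weights(doc: Sequence[str], vocab: Sequence[str]) -> List[int]:
    """Boolean weighting: 1 if the term occurs in the document, else 0."""
    present = set(doc)
    return [1 if term in present else 0 for term in vocab]


def tfidf_weights(docs: Sequence[Sequence[str]],
                  vocab: Sequence[str]) -> List[List[float]]:
    """TF-IDF with tf = count / doc length and idf = log(N / df).

    Assumes every document is non-empty and vocab is built from docs,
    so that df is never zero.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    rows = []
    for doc in docs:
        tf = Counter(doc)
        rows.append([tf[t] / len(doc) * math.log(n_docs / df[t]) for t in vocab])
    return rows


# Hypothetical, already pre-processed documents.
docs = [
    ["prayer", "fasting", "prayer"],
    ["prayer", "charity"],
    ["prayer", "zakat"],
]
vocab = sorted({term for doc in docs for term in doc})

print(vocab)
print(boolean_weights(docs[0], vocab))
print(tfidf_weights(docs, vocab)[0])
```

Under this variant a term that occurs in every document, such as `prayer` here, gets a TF-IDF weight of zero, whereas Boolean weighting still marks it as present.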
“…A further attempt was made to identify document using word2vec in combination with the LDA method [21], which also provided better results. In addition to text classification, word2vec has been used in many other areas of application such as improving medical knowledge through unsupervised medical corporate learning [22], answer selected from possible collection, good, poor in a question-response method [23], etc. Word2Vec is an unsupervised model of writing, writing the semantic context associated with the text.…”
Section: Related Work (mentioning)
confidence: 99%
“…Empirically, the types of texts used for text categorization tasks can be varied and composed of various languages such as English, Chinese, and other languages [11], [12]. Unlike English text, Chinese text tasks usually require word segmentation because every character in Chinese is connected together [15].…”
Section: Chinese Radiology Text a Description (mentioning)
confidence: 99%
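
As a rough illustration of the word segmentation step mentioned in the statement above, here is a minimal sketch of dictionary-based forward maximum matching. The citing paper does not say which segmenter it uses; the lexicon, the `max_len` limit, and the function name `forward_max_match` are assumptions made for this example.

```python
from typing import List, Set


def forward_max_match(text: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Greedy segmentation: at each position take the longest lexicon word.

    Falls back to a single character when no lexicon entry matches.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy lexicon and sentence.
lexicon = {"我", "爱", "北京", "天安门"}
print(forward_max_match("我爱北京天安门", lexicon))
```

Because written Chinese has no spaces between words, a step of this kind is needed before word-level features can be extracted.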
“…The output gate O_t will determine which outputs will be treated as current states. We update the information Z_t in the LSTM by (12), and (14) can obtain the output H_t of a hidden layer at time step t. Note that W and U are the network weight parameters, and is the "Hadamard product" operation.…”
Section: Long Short-term Memory Network (mentioning)
confidence: 99%