Zeyad Hamid scite author profile

2021

J. Phys.: Conf. Ser.

In this paper, a new technique named the Maximal Itemset Miner Algorithm (MIMA) is proposed for extracting maximal frequent itemsets from text. The algorithm begins the search by generating the best initial border in the search space, based on the minimum support of the first-level items that satisfy the overall minimum support specified by the user. To count itemset support, our approach combines a vertical representation of the data with a queue data structure that stores the itemsets. To reduce the search space, the algorithm applies several pruning conditions to each itemset in the initial border. Experiments were performed on the standard CNN Arabic text dataset; when applied to three datasets of different sizes, the proposed method recorded lower execution times than the Apriori algorithm.
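The abstract combines a vertical data representation (each item mapped to the documents that contain it) with a queue of itemsets. The sketch below illustrates only that general combination: a breadth-first, queue-driven search where support is the size of a tidset intersection, followed by a filter that keeps itemsets with no frequent proper superset. It does not reproduce MIMA's initial-border generation or its specific pruning conditions, which the abstract does not detail; the function name `maximal_frequent_itemsets` and the use of an absolute support count are assumptions for illustration.

```python
from collections import deque


def maximal_frequent_itemsets(transactions, min_support):
    """Hypothetical sketch: return maximal frequent itemsets as sorted tuples.

    transactions: list of sets of hashable, sortable items (e.g. document terms).
    min_support: minimum absolute support (number of transactions), an assumption.
    """
    # Vertical layout: map each item to the set of transaction ids that contain it.
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    # Frequent single items in a fixed order, so each itemset is generated exactly once.
    items = sorted(i for i, tids in tidsets.items() if len(tids) >= min_support)

    # Queue entries: (itemset, its tidset, index of the next item allowed as an extension).
    queue = deque(((item,), tidsets[item], k + 1) for k, item in enumerate(items))
    frequent = []
    while queue:
        itemset, tids, start = queue.popleft()
        frequent.append(itemset)
        for k in range(start, len(items)):
            # Support of the extended itemset = size of the tidset intersection.
            new_tids = tids & tidsets[items[k]]
            if len(new_tids) >= min_support:
                queue.append((itemset + (items[k],), new_tids, k + 1))

    # An itemset is maximal if no other frequent itemset is a proper superset of it.
    as_sets = [set(s) for s in frequent]
    maximal = [s for s, s_set in zip(frequent, as_sets)
               if not any(s_set < other for other in as_sets)]
    return sorted(maximal)


docs = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
print(maximal_frequent_itemsets(docs, 3))  # [('a', 'b'), ('a', 'c'), ('b', 'c')]
```

With a threshold of 3, every pair is frequent but `{a, b, c}` (support 2) is not, so the three pairs are the maximal itemsets; lowering the threshold to 2 collapses the result to the single triple.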

A General Algorithm of Association Rule-Based Machine Learning Dedicated for Text Classification

2021

J. Phys.: Conf. Ser.

Many data mining techniques and machine learning algorithms have been developed to classify textual data, including decision trees, support vector machines and k-nearest neighbours, in addition to other machine learning-based algorithms. Association rule-based machine learning is carried out in two phases, a training phase and a testing phase, which may be refined with new minimum support and confidence thresholds to improve classification accuracy. In its various applications, association rule mining involves two computationally intensive steps: frequent itemset mining and association rule extraction. This paper presents a general algorithm for association rule-based machine learning dedicated to text classification. To verify the algorithm's efficiency, several text datasets were used, including a tweets dataset for sentiment classification, PDF documents and HTML documents. The sentiment classification experiments showed that the classifier built with minsup threshold = %700 and minconf threshold = 50% gave the best performance, with F1 = 0.9861811, while the HTML and PDF experiments achieved a classification accuracy of 94%.
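As an illustration of the two steps named above (itemset mining, then rule extraction filtered by minimum support and confidence), the sketch below mines class rules of the form `terms → label` from labelled documents and classifies a new document with the strongest matching rule. This is a generic "first matching rule" scheme, not the paper's general algorithm; the function names, the absolute support count, the `max_len` cap on antecedent length and the ranking order are all assumptions.

```python
from collections import Counter
from itertools import combinations


def mine_class_rules(docs, min_support, min_confidence, max_len=2):
    """Hypothetical sketch: mine rules (antecedent -> label) from labelled documents.

    docs: list of (set_of_terms, label).
    Returns a list of (antecedent, label, support, confidence), strongest first.
    """
    antecedent_counts = Counter()
    rule_counts = Counter()
    for terms, label in docs:
        # Enumerate every term combination up to max_len as a candidate antecedent.
        for k in range(1, max_len + 1):
            for antecedent in combinations(sorted(terms), k):
                antecedent_counts[antecedent] += 1
                rule_counts[(antecedent, label)] += 1

    rules = []
    for (antecedent, label), count in rule_counts.items():
        # Confidence = support(antecedent and label) / support(antecedent).
        confidence = count / antecedent_counts[antecedent]
        if count >= min_support and confidence >= min_confidence:
            rules.append((antecedent, label, count, confidence))

    # Rank by confidence, then support, then antecedent length (all descending).
    rules.sort(key=lambda r: (-r[3], -r[2], -len(r[0])))
    return rules


def classify(terms, rules, default=None):
    """Return the label of the first (strongest) rule whose antecedent is in terms."""
    for antecedent, label, _, _ in rules:
        if set(antecedent) <= terms:
            return label
    return default


train = [({"good", "movie"}, "pos"), ({"good", "acting"}, "pos"),
         ({"bad", "movie"}, "neg"), ({"bad", "plot"}, "neg")]
rules = mine_class_rules(train, min_support=2, min_confidence=0.5)
print(classify({"good", "plot"}, rules))  # pos
```

Raising `min_support` or `min_confidence` shrinks the rule set, which is the lever the abstract describes tuning between training and testing.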

Classification of Arabic Documents depending on Maximal Frequent Itemsets

2021

J. Phys.: Conf. Ser.

In this paper, we introduce techniques for classifying Arabic documents using association rules built from maximal frequent itemsets. The Parallel Maximal Itemset Miner Algorithm (PMIMA), which applies several conditions to prune the search space in parallel, is introduced for extracting maximal frequent itemsets. Three classification methods (rule length, rule weight and rule majority) are used to classify the Arabic documents. Compared with classification results based on all frequent itemsets extracted by Apriori, the results demonstrate the efficiency of our approach.
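The abstract names three rule-based classification methods without defining them. The sketch below shows one plausible reading of each, applied to the rules that match a document: rule majority picks the label with the most matching rules, rule weight picks the label whose matching rules have the largest total confidence, and rule length picks the label of the longest matching rule. These interpretations, the use of confidence as the rule weight, and the rule tuple layout `(antecedent, label, support, confidence)` are assumptions, not the paper's definitions.

```python
from collections import Counter, defaultdict

# Hypothetical rule layout: (antecedent tuple, label, support, confidence).


def matching_rules(terms, rules):
    """Rules whose antecedent terms all appear in the document."""
    return [r for r in rules if set(r[0]) <= terms]


def classify_by_majority(terms, rules):
    """Label supported by the largest number of matching rules."""
    votes = Counter(label for _, label, _, _ in matching_rules(terms, rules))
    return votes.most_common(1)[0][0] if votes else None


def classify_by_weight(terms, rules):
    """Label with the largest summed confidence over matching rules (assumed weight)."""
    weights = defaultdict(float)
    for _, label, _, confidence in matching_rules(terms, rules):
        weights[label] += confidence
    return max(weights, key=weights.get) if weights else None


def classify_by_length(terms, rules):
    """Label of the longest matching rule; ties broken by higher confidence."""
    matched = matching_rules(terms, rules)
    if not matched:
        return None
    best = max(matched, key=lambda r: (len(r[0]), r[3]))
    return best[1]


rules = [
    (("goal",), "sports", 4, 0.3),
    (("match",), "sports", 4, 0.3),
    (("team",), "sports", 4, 0.3),
    (("bank",), "economy", 5, 0.9),
    (("market",), "economy", 5, 0.8),
    (("bank", "vote"), "politics", 3, 0.6),
]
doc = {"goal", "match", "team", "bank", "market", "vote"}
print(classify_by_majority(doc, rules), classify_by_weight(doc, rules),
      classify_by_length(doc, rules))  # sports economy politics
```

The toy rule set is chosen so the three strategies disagree on the same document, which shows why the choice of method can change the predicted class.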

A new stemming algorithm dedicated for Arabic documents Classification

2020