A comparison of supervised classification methods for a statistical set of features: Application: Amazigh OCR

Aharrane, Nabil; Moutaouakil, Karim El; Satori, Khalid

doi:10.1109/isacv.2015.7106171

Cited by 11 publications (10 citation statements)

References 24 publications


“…In various domains, and notably on the text mining field, the preprocessing process is a set of measures proposed to clean textual data and use numerical representation (Aharrane et al , 2015). As the first step in preprocessing for our work, we transform the text into a sequence of characters for the described data sets.…”

Section: Experimentation and Results (mentioning; confidence: 99%)
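The preprocessing step described in this excerpt (clean the text, then turn it into a character sequence with a numerical representation) can be sketched as follows. This is a minimal illustration under assumed choices: the cleaning rule (lowercasing, dropping digits and punctuation) and the helper names `to_char_sequence` and `encode` are hypothetical and are not taken from the cited work.

```python
import re


def to_char_sequence(text: str) -> list[str]:
    """Lowercase, replace punctuation, digits and underscores with spaces,
    collapse whitespace, and return the remaining characters in order.
    (Assumed cleaning rule; \\w also keeps non-Latin letters.)"""
    cleaned = re.sub(r"[^\w\s]|\d|_", " ", text.lower())
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return list(cleaned)


def encode(chars: list[str]) -> tuple[list[int], dict[str, int]]:
    """Map each distinct character to an integer id, in order of first appearance."""
    vocab: dict[str, int] = {}
    ids = []
    for c in chars:
        if c not in vocab:
            vocab[c] = len(vocab)
        ids.append(vocab[c])
    return ids, vocab
```

For example, `to_char_sequence("Hello, World 42!")` yields the characters of `"hello world"`, which `encode` then turns into integer ids.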

“…The ROC curve is often used to determine the optimal threshold in classification problems. This curve represents the evolution of sensitivity (Aharrane et al , 2015) depending on (1- specificity) when we vary the threshold. The area under the ROC curve (AUC) gives a reasonable estimate of the system's rejection capability, i.e.…”

Section: Experimentation and Results (mentioning; confidence: 99%)
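To make the threshold sweep concrete, here is a small sketch that computes ROC points (sensitivity against 1 - specificity) and the AUC by the trapezoidal rule. The function names are hypothetical, and choosing the threshold with Youden's J statistic is an assumed, commonly used criterion rather than one stated in the excerpt.

```python
def roc_points(scores, labels):
    """Sweep the decision threshold and return (threshold, fpr, tpr) triples.

    A sample is predicted positive when score >= threshold.
    tpr = sensitivity, fpr = 1 - specificity. labels are 0/1.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    # Start at +inf (nothing predicted positive), then every distinct score, descending.
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((t, fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule (points ordered by increasing fpr)."""
    area = 0.0
    for (_, x0, y0), (_, x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


def youden_threshold(scores, labels):
    """Threshold maximising Youden's J = sensitivity - (1 - specificity)."""
    best = max(roc_points(scores, labels), key=lambda p: p[2] - p[1])
    return best[0]
```

With perfectly ranked scores such as `[0.9, 0.8, 0.7, 0.6]` for labels `[1, 1, 0, 0]`, the AUC is 1.0 and the chosen threshold is 0.8.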

“…Besides, a collection of nonlinear kernel functions, the linear SVM can be prolonged to a nonlinear one. In this regard, the data are projected into a space of a high dimension to be separated, following, linearly (Aharrane et al , 2015).…”

Section: Machine Learning Tools For Text Classification (mentioning; confidence: 99%)
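The idea of projecting data into a higher-dimensional space where it becomes linearly separable can be checked on a toy example. The sketch below assumes a degree-2 polynomial kernel (an illustrative choice; the excerpt names no specific kernel): it verifies that k(x, z) = (x · z)² equals the inner product of an explicit feature map, under which XOR-style points become linearly separable.

```python
import math


def phi(x):
    """Explicit degree-2 feature map for 2-D input: (x1^2, sqrt(2) x1 x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def poly_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: k(x, z) = (x . z)^2."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# XOR-style data: no straight line separates the two classes in 2-D.
X = [(1, 1), (-1, -1), (1, -1), (-1, 1)]
y = [1, 1, -1, -1]

# In feature space, the hyperplane w . phi(x) = 0 with w = (0, 1, 0) separates them.
w = (0.0, 1.0, 0.0)
predictions = [1 if dot(w, phi(x)) > 0 else -1 for x in X]
```

A kernel SVM never builds `phi(x)` explicitly; it only evaluates `poly_kernel`, which is what makes very high-dimensional projections affordable.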


A new neutrosophic TF-IDF term weighting for text mining tasks: text classification use case

Bounabi, Elmoutaouakil, Satori (2021), IJWIS [self-citation]


Purpose: This paper aims to present a new term weighting approach for text classification as a text mining task. The proposed method, neutrosophic term frequency – inverse document frequency (NTF-IDF), is an extended version of the popular fuzzy TF-IDF (FTF-IDF) and uses neutrosophic reasoning to analyze and generate weights for terms in natural languages. The paper also proposes a comparative study of FTF-IDF and NTF-IDF and of their impact on different machine learning (ML) classifiers for document categorization.

Design/methodology/approach: After preprocessing the textual data, NTF-IDF applies a neutrosophic inference system (NIS) to produce weights for the terms representing a document. Using the local frequency TF, the global frequency IDF and the text length N as NIS inputs, this study generates two neutrosophic weights for a given term: the first gives the term's degree of relevance, and the second its degree of ambiguity. Next, the Zhang combination function is applied to combine the neutrosophic weight outputs into the final term weight, which is inserted into the document's representative vector. To analyze the impact of NTF-IDF on the classification phase, this study uses a set of ML algorithms.

Findings: By exploiting the characteristics of neutrosophic logic (NL), the authors were able to study both the ambiguity of terms and their degree of relevance for representing a document. The choice of NL proved effective in defining significant text vectorization weights, especially for text classification tasks. The experiments demonstrate that the new method has a positive impact on categorization. Moreover, the adopted system's recognition rate is higher than 91%, an accuracy not attained with FTF-IDF. Also, on benchmark data sets from different text mining fields and with several ML classifiers, e.g. SVM and a feed-forward network, applying the proposed NTF-IDF term scores improves accuracy by 10%.

Originality/value: The novelty of this paper lies in two aspects. First, a new term weighting method uses term frequencies as components to define the relevance and the ambiguity of a term; second, the application of NL to infer weights is an original model, which also aims to correct the drawbacks of FTF-IDF, which relies on fuzzy logic. The introduced technique was combined with different ML models to improve the accuracy and relevance of the feature vectors fed to the classification mechanism.
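For reference, the classic TF-IDF weight that FTF-IDF and NTF-IDF extend can be sketched as below. This is only the standard crisp weighting (tf = count / document length, idf = log(number of documents / document frequency)); the neutrosophic inference system and the Zhang combination function described in the abstract are not reproduced, and the function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Classic TF-IDF for tokenised, non-empty documents.

    tf  = occurrences of the term in the document / document length
    idf = log(number of documents / number of documents containing the term)
    """
    n_docs = len(docs)
    # Document frequency: count each term at most once per document.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc)
        weights.append({t: (c / length) * math.log(n_docs / df[t]) for t, c in counts.items()})
    return weights
```

A term that occurs in every document gets weight 0; the fuzzy and neutrosophic variants instead feed TF, IDF and the document length into an inference system rather than simply multiplying them.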


Section: Experimentation and Results (mentioning; confidence: 99%)

Section: Experimentation and Results (mentioning; confidence: 99%)


A new neutrosophic TF-IDF term weighting for text mining tasks: text classification use case

Bounabi, Elmoutaouakil, Satori (2021), IJWIS [self-citation]


“…Once the embedding matrix generated, we use some Machine learning models to analyze amazon customers reviews and to categorize electronic News according to their subject, like: Support Vector Machine [24] supervised learning systems with related learning algorithms that analyze the data used for classification and regression analysis. The main goals of this algorithm are to locate a hyperplane in the N-dimensional space of the features number that specifically identifies the data points.…”

Section: Machine Learning Classifiers (mentioning; confidence: 99%)
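The hyperplane search mentioned in this excerpt can be illustrated with a soft-margin linear SVM trained by full-batch subgradient descent on the regularised hinge loss. This is a sketch under assumed hyperparameters (`lam`, `lr`, `epochs`) and hypothetical function names; practical systems usually rely on a dedicated SVM solver instead.

```python
import numpy as np


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Minimise lam/2 * ||w||^2 + mean(max(0, 1 - y_i (w . x_i + b)))
    by full-batch subgradient descent. X: (n, d) features, y: labels in {-1, +1}."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1  # samples inside the margin or misclassified
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(X, w, b):
    """Side of the hyperplane w . x + b = 0 on which each sample falls."""
    return np.where(X @ w + b >= 0, 1, -1)
```

After training, the width of the margin between the two supporting hyperplanes w · x + b = ±1 is 2 / ||w||, which is what the regularisation term keeps large.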

The Impact of Neural Embedding Characteristics on Text Mining Tasks: Document Classification Use Case

Bounabi (2020), IJATCSE


Document classification is a relevant text mining task: useful content categorization supports many domains, such as content analysis, information retrieval, and recommendation systems. In general, several processes influence the effectiveness of a classification system, and, as this article shows, data representation has an essential impact on text categorization. Hence, the paper's goal is to tune Paragraph Vector-Distributed Memory (PV-DM), one of the current methods for neural text representation, by comparing different choices of the neural parameters that control system complexity, e.g., the number of epochs and the vector size. We also employ a collection of classifiers, subsequently combined by majority voting, to show the impact of the PV-DM embedding on binary business sentiment analysis and on multi-label news classification. The experiments show that a suitable choice of neural embedding characteristics raises the accuracy of the hybrid machine learning model to 99% for one data type.
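The majority-voting combination mentioned in the abstract can be sketched as hard voting over the labels predicted by each classifier. The tie-breaking rule (the label encountered first, i.e. from the earliest-listed classifier, wins) and the function name are assumptions; the abstract does not specify them.

```python
from collections import Counter


def majority_vote(predictions: list[list[str]]) -> list[str]:
    """Hard majority voting.

    predictions[k][i] is classifier k's label for sample i. For each sample,
    return the most frequent label; Counter.most_common orders equal counts
    by first appearance, so ties go to the earliest-listed classifier's label.
    """
    combined = []
    for sample_votes in zip(*predictions):
        label, _ = Counter(sample_votes).most_common(1)[0]
        combined.append(label)
    return combined
```

Using an odd number of classifiers avoids ties for binary tasks such as the sentiment analysis case.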


“…It becomes largely used in different fields where the human machine interaction is a decisive stage. Two kind approaches are proposed to describe the face image: the global methods which use the totality of facial surface as the face feature vector, then they reduce the representation space by linear transformations [1] [2] [3] [4]. The local methods are interested in the critical face areas where the feature vector is a set of relations between the components of each area [5] [6].…”

Section: Introduction (mentioning; confidence: 99%)
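As an illustration of a global method that reduces the representation space with a linear transformation, the sketch below uses principal component analysis (the eigenfaces approach). PCA is an assumed example: the excerpt does not say which transformations its references [1] to [4] use, and the function names are hypothetical.

```python
import numpy as np


def pca_fit(faces: np.ndarray, k: int):
    """faces: (n_samples, n_pixels) matrix of flattened face images.

    Returns the mean face and the k leading principal directions (rows).
    """
    mean = faces.mean(axis=0)
    centered = faces - mean
    # Rows of vt are orthonormal principal directions, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k]


def pca_project(faces: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Linear map from pixel space to the k-dimensional face space."""
    return (faces - mean) @ components.T
```

Each face, whatever its pixel count, is then described by only `k` coefficients, which is the dimensionality reduction the excerpt refers to.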

Face Recognition Using Local Binary Probabilistic Pattern (LBPP) and 2D-DCT Frequency Decomposition

Dahmouni, Aharrane, Moutaouakil, et al. (2016), 2016 13th International Conference on Computer Graphics, Imaging and Visualization (CGiV) [self-citation]


Facial biometrics is an active modality that uses facial characteristics as the basis for person identification. In this paper, we propose a new face recognition system based on the Local Binary Probabilistic Pattern (LBPP) face representation and the global 2D-DCT frequency method. LBPP is an alternative to the well-known LBP descriptor that uses the concept of a confidence interval to evaluate the current pixel. The LBPP-transformed images are then decomposed in the frequency domain with the 2D-DCT to build a reduced feature vector. The approach is tested on the ORL and Yale databases, with encouraging results: recognition rates of 95.5% on ORL and 100% on Yale.
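The two stages of this pipeline can be approximated with the plain LBP operator followed by an orthonormal 2D-DCT that keeps only the low-frequency corner of the coefficients. This is a sketch under stated assumptions: the confidence-interval test that distinguishes LBPP from LBP is not reproduced, and the feature size `k` and the function names are hypothetical.

```python
import numpy as np


def lbp(image: np.ndarray) -> np.ndarray:
    """Basic 3x3 LBP (not LBPP): for each interior pixel, set bit b when neighbour b >= centre."""
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    # Neighbour offsets (dy, dx), clockwise from the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy : h - 1 + dy, 1 + dx : w - 1 + dx]
        codes |= (neighbour >= center).astype(np.uint8) << np.uint8(bit)
    return codes


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis: C @ x is the DCT of a length-n vector x."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c


def dct2(block: np.ndarray) -> np.ndarray:
    """Separable 2D-DCT: transform the columns, then the rows."""
    rows, cols = block.shape
    return dct_matrix(rows) @ block @ dct_matrix(cols).T


def lbp_dct_features(image: np.ndarray, k: int = 8) -> np.ndarray:
    """LBP image -> 2D-DCT -> keep the k x k low-frequency corner as the feature vector."""
    coeffs = dct2(lbp(image).astype(float))
    return coeffs[:k, :k].ravel()
```

Keeping the top-left `k × k` block works because the DCT concentrates most of an image's energy in its low-frequency coefficients, so a 112 × 92 ORL face, for instance, would shrink to `k²` values.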
