Text classification based on multi-word with support vector machine

Zhang, Wen; Yoshida, Taketoshi; Tang, Xijin

doi:10.1016/j.knosys.2008.03.044

Cited by 235 publications (109 citation statements)

References 27 publications (30 reference statements)


“…In the future, on the one hand, we will use more data sets to examine the effectiveness of the proposed CoFea algorithm in spam review identification. On the other hand, we will also extend the co-training algorithm to more research areas such as sentiment analysis [26], image recognition [27], and text classification [28] to explore more fields. In fact, text classification is a basic technique for deceptive review identification.…”

Section: Discussion (mentioning; confidence: 99%)

CoFea: A Novel Approach to Spam Review Identification Based on Entropy and Co-Training

Zhang, Yoshida, et al. 2016, Entropy (self-citation)


Abstract: With the rapid development of electronic commerce, spam reviews are proliferating on the Internet as a means of manipulating online customers' opinions of the goods being sold. This paper proposes a novel approach, called CoFea (Co-training by Features), that identifies spam reviews using entropy and the co-training algorithm. After sorting all lexical terms of the reviews by entropy, we produce two views of the reviews by dividing the terms into two subsets: one contains the odd-numbered terms and the other the even-numbered terms. Using an SVM (support vector machine) as the base classifier, we further propose two strategies embedded in the CoFea approach, CoFea-T and CoFea-S. The CoFea-T strategy uses all terms in the subsets for spam review identification by SVM; the CoFea-S strategy uses only a predefined number of terms with small entropy. The experimental results show that CoFea-T achieves better accuracy than CoFea-S, while CoFea-S saves more computing time than CoFea-T with acceptable accuracy in spam review identification.
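The view construction described in this abstract can be sketched in a few lines of Python. The abstract does not say how a term's entropy is computed, so this sketch assumes it is the Shannon entropy of the term's distribution over review labels (spam vs. non-spam), with terms sorted in ascending order; the helper names `term_entropy`, `split_views`, and `small_entropy_terms` and the toy data are hypothetical. It covers only the feature split and a CoFea-S style truncation, not the co-training loop or the SVM.

```python
import math
from collections import Counter, defaultdict


def term_entropy(docs, labels):
    """Shannon entropy of each term's distribution over review labels.

    Assumption: the CoFea abstract does not define term entropy; here it is
    computed from how many documents of each label contain the term.
    """
    counts = defaultdict(Counter)  # term -> Counter(label -> document count)
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            counts[term][label] += 1
    entropy = {}
    for term, by_label in counts.items():
        total = sum(by_label.values())
        entropy[term] = sum(-(n / total) * math.log2(n / total) for n in by_label.values())
    return entropy


def split_views(entropy):
    """Sort terms by ascending entropy and deal them alternately into two views."""
    ranked = sorted(entropy, key=lambda t: (entropy[t], t))  # ties broken alphabetically
    view_odd = ranked[0::2]   # 1st, 3rd, 5th, ... terms
    view_even = ranked[1::2]  # 2nd, 4th, 6th, ... terms
    return view_odd, view_even


def small_entropy_terms(view, entropy, k):
    """CoFea-S style truncation (assumed): keep the k lowest-entropy terms of one view."""
    return sorted(view, key=lambda t: (entropy[t], t))[:k]


# Toy reviews (hypothetical data)
docs = [["great", "buy", "now"], ["great", "product"],
        ["buy", "now", "cheap"], ["fine", "product"]]
labels = ["spam", "ham", "spam", "ham"]

entropy = term_entropy(docs, labels)
view_odd, view_even = split_views(entropy)
print(view_odd, view_even)  # ['buy', 'fine', 'product'] ['cheap', 'now', 'great']
print(small_entropy_terms(view_even, entropy, 2))  # ['cheap', 'now']
```

A term that occurs under only one label has entropy 0, so under this assumption the low-entropy end of each view holds the most label-specific terms, which is what a "small entropy" truncation would keep. Dealing ranked terms alternately gives both views a similar spread of entropies.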



“…Diab has used multi-word features in Arabic document classification with two similarity functions [15]: the cosine and the Dice similarity functions. He also applied inverse document frequency (IDF) to prevent frequent terms from dominating the value of the function, and he used different light stemmers on multi-word features.…”

Section: Related Work (mentioning; confidence: 99%)
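The two similarity functions mentioned in this statement can be illustrated over IDF-weighted multi-word features. This is a sketch, not Diab's implementation: the weighting (raw count times log(N/df)), the weighted Dice form 2(u·v)/(|u|² + |v|²), the function names, and the toy corpus are all assumptions.

```python
import math
from collections import Counter


def idf_weights(corpus):
    """IDF = log(N / df) for every feature; features in all N documents get 0."""
    n_docs = len(corpus)
    df = Counter(feature for doc in corpus for feature in set(doc))
    return {feature: math.log(n_docs / d) for feature, d in df.items()}


def tfidf(doc, idf):
    """Sparse TF-IDF vector (dict) for one document's multi-word features."""
    return {f: count * idf.get(f, 0.0) for f, count in Counter(doc).items()}


def _dot(u, v):
    """Dot product of two sparse vectors stored as dicts."""
    return sum(w * v.get(f, 0.0) for f, w in u.items())


def cosine(u, v):
    """Cosine similarity: (u . v) / (|u| |v|)."""
    norm = math.sqrt(_dot(u, u)) * math.sqrt(_dot(v, v))
    return _dot(u, v) / norm if norm else 0.0


def dice(u, v):
    """Weighted Dice similarity (assumed form): 2 (u . v) / (|u|^2 + |v|^2)."""
    denom = _dot(u, u) + _dot(v, v)
    return 2 * _dot(u, v) / denom if denom else 0.0


# Hypothetical documents, each a list of multi-word features
corpus = [["text classification", "support vector"],
          ["text classification", "light stemmer"],
          ["light stemmer", "arabic corpus"]]
idf = idf_weights(corpus)
vectors = [tfidf(doc, idf) for doc in corpus]
print(cosine(vectors[0], vectors[1]), dice(vectors[0], vectors[1]))
```

A feature that appears in every document gets weight log(N/N) = 0, which is the sense in which IDF keeps frequent terms from dominating either score. For non-negative weights the Dice value never exceeds the cosine value, since |u|² + |v|² ≥ 2|u||v|.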

“…Zhang et al. have used a multi-word technique for feature representation, with a support vector machine as the classifier, to improve document classification [19]. Two strategies were developed for feature representation based on the different semantic levels of the multi-words.…”

Section: Related Work (mentioning; confidence: 99%)
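To make the idea of multi-word features feeding an SVM concrete, here is a sketch that treats any contiguous sequence of two or three words occurring in at least `min_df` documents as a multi-word. That criterion is an assumption for illustration, not the extraction rule of Zhang et al., and the sketch does not model their two semantic-level strategies; all names and data are hypothetical. Each resulting count vector is the kind of row a linear SVM would take as input.

```python
from collections import Counter


def ngrams(tokens, max_len=3):
    """Yield every contiguous word sequence of length 2..max_len."""
    for n in range(2, max_len + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def extract_multiwords(docs, min_df=2, max_len=3):
    """Hypothetical criterion: keep sequences found in at least min_df documents."""
    df = Counter(mw for doc in docs for mw in set(ngrams(doc, max_len)))
    return sorted(mw for mw, d in df.items() if d >= min_df)


def to_vector(tokens, vocabulary, max_len=3):
    """Multi-word counts for one document, in vocabulary order (one SVM input row)."""
    counts = Counter(ngrams(tokens, max_len))
    return [counts[mw] for mw in vocabulary]


docs = [s.split() for s in (
    "support vector machine for text classification",
    "text classification with support vector machine",
    "stock market news",
)]
vocabulary = extract_multiwords(docs)
print(vocabulary)
# ['support vector', 'support vector machine', 'text classification', 'vector machine']
print([to_vector(d, vocabulary) for d in docs])
# [[1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0]]
```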

Arabic Text Categorization Using Mixed Words

Hussein, Mousa, Sallam 2016, IJITCS


Abstract: A tremendous number of Arabic text documents are available online, and the number grows every day, so categorizing these documents has become very important. In this paper, an approach is proposed to enhance the accuracy of Arabic text categorization. It is based on a new feature representation technique that mixes bag-of-words (BOW) features with features made of two adjacent words, in different proportions. It also introduces a new feature selection technique based on term frequency (TF) and uses the Frequency Ratio Accumulation Method (FRAM) as a classifier. Experiments are performed with neither normalization nor stemming, with each of them alone, and with both. In addition, three data sets with different categories were collected from online Arabic documents to evaluate the proposed approach. The highest accuracy obtained, 98.61%, was achieved using normalization.
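The pipeline in this abstract (mixed BOW and adjacent-word features, TF-based selection, FRAM classification) can be sketched as follows. The proportion scheme (a `budget` of features, a `word_share` fraction of them single words) is hypothetical, and the FRAM scoring follows our reading of the method: per-category normalized frequencies, divided by their sum across categories to give a frequency ratio, then accumulated over a document's features. Treat both as assumptions; Arabic-specific normalization and stemming are omitted, and the toy data is English.

```python
from collections import Counter, defaultdict


def doc_features(tokens):
    """Single words plus pairs of adjacent words (stored as tuples)."""
    return list(tokens) + list(zip(tokens, tokens[1:]))


def select_features(docs, budget, word_share):
    """Hypothetical TF-based selection: the most frequent words and adjacent
    pairs, with word_share of the budget going to words and the rest to pairs."""
    word_tf = Counter(w for doc in docs for w in doc)
    pair_tf = Counter(p for doc in docs for p in zip(doc, doc[1:]))
    n_words = round(budget * word_share)
    words = [w for w, _ in word_tf.most_common(n_words)]
    pairs = [p for p, _ in pair_tf.most_common(budget - n_words)]
    return set(words) | set(pairs)


def train_fram(docs, labels, features):
    """FRAM as we read it (assumption): per-category normalized frequencies,
    turned into ratios across categories."""
    freq = defaultdict(Counter)  # category -> feature -> frequency
    for tokens, label in zip(docs, labels):
        freq[label].update(f for f in doc_features(tokens) if f in features)
    norm = {}
    for category, counts in freq.items():
        total = sum(counts.values())
        if total:
            norm[category] = {f: n / total for f, n in counts.items()}
    ratio = {category: {} for category in norm}
    for f in features:
        total = sum(norm[c].get(f, 0.0) for c in norm)
        if total:
            for c in norm:
                ratio[c][f] = norm[c].get(f, 0.0) / total
    return ratio


def classify_fram(tokens, ratio):
    """Accumulate frequency ratios over the document's features; highest total wins."""
    scores = {c: sum(r.get(f, 0.0) for f in doc_features(tokens))
              for c, r in ratio.items()}
    return max(scores, key=scores.get)


train = [s.split() for s in ("football match goal", "match goal score",
                             "stock market price", "market price rise")]
labels = ["sports", "sports", "economy", "economy"]
features = select_features(train, budget=6, word_share=0.5)
ratio = train_fram(train, labels, features)
print(classify_fram("goal match today".split(), ratio))   # sports
print(classify_fram("market price drop".split(), ratio))  # economy
```

Varying `word_share` between 0 and 1 is one way to express "different proportions" of single words and adjacent-word pairs under a fixed feature budget.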


“…SVMs have been popular in text classification and categorization [35,36]. SVM is designed for two-class pattern classification.…”

Section: Research In Text Mining and Document Classification (mentioning; confidence: 99%)
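Because a standard SVM separates only two classes, multi-class text categorization is usually built from several binary machines. The sketch below trains a linear SVM with the Pegasos stochastic sub-gradient method (chosen here for brevity; the cited work does not specify a solver) and combines the binary models one-vs-rest. The bias is folded in as a constant feature and is therefore regularized too, a simplification; all names, hyperparameters, and data are hypothetical.

```python
import random


def decision(w, x):
    """Signed score w . [x, 1]; the last weight acts as the bias."""
    return sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))


def train_linear_svm(X, y, lam=0.1, epochs=500, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularized hinge loss.

    X is a list of feature lists and y holds labels in {-1, +1}. The bias is a
    constant extra feature, so it is regularized too (a simplification).
    """
    rng = random.Random(seed)
    data = [(list(xi) + [1.0], yi) for xi, yi in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for xi, yi in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: regularization step
            if margin < 1.0:  # hinge loss active: move toward the point
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def train_one_vs_rest(X, labels, **kwargs):
    """One binary SVM per class: that class is +1, every other class is -1."""
    return {c: train_linear_svm(X, [1 if lab == c else -1 for lab in labels], **kwargs)
            for c in sorted(set(labels))}


def predict(models, x):
    """Pick the class whose binary SVM gives the largest score."""
    return max(models, key=lambda c: decision(models[c], x))


# Two-class toy data, then three classes combined one-vs-rest
X2 = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
y2 = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X2, y2)

X3 = [[4, 0], [5, 0], [4, 1], [-4, 0], [-5, 0], [-4, 1], [0, 4], [0, 5], [1, 4]]
labels3 = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
models = train_one_vs_rest(X3, labels3)
print(predict(models, [4.5, 0.2]), predict(models, [-4.5, 0.2]), predict(models, [0.2, 4.5]))
```

In text classification the rows of `X` would be term or multi-word vectors; one-vs-rest is only one way to lift a two-class SVM to many categories (one-vs-one is a common alternative).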

Vehicle Fault Diagnostics Using Text Mining, Vehicle Engineering Structure and Machine Learning

Murphey, Huang, Wang, et al. 2015, IJIIS


This paper presents an intelligent vehicle fault diagnostics system, SeaProSel (Search-Prompt-Select). SeaProSel takes a casual description of a vehicle problem as input and searches for the diagnostic code that accurately matches the problem description. SeaProSel was developed using automatic text classification and machine learning techniques, combined with a prompt-and-select technique based on the vehicle diagnostic engineering structure, to provide robust classification of the matching diagnostic code. Machine learning algorithms are developed to automatically learn the words and terms, and their variations, commonly used in verbal descriptions of vehicle problems, and to build a TCW (Term-Code-Weight) matrix that is used to measure the similarity between a document vector and a diagnostic code class vector. When no exactly matching diagnostic code is found by a direct search using the TCW matrix, SeaProSel searches the vehicle fault diagnostic structure for appropriate questions to pose to the user in order to obtain more details about the problem. An LSI (Latent Semantic Indexing) model is also presented and analyzed. The performance of the LSI and TCW models is presented and discussed, along with an in-depth study of different term weight functions and their performance. All experiments are conducted on real-world vehicle diagnostic data, and the results show that SeaProSel generates accurate results efficiently for vehicle fault diagnostics.
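A minimal sketch of TCW-style matching, assuming a weighting the abstract does not specify: each TCW entry is the fraction of a term's training occurrences that fall under a given code, the document vector holds raw term counts, and cosine similarity ranks the codes. When the best score falls below a hypothetical `threshold`, the function returns `None`, standing in for the point where SeaProSel would switch to prompting the user. Function names, codes, and descriptions are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_tcw(descriptions, codes):
    """TCW[term][code]: share of a term's training occurrences filed under that code.

    Assumed weighting for illustration; the SeaProSel abstract does not give one.
    """
    per_code = defaultdict(Counter)  # code -> term counts
    for text, code in zip(descriptions, codes):
        per_code[code].update(text.lower().split())
    totals = Counter()
    for counts in per_code.values():
        totals.update(counts)  # Counter.update adds counts
    return {term: {code: code_counts[term] / n
                   for code, code_counts in per_code.items() if code_counts[term]}
            for term, n in totals.items()}


def rank_codes(text, tcw, threshold=0.1):
    """Rank codes by cosine(document count vector, code column of TCW).

    Returns None when the best score is below the (hypothetical) threshold,
    i.e. where a prompt-and-select step would take over.
    """
    doc = Counter(w for w in text.lower().split() if w in tcw)
    doc_norm = math.sqrt(sum(n * n for n in doc.values()))
    codes = {code for weights in tcw.values() for code in weights}
    scores = {}
    for code in codes:
        dot = sum(n * tcw[t].get(code, 0.0) for t, n in doc.items())
        col_norm = math.sqrt(sum(weights[code] ** 2
                                 for weights in tcw.values() if code in weights))
        scores[code] = dot / (doc_norm * col_norm) if dot else 0.0
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked if ranked and ranked[0][1] >= threshold else None


descriptions = ["engine misfire rough idle", "engine shakes rough idle",
                "battery dead car will not start", "car will not start clicking"]
codes = ["misfire", "misfire", "no_start", "no_start"]
tcw = build_tcw(descriptions, codes)
print(rank_codes("Rough idle and engine shakes", tcw))
print(rank_codes("radio is broken", tcw))  # None: no known terms
```

Swapping the weighting inside `build_tcw` (for example, a TF-IDF style weight) is where different term weight functions would plug in; the ranking step stays the same.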

