2017
DOI: 10.1186/s13638-017-0950-z

A feature selection method based on synonym merging in text classification system

Abstract: As an important step in natural language processing (NLP), text classification system has been widely used in many fields, like spam filtering, news classification, and web page detection. Vector space model (VSM) is generally used to extract feature vectors for representing texts which is very important for text classification. In this paper, a feature selection algorithm based on synonym merging named SM-CHI is proposed. Besides, the improved CHI formula and synonym merging are used to select feature words s…
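
The idea the abstract describes, merging synonyms before scoring feature words with a chi-square (CHI) statistic, can be sketched roughly as follows. This is a minimal illustration and not the paper's SM-CHI algorithm: it uses the standard chi-square score, since the improved CHI formula is not given here, and the synonym table, the function names, and the toy documents are all assumptions made for the example.

```python
# Hypothetical synonym table: maps a word to its canonical representative.
SYNONYMS = {"automobile": "car", "vehicle": "car", "film": "movie"}


def merge_synonyms(tokens, synonyms=SYNONYMS):
    """Replace each token with its canonical synonym when one is listed."""
    return [synonyms.get(token, token) for token in tokens]


def chi_square(term, label, docs):
    """Standard chi-square score of `term` for class `label`.

    `docs` is a list of (set_of_terms, class_label) pairs.
    """
    a = sum(1 for terms, y in docs if term in terms and y == label)
    b = sum(1 for terms, y in docs if term in terms and y != label)
    c = sum(1 for terms, y in docs if term not in terms and y == label)
    d = sum(1 for terms, y in docs if term not in terms and y != label)
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    # A zero denominator means the term or the class never varies: score 0.
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def select_features(raw_docs, label, k):
    """Merge synonyms, score every term for `label`, keep the top k."""
    docs = [(set(merge_synonyms(tokens)), y) for tokens, y in raw_docs]
    vocab = sorted(set().union(*(terms for terms, _ in docs)))
    scores = {term: chi_square(term, label, docs) for term in vocab}
    # Sort by descending score; ties are broken alphabetically.
    return sorted(vocab, key=lambda term: (-scores[term], term))[:k]


# Toy corpus (invented for illustration): token lists with class labels.
raw_docs = [
    (["car", "engine"], "auto"),
    (["automobile", "road"], "auto"),
    (["vehicle", "engine"], "auto"),
    (["movie", "actor"], "film"),
    (["film", "actor"], "film"),
    (["movie", "road"], "film"),
]

top_terms = select_features(raw_docs, "auto", 2)
print(top_terms)
```

In this toy corpus, "car", "automobile", and "vehicle" collapse into one feature, so the vocabulary shrinks and the merged term gathers the document counts that would otherwise be split across three words.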

Cited by 12 publications (5 citation statements)
References 13 publications
“…Feature selection is a very important step, as the selected feature words directly affect the accuracy of the classifier (Yao et al, 2017). To check if there some common features between two categorical variables or not, this test can be useful in merging the synonyms among the feature words so that the dimension of feature space can be reduced.…”
Section: Statistical and Linguistic Methods
Citation type: mentioning
confidence: 99%
“…In the following, we presented a comparative study of MLA. The MLA find their applications in several areas, namely: text classification [13]- [17], medical diagnosis [18], pollution prediction [19], spam email detection [20], plant disease identification [21], and stock daily trading [22]. For example, The paper [13] describes the use of the KNN algorithm with the TF-IDF method for text classification.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…The accuracy of the model will significantly increase due to reduction in features as well as increasing the weight of the selected feature. A study confirmed that merging words with synonym reduces feature dimension and improves the accuracy and effectiveness of text classification model (Yao et al 2017). For this purpose, two different functions in python have been created that converts plural forms to singular and substitute the required word with its synonym.…”
Section: Plural To Singular Conversion and Synonym Substitution
Citation type: mentioning
confidence: 99%