2022
DOI: 10.1002/cpe.6909

A new metric for feature selection on short text datasets

Abstract: In recent years, short texts have become ubiquitous, especially on social media networks. Short text classification is an essential task for various applications that operate on short text documents. In many cases, using the entire feature set causes a high-dimensionality problem in short text data. This problem increases processing time and negatively impacts the performance of classifiers. This study presents an effective feature selection algorithm called the XY method, which represents the features on XY…

Citations: cited by 7 publications (6 citation statements)
References: 29 publications
“…The chi‐square test is believed to yield better results when the input variables are categorical or numerical and the output variable is categorical. A chi‐square test [54] is a common and widely used statistical test that reveals whether two variables have a statistically significant relationship (p < 0.0001). We used Chi‐square scores and p values to identify important variables that have a significant impact on the dependent variable (injury severity).…”
Section: Methods (mentioning)
confidence: 99%
“…For example, there are traditional methods such as Information Gain (IG), Gain Ratio (GR), Gini Index (GI), Chi2, and Mutual Information (MI) (Sharmin et al., 2019), as well as recently proposed approaches such as DFS, NDM, MMR and MRDC. Many of these methods are widely used in applications such as text classification (Cekik & Uysal, 2022). The IG approach is commonly used, particularly in data and text mining.…”
Section: Related Work (mentioning)
confidence: 99%
“…2) Comparison with Chi2 Feature Selection. The Chi2 [79], [80] statistical test has been used in text feature selection based on the statistical significance of features. We selected the top "1355 features" using the Chi2 method to compare the results with our best findings.…”
Section: Expid (mentioning)
confidence: 99%