2019
DOI: 10.3390/app9081578

Method of Feature Reduction in Short Text Classification Based on Feature Clustering

Abstract: One decisive problem of short text classification is the serious dimensional disaster when utilizing a statistics-based approach to construct vector spaces. Here, a feature reduction method is proposed that is based on two-stage feature clustering (TSFC), which is applied to short text classification. Features are semi-loosely clustered by combining spectral clustering with a graph traversal algorithm. Next, intra-cluster feature screening rules are designed to remove outlier feature words, which improves the …
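
To make the dimensionality problem in the abstract concrete, the following is a minimal sketch, not the TSFC method itself. It builds plain count vectors for a few toy short texts and then collapses features into clusters. The texts, the `feature_clusters` mapping, and the helper names `bag_of_words` and `reduce_features` are all hypothetical; the mapping is written by hand here, whereas the paper derives its clusters with spectral clustering and graph traversal.

```python
from collections import Counter

# Toy short texts (hypothetical data, not taken from the paper).
texts = [
    "cheap flights to paris",
    "low cost airfare paris",
    "football match tonight",
    "soccer game this evening",
]


def bag_of_words(docs):
    """Return (vocabulary, count vectors) for whitespace-tokenized docs."""
    vocab = sorted({tok for doc in docs for tok in doc.split()})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for doc in docs:
        vec = [0] * len(vocab)
        for tok, n in Counter(doc.split()).items():
            vec[index[tok]] = n
        vectors.append(vec)
    return vocab, vectors


# Hand-made feature clusters (an assumption for illustration only;
# a real method would learn these groups from the data).
feature_clusters = {
    "cheap": "price", "low": "price", "cost": "price",
    "flights": "travel", "airfare": "travel", "paris": "travel",
    "football": "sport", "soccer": "sport", "match": "sport", "game": "sport",
    "tonight": "time", "evening": "time",
}


def reduce_features(docs, clusters):
    """Replace each token by its cluster label; unclustered tokens are dropped."""
    reduced = [
        " ".join(clusters[tok] for tok in doc.split() if tok in clusters)
        for doc in docs
    ]
    return bag_of_words(reduced)


vocab, vectors = bag_of_words(texts)
small_vocab, small_vectors = reduce_features(texts, feature_clusters)

# Four texts of 3-4 words each already need 14 dimensions; 4 remain after clustering.
print(len(vocab), len(small_vocab))
```

In the original space the two travel texts share only the word "paris" and the two sport texts share no word at all; after clustering, the sport texts map to identical vectors. This is the kind of effect a feature-clustering reduction aims for, under the assumptions stated above.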

Cited by 7 publications (3 citation statements)
References 30 publications

“…The TF-IDF model is a prevalent research method in natural language processing (NLP) used in the implementation of the algorithm described in this article [20].…”
Section: Text Vectorization (mentioning)
confidence: 99%
“…This rate of increase of data production is also seen across many other fields such as graph data generated by social networks or finance data generated by the stock markets [5]. There is therefore a need for more reliable, cost-effective, and faster data mining techniques and machine learning models which use the data [6][7][8][9]. Useful features can be extracted from the data to reduce dimensionality by removing redundant or noisy features.…”
Section: Introduction (mentioning)
confidence: 97%
“…Among them, short text classification has attracted more attention and encountered more challenges due to their limited length. Most of the internet data are short texts, such as microblogs and bulletin board system (BBS) [5]. Accurate short text classification plays an important role both for the enterprises and individuals, even for the government services.…”
Section: Introduction (mentioning)
confidence: 99%