2019
DOI: 10.1002/cpe.5604

An improved term weighting scheme for text classification

Abstract: Text representation is a necessary first step in text classification (TC), and achieving high TC performance requires building that representation with an information‐rich term weighting scheme. Term frequency‐inverse document frequency (TF‐IDF) is so far the most widely used term weighting scheme, but it suffers from two deficiencies. First, the global weighting factor IDF in TF‐IDF approaches infinity if a certain term does not occur in a text. Second, the IDF is equal to zero if…
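The two deficiencies can be reproduced with a few lines of Python. This is only a minimal sketch under stated assumptions: it uses the textbook form IDF = log(N / df), which may differ from the exact variant analysed in the paper, and the helper names `idf` and `doc_freq` and the three-document toy corpus are hypothetical. Because the abstract is cut off before it names the second case, the sketch assumes the standard one, a term that occurs in every document. It does not implement the paper's proposed scheme.

```python
import math

def idf(num_docs: int, df: int) -> float:
    """Textbook IDF, log(N / df). Assumed form, not taken from the paper."""
    if df == 0:
        # N / 0 is undefined; the factor diverges as df approaches zero.
        return math.inf
    return math.log(num_docs / df)

def doc_freq(term: str, docs: list[list[str]]) -> int:
    """Number of documents that contain the term at least once."""
    return sum(1 for doc in docs if term in doc)

# Hypothetical toy corpus of tokenised documents.
docs = [
    ["term", "weighting", "scheme"],
    ["text", "classification", "scheme"],
    ["text", "representation", "scheme"],
]
n = len(docs)

idf_missing = idf(n, doc_freq("vector", docs))     # term absent from the corpus
idf_everywhere = idf(n, doc_freq("scheme", docs))  # term in every document
idf_rare = idf(n, doc_freq("term", docs))          # term in one document

print(idf_missing, idf_everywhere, idf_rare)
```

In this toy corpus the absent term gets an unbounded weight and the term present in every document gets a weight of exactly zero, so neither carries usable information for a classifier.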

Cited by 19 publications (16 citation statements)
References 65 publications
“…Additionally, feature selection is the recommended approach by practitioners and researchers [48]. In contrast, previous work also indicates that the performance tends to increase if more features are used [47]. It would be interesting to investigate if feature selection can improve the accuracy for small datasets using our experimental approach [37].…”
Section: Discussion (mentioning)
confidence: 99%
“…In other aspects, Zhong Tang et al. described two deficiencies from which TF-IDF suffers, namely, collection frequency factor being undefined (division by zero) or being equal to zero in some special cases. They proposed a novel method, namely, term frequency-inverse exponential frequency (TF-IEF), to overcome these drawbacks [14]. The proposed methods replaced the IDF with a global weighting factor IEF, and a log-like method is used to characterize the collection frequency factor.…”
Section: Background Study (mentioning)
confidence: 99%
“…"Good" term weighting methods are of fundamental importance for guaranteeing good TC performance. So far, there are two main categories of TWSs in the literature: semantic-based TWSs and statistics-based TWSs [14].…”
Section: Introduction (mentioning)
confidence: 99%
“…Text representation is a necessary and primary procedure in performing ATC and OM systems. It first needs to be obtained through an information‐rich term weighting scheme to achieve higher performance [12]. Most known techniques for text representation form vectors that contain many zeros as most terms appear in a small number of texts.…”
Section: Introduction (mentioning)
confidence: 99%