2013
DOI: 10.1002/cpe.3040
PU text classification enhanced by term frequency–inverse document frequency‐improved weighting

Abstract: SUMMARY: Term frequency-inverse document frequency (TF-IDF), one of the most popular feature (also called term or word) weighting methods used to describe documents in the vector space model and in applications related to text mining and information retrieval, can effectively reflect the importance of a term in a collection of documents in which all documents play the same role. However, TF-IDF does not take into account the difference in a term's IDF weighting when the documents play different roles in the collect…
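The standard TF-IDF weighting the abstract refers to can be sketched as follows; this is a minimal stdlib-only illustration (raw term counts for TF, log(N / df) for IDF), not the paper's improved variant:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute TF-IDF weights for a list of tokenized documents.

    TF is the raw term count within a document; IDF is
    log(N / df(t)), where N is the number of documents and
    df(t) the number of documents containing term t.
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))          # count each term once per document
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return weights

docs = [["text", "mining", "tfidf"],
        ["text", "classification"],
        ["information", "retrieval", "text"]]
w = tf_idf(docs)
# "text" occurs in every document, so its IDF (and hence weight) is 0
```

Note how a term shared by all documents carries zero weight, which is exactly the discriminative behavior IDF is meant to provide.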

Cited by 30 publications (20 citation statements). References 22 publications.
“…The similarity between documents is determined by comparing the relations between vectors. Among them, the most widely used weight-calculation method is the TF-IDF algorithm [39] and its various improved variants. The most commonly used similarity measure is cosine similarity [40].…”
Section: Related Work
confidence: 99%
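The cosine similarity measure cited above compares two term-weight vectors by the angle between them; a minimal sketch over sparse dict-based vectors (an illustrative representation, not tied to any particular library):

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two sparse term-weight vectors,
    each represented as a {term: weight} dict."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    # guard against all-zero vectors to avoid division by zero
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

a = {"text": 1.0, "mining": 2.0}
b = {"text": 1.0, "retrieval": 2.0}
print(cosine_similarity(a, b))  # 1 / (sqrt(5) * sqrt(5)) = 0.2
```

Because the dot product only iterates over terms present in both vectors, this works naturally with the sparse vectors TF-IDF produces.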
“…A feature often plays a different role in the set P and the set RN, respectively. In order to reflect the different importance of a feature in the set P and the set RN, we adopt an improved term frequency-inverse document frequency method [22], term frequency inverse positive-negative document frequency (TFIPNDF). We first use the vector space model to represent the documents in the training and testing sets, and we need to weight the features in the vectors.…”
Section: Building the Classifiers by Applying Support Vector Machines
confidence: 99%
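The exact TFIPNDF formula is not reproduced in this excerpt, so the following is only a hypothetical sketch of the idea it describes: scale term frequency by document-frequency statistics computed separately on the positive set P and the reliable-negative set RN, so that a term concentrated in P is weighted more heavily than plain IDF would weight it. The formula below is an assumption for illustration, not the definition from [22]:

```python
import math
from collections import Counter

def tf_ipndf(doc, df_pos, df_neg, n_pos, n_neg):
    """Hypothetical TFIPNDF-style weighting (NOT the formula from [22]):
    term frequency times a global IDF factor, boosted when the term's
    document frequency in P exceeds its document frequency in RN."""
    tf = Counter(doc)
    weights = {}
    for t, f in tf.items():
        dp = df_pos.get(t, 0)   # document frequency in the positive set P
        dn = df_neg.get(t, 0)   # document frequency in the negative set RN
        # add-one smoothing avoids division by zero for unseen terms
        idf = math.log((n_pos + n_neg) / (dp + dn + 1))
        class_bias = math.log2(2 + dp / (dn + 1))
        weights[t] = f * idf * class_bias
    return weights

df_pos = {"good": 3, "common": 3}
df_neg = {"common": 3}
w = tf_ipndf(["good", "common"], df_pos, df_neg, n_pos=3, n_neg=3)
# "good" (seen only in P) outweighs "common" (spread across P and RN)
```

The point of any such scheme is the one the citation makes: the same feature can deserve different weights depending on whether the document belongs to P or RN.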
“…However, it does not take inter-class information into account. Literature [8] addressed this by squaring the inverse word frequency (IWF) to reduce the dependency of IDF on term frequency. For micro-blogs, however, the methods above do not consider the time factor, and as a result an ideal topic-clustering effect has not been achieved.…”
Section: Related Work
confidence: 99%