2013
DOI: 10.1002/cpe.3040
PU text classification enhanced by term frequency–inverse document frequency‐improved weighting

Abstract: SUMMARY: Term frequency-inverse document frequency (TF-IDF), one of the most popular feature (also called term or word) weighting methods used to describe documents in the vector space model and in applications related to text mining and information retrieval, can effectively reflect the importance of a term in a collection of documents in which all documents play the same role. However, TF-IDF does not take into account the difference in a term's IDF weighting when the documents play different roles in the collect…
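The standard TF-IDF weighting the abstract refers to can be sketched as follows; this is a minimal stdlib-only illustration (raw term counts for TF, log(N / df) for IDF), not the paper's improved variant:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute TF-IDF weights for a list of tokenized documents.

    TF is the raw term count within a document; IDF is
    log(N / df(t)), where N is the number of documents and
    df(t) the number of documents containing term t.
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))          # count each term once per document
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return weights

docs = [["text", "mining", "tfidf"],
        ["text", "classification"],
        ["information", "retrieval", "text"]]
w = tf_idf(docs)
# "text" occurs in every document, so its IDF (and hence weight) is 0
```

Note how a term shared by all documents carries zero weight, which is exactly the discriminative behavior IDF is meant to provide.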

Cited by 30 publications (20 citation statements). References 22 publications.
“…The similarity between documents is determined by comparing the relations between vectors. Among them, the most widely used weight-calculation method is the TF-IDF algorithm [39] and its various improved variants. The most commonly used similarity measure is cosine similarity [40].…”
Section: Related Work
confidence: 99%
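The cosine similarity measure cited above compares two term-weight vectors by the angle between them; a minimal sketch over sparse dict-based vectors (an illustrative representation, not tied to any particular library):

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two sparse term-weight vectors,
    each represented as a {term: weight} dict."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    # guard against all-zero vectors to avoid division by zero
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

a = {"text": 1.0, "mining": 2.0}
b = {"text": 1.0, "retrieval": 2.0}
print(cosine_similarity(a, b))  # 1 / (sqrt(5) * sqrt(5)) = 0.2
```

Because the dot product only iterates over terms present in both vectors, this works naturally with the sparse vectors TF-IDF produces.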
“…A feature often plays a different role in the set P and the set RN, respectively. In order to reflect the different importance of a feature in the set P and the set RN, we adopt an improved term frequency-inverse document frequency method [22], term frequency inverse positive-negative document frequency (TFIPNDF). We first use the vector space model to represent the documents in the training and testing sets, and we need to weight the features in the vectors.…”
Section: Building the Classifiers by Applying Support Vector Machines
confidence: 99%
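The exact TFIPNDF formula is not reproduced in this excerpt, so the following is only a hypothetical sketch of the idea it describes: scale term frequency by document-frequency statistics computed separately on the positive set P and the reliable-negative set RN, so that a term concentrated in P is weighted more heavily than plain IDF would weight it. The formula below is an assumption for illustration, not the definition from [22]:

```python
import math
from collections import Counter

def tf_ipndf(doc, df_pos, df_neg, n_pos, n_neg):
    """Hypothetical TFIPNDF-style weighting (NOT the formula from [22]):
    term frequency times a global IDF factor, boosted when the term's
    document frequency in P exceeds its document frequency in RN."""
    tf = Counter(doc)
    weights = {}
    for t, f in tf.items():
        dp = df_pos.get(t, 0)   # document frequency in the positive set P
        dn = df_neg.get(t, 0)   # document frequency in the negative set RN
        # add-one smoothing avoids division by zero for unseen terms
        idf = math.log((n_pos + n_neg) / (dp + dn + 1))
        class_bias = math.log2(2 + dp / (dn + 1))
        weights[t] = f * idf * class_bias
    return weights

df_pos = {"good": 3, "common": 3}
df_neg = {"common": 3}
w = tf_ipndf(["good", "common"], df_pos, df_neg, n_pos=3, n_neg=3)
# "good" (seen only in P) outweighs "common" (spread across P and RN)
```

The point of any such scheme is the one the citation makes: the same feature can deserve different weights depending on whether the document belongs to P or RN.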
“…However, it does not take inter-class information into account. Literature [8] addressed this by squaring the inverse word frequency (IWF) to reduce the dependency of IDF on term frequency. For micro-blogs, however, the methods above do not consider the time factor, and as a result an ideal topic-clustering effect has not been achieved.…”
Section: Related Work
confidence: 99%