2016
DOI: 10.1002/cpe.3879
Building text classifiers using positive, unlabeled and ‘outdated’ examples

Abstract: Learning from positive and unlabeled examples (PU learning) is a partially supervised classification technique that is frequently used in Web and text retrieval systems. The merit of PU learning is that it can achieve good performance with less manual labeling effort. Motivated by transfer learning, this paper presents a novel method that transfers 'outdated data' into the process of PU learning. We first propose a way to measure the strength of features, selecting strong and weak features according to that strength…
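Since the abstract stops mid-sentence, the paper's actual strength measure is not reproduced here. As a rough illustration only, the sketch below scores each term by the gap between its document frequency in the positive set and in the unlabeled set, then splits the vocabulary into strong and weak features by a cut-off; `feature_strength`, `split_features`, and the threshold value are all hypothetical names and choices, not the authors' method.

```python
from collections import Counter

def feature_strength(positive_docs, unlabeled_docs):
    """Score each term by how much more often it appears in positive docs.

    This difference-of-document-frequencies score is an illustrative
    stand-in for the paper's (truncated) strength measure.
    """
    def doc_freq(docs):
        counts = Counter()
        for doc in docs:
            counts.update(set(doc.split()))
        return {t: c / max(len(docs), 1) for t, c in counts.items()}

    pos_df = doc_freq(positive_docs)
    unl_df = doc_freq(unlabeled_docs)
    return {t: pos_df.get(t, 0.0) - unl_df.get(t, 0.0)
            for t in set(pos_df) | set(unl_df)}

def split_features(strength, threshold=0.2):
    """Partition the vocabulary into 'strong' and 'weak' features.

    The 0.2 threshold is an assumed value, not taken from the paper.
    """
    strong = {t for t, s in strength.items() if abs(s) >= threshold}
    return strong, set(strength) - strong
```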

Cited by 11 publications (2 citation statements)
References 16 publications
“…Correspondingly, this paper aims to investigate how to employ textual data features for the detection of concept drift in user opinions, which is prominent in real-world online applications and has become a major challenge to classification accuracy [19,20]. Concept drift detection approaches are typically used in conjunction with a base classifier, such as NB or LibSVM (SVM) models, to increase classification accuracy [20-23]. Stream classification models, in general, are designed to train classifiers on both historical and current instances in the stream in order to predict the label sets of incoming instances [7].…”
Section: Related Work
confidence: 99%
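As a concrete illustration of the pairing described in the quoted passage, the hedged sketch below wraps a Naive Bayes base classifier (scikit-learn's MultinomialNB) in a simple error-rate drift check: when the recent error rate rises well above the rate observed after the last (re)training, the model is reset and retrained on current data. The window size, ratio threshold, and reset policy are illustrative assumptions, not a method from the cited works [20-23].

```python
from collections import deque

import numpy as np
from sklearn.naive_bayes import MultinomialNB

class DriftAwareClassifier:
    """Base classifier plus a crude error-rate drift check (sketch only)."""

    def __init__(self, window=200, drift_ratio=2.0):
        self.base = MultinomialNB()
        self.errors = deque(maxlen=window)  # rolling record of 0/1 errors
        self.baseline = None                # error rate right after (re)training
        self.drift_ratio = drift_ratio

    def update(self, X, y, classes):
        """Predict-then-train on one mini-batch from the stream."""
        if self.baseline is not None:
            self.errors.extend((self.base.predict(X) != y).astype(float))
            if np.mean(self.errors) > self.drift_ratio * self.baseline:
                # Drift signalled: discard the stale model, restart from scratch.
                self.base = MultinomialNB()
                self.errors.clear()
                self.baseline = None
        self.base.partial_fit(X, y, classes=classes)
        if self.baseline is None:
            errs = (self.base.predict(X) != y).astype(float)
            self.baseline = max(float(np.mean(errs)), 1e-3)
```

In a stream loop, one would call `update(X_batch, y_batch, classes=np.array([0, 1]))` on each incoming mini-batch, so the model trains on historical instances until drift forces a retrain on current ones.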
“…Such problems require a special case of semi-supervised learning called "learning from positive and unlabeled data", or "PU learning" for short. PU learning requires only positive examples and normally has two steps: (1) identify reliable negative examples in the unlabeled dataset, and (2) train a classifier on the positive and reliable negative examples (Liu et al., 2003; Chen, 2009; Han et al., 2016, 2018). Various PU learning techniques have been developed, such as spy EM (S-EM) (Liu et al., 2002), positive example-based learning (PEBL) (Yu et al., 2002), one-class support vector machines (Manevitz and Yousef, 2002), Roc-SVM, weighted logistic regression (Lee and Liu, 2003), biased SVM (Liu et al., 2003), and bagging SVM (Mordelet and Vert, 2014), to name a few.…”
Section: Introduction
confidence: 99%
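To make the generic two-step scheme in the quoted passage concrete, here is a hedged Python sketch: step (1) identifies reliable negatives in the unlabeled set with the spy technique of S-EM (Liu et al., 2002), and step (2) trains an ordinary classifier on positives versus reliable negatives. The spy fraction, the 5% spy-probability quantile, and the choice of MultinomialNB and LinearSVC are illustrative assumptions, not parameters from the cited papers.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

def two_step_pu(X_pos, X_unl, spy_frac=0.15, seed=0):
    """X_pos, X_unl: dense non-negative count matrices (rows = documents)."""
    rng = np.random.default_rng(seed)

    # Step 1: plant "spies" (known positives) among the unlabeled data and fit
    # a probabilistic classifier that treats unlabeled + spies as negative.
    n_spies = max(1, int(spy_frac * X_pos.shape[0]))
    spy_idx = rng.choice(X_pos.shape[0], size=n_spies, replace=False)
    keep = np.setdiff1d(np.arange(X_pos.shape[0]), spy_idx)
    X1 = np.vstack([X_pos[keep], X_unl, X_pos[spy_idx]])
    y1 = np.r_[np.ones(len(keep)), np.zeros(X_unl.shape[0] + n_spies)]
    nb = MultinomialNB().fit(X1, y1)

    # Unlabeled documents scored below (almost) every spy are taken as
    # reliable negatives; the 5% quantile cut-off is an assumed value.
    cut = np.quantile(nb.predict_proba(X_pos[spy_idx])[:, 1], 0.05)
    reliable_neg = X_unl[nb.predict_proba(X_unl)[:, 1] < cut]

    # Step 2: train the final classifier on positives vs. reliable negatives.
    X2 = np.vstack([X_pos, reliable_neg])
    y2 = np.r_[np.ones(X_pos.shape[0]), np.zeros(reliable_neg.shape[0])]
    return LinearSVC().fit(X2, y2)
```

Any of the other listed techniques (PEBL, Roc-SVM, biased SVM, bagging SVM) could replace either step; the spy approach is shown only because the quoted passage names S-EM first.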