Discriminative learning for differing training and test distributions

Bickel, Steffen; Brückner, Michael; Scheffer, Tobias

doi:10.1145/1273496.1273507

Cited by 329 publications

(241 citation statements)

References 10 publications

Supporting

Mentioning

240

Contrasting

Unclassified


“…As another approach, one can use logistic regression for the inference of density ratios, since the ratio of two probability densities is directly connected to the posterior probability of labels in classification problems. Using the Bayes formula, the estimated posterior probability can be transformed to an estimator of density ratios (Bickel et al 2007). …”

Section: Introduction (mentioning)

confidence: 99%

Statistical analysis of kernel-based least-squares density-ratio estimation

2011


The ratio of two probability densities can be used for solving various machine learning tasks such as covariate shift adaptation (importance sampling), outlier detection (likelihood-ratio test), feature selection (mutual information), and conditional probability estimation. Several methods of directly estimating the density ratio have recently been developed, e.g., moment matching estimation, maximum-likelihood density-ratio estimation, and least-squares density-ratio fitting. In this paper, we propose a kernelized variant of the least-squares method for density-ratio estimation, which is called kernel unconstrained least-squares importance fitting (KuLSIF). We investigate its fundamental statistical properties including a non-parametric convergence rate, an analytic-form solution, and a leave-one-out cross-validation score. We further study its relation to other kernel-based density-ratio estimators. In experiments, we numerically compare various kernel-based density-ratio estimation methods, and show that KuLSIF compares favorably with other approaches.
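The citation statement above describes turning a classifier's posterior into a density-ratio estimate via the Bayes formula. A minimal sketch of that idea follows, assuming a linear logistic model fitted by plain gradient descent on synthetic one-dimensional Gaussian data; the function names `fit_logistic` and `density_ratio`, the learning rate, and the data are illustrative assumptions, not the cited authors' implementation.

```python
import numpy as np

def fit_logistic(X, y, lr=0.5, steps=2000):
    """Fit logistic regression by gradient descent (illustrative helper).

    Returns the weight vector with the bias as its last entry.
    """
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))          # P(test | x)
        w -= lr * Xb.T @ (p - y) / len(y)          # mean log-loss gradient
    return w

def density_ratio(x_train, x_test, x_query):
    """Estimate p_test(x) / p_train(x) at the query points.

    Bayes formula: p(x|test) / p(x|train)
                   = [P(test|x) / P(train|x)] * [P(train) / P(test)].
    """
    X = np.vstack([x_train, x_test])
    y = np.concatenate([np.zeros(len(x_train)), np.ones(len(x_test))])
    w = fit_logistic(X, y)
    Xq = np.hstack([x_query, np.ones((x_query.shape[0], 1))])
    posterior_odds = np.exp(Xq @ w)                # P(test|x) / P(train|x)
    prior_odds = len(x_train) / len(x_test)        # P(train) / P(test)
    return posterior_odds * prior_odds

# Synthetic data (an assumption for illustration): train ~ N(0, 1), test ~ N(1, 1).
rng = np.random.default_rng(0)
x_train = rng.normal(0.0, 1.0, size=(2000, 1))
x_test = rng.normal(1.0, 1.0, size=(2000, 1))
query = np.array([[-1.0], [0.5], [2.0]])
ratios = density_ratio(x_train, x_test, query)
print(ratios)
```

For these two Gaussians the true ratio is exp(x - 0.5), so the log-ratio is linear in x and the linear logistic model is well specified; with other distributions a richer classifier would likely be needed.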


“…These estimates are compared with our best estimate of the true values - the gold standard computed over the entire corpus - in Table 3.2. Table 3.3 shows the result of splitting the values of head:enron into three discrete ranges: [0-9], [10-19], [20-30], and the effect of two choices of α and β. Values in the center range clearly predict spam, while extreme values predict non-spam.…”

Section: An Example (mentioning)

confidence: 99%

Email Spam Filtering: A Systematic Review

Cormack

2008

FNT in Information Retrieval



Spam is information crafted to be delivered to a large number of recipients, in spite of their wishes. A spam filter is an automated tool to recognize spam so as to prevent its delivery. The purposes of spam and spam filters are diametrically opposed: spam is effective if it evades filters, while a filter is effective if it recognizes spam. The circular nature of these definitions, along with their appeal to the intent of sender and recipient, makes them difficult to formalize. A typical email user has a working definition no more formal than "I know it when I see it." Yet, current spam filters are remarkably effective, more effective than might be expected given the level of uncertainty and debate over a formal definition of spam, more effective than might be expected given the state-of-the-art information retrieval and machine learning methods for seemingly similar problems. But are they effective enough? Which are better? How might they be improved? Will their effectiveness be compromised by more cleverly crafted spam?

We survey current and proposed spam filtering techniques with particular emphasis on how well they work. Our primary focus is spam filtering in email; similarities and differences with spam filtering in other communication and storage media - such as instant messaging and the Web - are addressed peripherally. In doing so we examine the definition of spam, the user's information requirements and the role of the spam filter as one component of a large and complex information universe. Well-known methods are detailed sufficiently to make the exposition self-contained; however, the focus is on considerations unique to spam. Comparisons, wherever possible, use common evaluation measures, and control for differences in experimental setup. Such comparisons are not easy, as benchmarks, measures, and methods for evaluating spam filters are still evolving. We survey these efforts, their results and their limitations. In spite of recent advances in evaluation methodology, many uncertainties (including widely held but unsubstantiated beliefs) remain as to the effectiveness of spam filtering techniques and as to the validity of spam filter evaluation methods. We outline several uncertainties and propose experimental methods to address them.


“…The positive class expansion problem appears to have some relationship with PULearning [12,17], concept drift [9,10], and covariate shift [8,1]. But in fact it is very different from these tasks.…”

Section: Related Work (mentioning)

confidence: 99%

“…Approaches (e.g. [8,1]) addressing this problem try to correct the bias in the training instances, such that minimizing error on the training instances corresponds to minimizing error on the test instances.…”

Section: Related Work (mentioning)

confidence: 99%

A framework for modeling positive class expansion with single snapshot

Zhou

2009

Knowl Inf Syst


Abstract. In many real-world data mining tasks, the coverage of the target concept may change over time. For example, the coverage of a student's "learned knowledge" today may be different from his/her "learned knowledge" tomorrow, since the student's "learned knowledge" is expanding every day. In order to learn a model capable of making accurate predictions, the evolution of the concept must be considered, and thus a series of data sets collected at different times is needed. However, in many cases there is only a single data set instead of a series of data sets. In other words, only a single snapshot of the data along the time axis is available. In this paper, we show that for positive class expansion, i.e., when the coverage of the target concept is expanding, as illustrated in the above "learned knowledge" example, we can learn an accurate model from the single snapshot data with the help of domain knowledge given by the user. The effectiveness of the proposed framework is validated in experiments.
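The Related Work statement above, that bias-correcting approaches make minimizing error on the training instances correspond to minimizing error on the test instances, can be written out as a standard importance-weighting identity. This is a sketch under the usual covariate shift assumptions, namely that the conditional p(y | x) is shared by both distributions and that p_train(x) > 0 wherever p_test(x) > 0; the symbols ℓ (loss) and f (model) are notation introduced here for illustration.

```latex
\mathbb{E}_{(x,y)\sim p_{\text{test}}}\bigl[\ell(f(x),y)\bigr]
  = \mathbb{E}_{(x,y)\sim p_{\text{train}}}\!\left[
      \frac{p_{\text{test}}(x)}{p_{\text{train}}(x)}\,\ell(f(x),y)
    \right]
```

Weighting each training loss by the density ratio therefore gives an unbiased estimate of the expected test loss.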

