We propose a novel privacy-preserving random kernel approximation based on a data matrix $A \in \mathbb{R}^{m \times n}$ whose rows are divided into privately owned blocks. Each block of rows belongs to a different entity that is unwilling to share its rows or make them public. We wish to obtain an accurate function approximation for a given $y \in \mathbb{R}^m$ corresponding to the $m$ rows of $A$. Our approximation of $y$ is a real function on $\mathbb{R}^n$ evaluated at each row of $A$ and is based on the concept of a reduced kernel $K(A, B')$, where $B'$ is the transpose of a completely random matrix $B$. The proposed linear-programming-based approximation, which is public but does not reveal the privately held data matrix $A$, has accuracy comparable to that of an ordinary kernel approximation based on a publicly disclosed data matrix $A$.

Keywords: privacy-preserving approximation, random kernels, support vector machines, linear programming
INTRODUCTION

The problem addressed in this work is that of obtaining an approximation to a given vector $y \in \mathbb{R}^m$ of function values corresponding to the $m$ rows of a data matrix $A \in \mathbb{R}^{m \times n}$ that represents $m$ points in the $n$-dimensional real space $\mathbb{R}^n$. The matrix $A$ is partitioned into $q$ blocks of rows belonging to $q$ entities that are unwilling to share their data or make them public. The motivation for this work arises from similar problems in classification theory, where the data, corresponding to rows of a data matrix, is likewise held by various private entities and hence referred to as horizontally partitioned data. Thus, in [19, 15] privacy-preserving support vector machine (SVM) classifiers were obtained for such data, while in [20] induction tree classifiers were generated for similar problems. Other privacy-preserving classification techniques include cryptographically private SVMs [7], wavelet-based distortion [10], and rotation perturbation [3]. There is also a substantial body of research on privacy preservation in linear programming, such as [1, 12, 13]. However, there do not appear to be any privacy-preserving applications to approximation problems in the literature. This is the problem we address here, as follows.

In this work we propose an efficient privacy-preserving approximation (PPA) for horizontally partitioned data that is based on the following two ideas. First, for a given data matrix $A \in \mathbb{R}^{m \times n}$, instead of using the usual kernel function $K(A, A') : \mathbb{R}^{m \times n} \times \mathbb{R}^{n \times m} \longrightarrow \mathbb{R}^{m \times m}$ for constructing a linear or nonlinear approximation of a given $y \in \mathbb{R}^m$ corresponding to the $m$ rows of $A$, we use a random kernel [9, 8] $K(A, B') : \mathbb{R}^{m \times n} \times \mathbb{R}^{n \times m} \longrightarrow \mathbb{R}^{m \times m}$, $m < n$, where $B$ is a completely random matrix that is publicly disclosed. Such a random kernel will be shown to completely hide the data matrix $A$. Second, each entity $i \in \{1, \ldots, q\}$ makes public only the kernel function $K(A_i, B')$ of its privately held block of rows $A_i$, as sketched in the example below.
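To make these two ideas concrete, the following Python sketch is an illustration of ours, not code from the paper: the function name `random_kernel`, the Gaussian kernel form, and the width parameter `mu` are all assumptions. It shows each entity computing $K(A_i, B')$ locally against a shared public random $B$, verifies that stacking the published blocks reproduces $K(A, B')$, and, for the linear-kernel case $K(A, B') = AB'$, illustrates numerically why $m < n$ leaves $A$ hidden.

```python
import numpy as np

def random_kernel(X, Bt, mu=0.1):
    """Gaussian random kernel K(X, B')_{ij} = exp(-mu * ||X_i - B_j||^2).

    Bt is the transpose B' of the public random matrix B (shape (n, m)).
    The kernel form and mu are illustrative assumptions, not the paper's spec.
    """
    B = Bt.T
    sq_dists = (np.sum(X**2, axis=1)[:, None]
                + np.sum(B**2, axis=1)[None, :]
                - 2.0 * X @ B.T)
    return np.exp(-mu * sq_dists)

rng = np.random.default_rng(0)
m, n, q = 9, 20, 3                # m < n, as required for hiding A
A_blocks = [rng.standard_normal((m // q, n)) for _ in range(q)]  # private rows
A = np.vstack(A_blocks)           # full data matrix, never disclosed

B = rng.standard_normal((m, n))   # completely random, publicly disclosed
Bt = B.T

# Each entity i publishes only K(A_i, B'); stacking the published blocks
# yields K(A, B') because row i of the kernel depends only on row i of A.
public_blocks = [random_kernel(Ai, Bt) for Ai in A_blocks]
K_public = np.vstack(public_blocks)
assert np.allclose(K_public, random_kernel(A, Bt))

# Hiding, for the linear kernel A B': the m*m public entries cannot pin down
# the m*n private entries when m < n. Any Z whose rows lie in the null space
# of B' (dimension n - m > 0) gives (A + Z) B' = A B'.
Z = rng.standard_normal((m, n))
Z -= (Z @ Bt) @ np.linalg.pinv(Bt)        # project rows onto null space of B'
assert np.allclose((A + Z) @ Bt, A @ Bt)  # publicly indistinguishable from A
```

The final two assertions illustrate the hiding claim made above in its simplest (linear-kernel) form: since $m < n$, infinitely many matrices $A + Z$ produce the same public product $AB'$, so the disclosed kernel blocks do not determine the private rows.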