Most classic machine learning methods depend on the assumption that humans can annotate all the data available for training. However, many modern machine learning applications (including image and video classification, protein sequence classification, and speech processing) have massive amounts of unannotated or unlabeled data. As a consequence, there has been tremendous interest, both in machine learning and in its application areas, in designing algorithms that utilize the available data as efficiently as possible while minimizing the need for human intervention. An extensively used and studied technique is active learning, where the algorithm is presented with a large pool of unlabeled examples (such as all images available on the web) and can interactively ask for the labels of examples of its own choosing from the pool, with the goal of drastically reducing the labeling effort.
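To make the pool-based protocol concrete, the following is a minimal sketch of the interactive loop just described. The text does not prescribe a querying strategy or learner; the uncertainty-sampling rule and logistic-regression model below are illustrative choices only.

```python
# Minimal sketch of pool-based active learning with uncertainty sampling.
# The learner (logistic regression) and the query rule are illustrative
# assumptions, not prescribed by the text.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Unlabeled pool: stands in for, e.g., a large collection of web images.
X_pool, y_pool = make_classification(n_samples=1000, n_features=10, random_state=0)
labeled = list(rng.choice(len(X_pool), size=10, replace=False))  # small seed set

model = LogisticRegression()
for _ in range(20):  # label budget: 20 interactive queries
    model.fit(X_pool[labeled], y_pool[labeled])
    # Query the pool point the current model is least certain about,
    # i.e., whose predicted probability is closest to 1/2.
    probs = model.predict_proba(X_pool)[:, 1]
    uncertainty = np.abs(probs - 0.5)
    uncertainty[labeled] = np.inf  # never re-query an already labeled point
    query = int(np.argmin(uncertainty))
    labeled.append(query)  # an oracle (human annotator) supplies y_pool[query]
```

The point of the loop is that the algorithm, not the annotator, decides which labels are worth paying for; with a well-chosen query rule, far fewer than all 1000 labels are needed.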
Formal setup
We consider classification problems (such as classifying images by who is in them or classifying emails as spam or not), where the goal is to predict a label $y$ based on its corresponding input vector $x$. In the standard machine learning formulation, we assume that the data points $(x, y)$ are drawn from an unknown underlying distribution $D_{XY}$ over $X \times Y$; $X$ is called the feature (instance) space and $Y = \{0, 1\}$ is the label space. The goal is to output a hypothesis function $h$ of small error (or small 0/1 loss), where $\mathrm{err}(h) = \Pr_{(x,y) \sim D_{XY}}[h(x) \neq y]$. In the passive learning setting, the learning algorithm is given a set of labeled examples $(x_1, y_1), \ldots, (x_m, y_m)$ drawn i.i.d. from $D_{XY}$.
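Since $D_{XY}$ is unknown, $\mathrm{err}(h)$ is in practice estimated by the empirical 0/1 loss on a held-out i.i.d. sample. A minimal sketch, with illustrative function and variable names:

```python
import numpy as np

def empirical_error(h, X, y):
    """Empirical 0/1 loss: the fraction of held-out examples (x_i, y_i),
    assumed drawn i.i.d. from D_XY, on which the hypothesis h disagrees
    with the true label. This is an unbiased estimate of
    err(h) = Pr[h(x) != y]."""
    predictions = np.array([h(x) for x in X])
    return float(np.mean(predictions != y))

# Example: a trivial hypothesis that always predicts label 0.
X = np.array([[0.2], [0.7], [0.5]])
y = np.array([0, 1, 1])
print(empirical_error(lambda x: 0, X, y))  # two of three labels are 1 -> 0.667
```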