2005
DOI: 10.1007/11503415_9
Generalization Error Bounds Using Unlabeled Data

Abstract: We present two new methods for obtaining generalization error bounds in a semi-supervised setting. Both methods are based on approximating the disagreement probability of pairs of classifiers using unlabeled data. The first method works in the realizable case. It suggests how the ERM principle can be refined using unlabeled data and has provable optimality guarantees when the number of unlabeled examples is large. Furthermore, the technique extends easily to cover active learning. A downside is that …
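Both methods revolve around the same distribution-dependent quantity: the probability that two classifiers disagree on a fresh example, which can be estimated from unlabeled data alone. Below is a minimal sketch of that estimation step; the function names and the Hoeffding-style deviation radius are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def disagreement_estimate(f, g, unlabeled_X):
    """Empirical probability that classifiers f and g predict different labels
    on a batch of unlabeled points; no labels are needed for this estimate."""
    return float(np.mean(f(unlabeled_X) != g(unlabeled_X)))

def disagreement_deviation(m, delta):
    """Hoeffding-style radius: with probability at least 1 - delta, the true
    disagreement probability lies within this distance of the estimate
    computed from m unlabeled examples."""
    return float(np.sqrt(np.log(2.0 / delta) / (2.0 * m)))
```

Here f and g are any callables mapping a batch of inputs to predicted labels, for example the predict methods of two trained scikit-learn classifiers.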

Cited by 34 publications (33 citation statements)
References 14 publications
“…For instance, in the PAC model, it is perfectly natural (and common) to talk about the problem of learning a concept class such as DNF formulas [Linial et al 1989; Verbeurgt 1990] or an intersection of halfspaces [Baum 1990; Blum and Kannan 1997; Vempala 1997; Klivans et al 2002] over the uniform distribution; but clearly in this case unlabeled data is useless - you can just generate it yourself. For learning over an unknown distribution, unlabeled data can help somewhat in the standard models (e.g., by allowing one to use distribution-specific algorithms and sample-complexity bounds [Benedek and Itai 1991; Kaariainen 2005]), but this does not seem to capture the power of unlabeled data in practical semi-supervised learning methods.…”
Section: · mentioning
confidence: 99%
“…It is well known that when learning under an unknown distribution, unlabeled data might help somewhat even in the standard discriminative models by allowing one to use both distribution-specific algorithms [Benedek and Itai 1991], [Kaariainen 2005], [Sokolovska et al 2008] and/or tighter data-dependent sample-complexity bounds [Bartlett and Mendelson 2002; Koltchinskii 2001]. However, in all these methods one chooses a class of functions or a prior over functions before performing the inference.…”
Section: Relationship To Other Ways Of Using Unlabeled Data For Learning · mentioning
confidence: 99%
“…In a different framework, that of Valiant's PAC learning, there are concentration statements about the risks in the presence of unlabeled examples (Balcan and Blum 2005; Kääriäinen 2005), though in these results, the unlabeled points are used in a very different way than in our work. Specifically, in the work of Balcan and Blum (2005), the authors introduce the notion of incompatibility E_{x∼D}[1 − χ(h, x)] between a function h and the input distribution D. The unlabeled examples are used to estimate the distribution-dependent quantity E_{x∼D}[1 − χ(h, x)].…”
Section: Related Work · mentioning
confidence: 82%
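The incompatibility E_{x∼D}[1 − χ(h, x)] quoted above depends only on the input distribution D, which is why unlabeled examples suffice to estimate it. A minimal sketch, under the assumption that χ(h, x) returns a compatibility score in [0, 1]; the names are illustrative, not Balcan and Blum's code.

```python
import numpy as np

def incompatibility_estimate(chi, h, unlabeled_X):
    """Empirical estimate of E_{x~D}[1 - chi(h, x)] over a sample of unlabeled
    points, assuming chi(h, x) returns a compatibility score in [0, 1]."""
    return float(np.mean([1.0 - chi(h, x) for x in unlabeled_X]))
```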
“…In the work of Kääriäinen (2005), the author obtains a generalization bound by approximating the disagreement probability of pairs of classifiers using unlabeled data. Again, here the unlabeled data is used to estimate a distribution dependent quantity, namely, the true disagreement probability between consistent models.…”
Section: Theorem 8 (Theorem 1 Of Balcan and Blum 2005) If We See M Un… · mentioning
confidence: 99%
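To illustrate how such disagreement estimates yield a bound in the realizable case described above: if the target concept is among the classifiers consistent with the labeled data (the version space), then the true error of any consistent f is at most its largest disagreement with the other consistent classifiers, so choosing the f that minimizes that maximum also minimizes the resulting bound. The finite version space and the helper below are simplifying assumptions for illustration, not the paper's exact construction.

```python
import numpy as np

def select_by_disagreement(version_space, unlabeled_X):
    """Among classifiers consistent with the labeled data, return the one whose
    worst-case empirical disagreement with the others is smallest, together
    with that value; in the realizable case this value upper-bounds the true
    error of the selected classifier up to the estimation error of the
    empirical disagreements."""
    preds = [np.asarray(f(unlabeled_X)) for f in version_space]
    worst = [
        max((float(np.mean(pi != pj)) for j, pj in enumerate(preds) if j != i),
            default=0.0)
        for i, pi in enumerate(preds)
    ]
    best = int(np.argmin(worst))
    return version_space[best], worst[best]
```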
“…They derive upper and lower bounds on the number of required labels based on ε-covers and ε-packings. Later in 2005, Kääriäinen [13] developed a semi-supervised learning strategy, which can save up to one half of the required labels. These results don't make use of extra assumptions that relate the target concept to the data distribution.…”
Section: Related Work · mentioning
confidence: 99%