2005
DOI: 10.1007/11503415_9
Generalization Error Bounds Using Unlabeled Data

Abstract: We present two new methods for obtaining generalization error bounds in a semi-supervised setting. Both methods are based on approximating the disagreement probability of pairs of classifiers using unlabeled data. The first method works in the realizable case. It suggests how the ERM principle can be refined using unlabeled data and has provable optimality guarantees when the number of unlabeled examples is large. Furthermore, the technique extends easily to cover active learning. A downside is that …
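Both methods revolve around the same distribution-dependent quantity: the probability that two classifiers disagree on a fresh example, which can be estimated from unlabeled data alone. Below is a minimal sketch of that estimation step; the function names and the Hoeffding-style deviation radius are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def disagreement_estimate(f, g, unlabeled_X):
    """Empirical probability that classifiers f and g predict different labels
    on a batch of unlabeled points; no labels are needed for this estimate."""
    return float(np.mean(f(unlabeled_X) != g(unlabeled_X)))

def disagreement_deviation(m, delta):
    """Hoeffding-style radius: with probability at least 1 - delta, the true
    disagreement probability lies within this distance of the estimate
    computed from m unlabeled examples."""
    return float(np.sqrt(np.log(2.0 / delta) / (2.0 * m)))
```

Here f and g are any callables mapping a batch of inputs to predicted labels, for example the predict methods of two trained scikit-learn classifiers.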

Cited by 34 publications (33 citation statements)
References 14 publications
“…For instance, in the PAC model, it is perfectly natural (and common) to talk about the problem of learning a concept class such as DNF formulas [Linial et al 1989; Verbeurgt 1990] or an intersection of halfspaces [Baum 1990; Blum and Kannan 1997; Vempala 1997; Klivans et al 2002] over the uniform distribution; but clearly in this case unlabeled data is useless - you can just generate it yourself. For learning over an unknown distribution, unlabeled data can help somewhat in the standard models (e.g., by allowing one to use distribution-specific algorithms and sample-complexity bounds [Benedek and Itai 1991; Kaariainen 2005]), but this does not seem to capture the power of unlabeled data in practical semi-supervised learning methods.…”
Section: · mentioning
confidence: 99%
“…It is well known that when learning under an unknown distribution, unlabeled data might help somewhat even in the standard discriminative models by allowing one to use both distribution-specific algorithms [Benedek and Itai 1991], [Kaariainen 2005], [Sokolovska et al 2008] and/or tighter data-dependent sample-complexity bounds [Bartlett and Mendelson 2002; Koltchinskii 2001]. However, in all these methods one chooses a class of functions or a prior over functions before performing the inference.…”
Section: Relationship To Other Ways Of Using Unlabeled Data For Learning · mentioning
confidence: 99%
“…In a different framework, that of Valiant's PAC learning, there are concentration statements about the risks in the presence of unlabeled examples (Balcan and Blum 2005; Kääriäinen 2005), though in these results, the unlabeled points are used in a very different way than in our work. Specifically, in the work of Balcan and Blum (2005), the authors introduce the notion of incompatibility E_{x∼D}[1 − χ(h, x)] between a function h and the input distribution D. The unlabeled examples are used to estimate the distribution-dependent quantity E_{x∼D}[1 − χ(h, x)].…”
Section: Related Work · mentioning
confidence: 82%
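The incompatibility E_{x∼D}[1 − χ(h, x)] quoted above depends only on the input distribution D, which is why unlabeled examples suffice to estimate it. A minimal sketch, under the assumption that χ(h, x) returns a compatibility score in [0, 1]; the names are illustrative, not Balcan and Blum's code.

```python
import numpy as np

def incompatibility_estimate(chi, h, unlabeled_X):
    """Empirical estimate of E_{x~D}[1 - chi(h, x)] over a sample of unlabeled
    points, assuming chi(h, x) returns a compatibility score in [0, 1]."""
    return float(np.mean([1.0 - chi(h, x) for x in unlabeled_X]))
```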
“…In the work of Kääriäinen (2005), the author obtains a generalization bound by approximating the disagreement probability of pairs of classifiers using unlabeled data. Again, here the unlabeled data is used to estimate a distribution dependent quantity, namely, the true disagreement probability between consistent models.…”
Section: Theorem 8 (Theorem 1 Of Balcan and Blum 2005) If We See M Un… · mentioning
confidence: 99%
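To illustrate how such disagreement estimates yield a bound in the realizable case described above: if the target concept is among the classifiers consistent with the labeled data (the version space), then the true error of any consistent f is at most its largest disagreement with the other consistent classifiers, so choosing the f that minimizes that maximum also minimizes the resulting bound. The finite version space and the helper below are simplifying assumptions for illustration, not the paper's exact construction.

```python
import numpy as np

def select_by_disagreement(version_space, unlabeled_X):
    """Among classifiers consistent with the labeled data, return the one whose
    worst-case empirical disagreement with the others is smallest, together
    with that value; in the realizable case this value upper-bounds the true
    error of the selected classifier up to the estimation error of the
    empirical disagreements."""
    preds = [np.asarray(f(unlabeled_X)) for f in version_space]
    worst = [
        max((float(np.mean(pi != pj)) for j, pj in enumerate(preds) if j != i),
            default=0.0)
        for i, pi in enumerate(preds)
    ]
    best = int(np.argmin(worst))
    return version_space[best], worst[best]
```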
“…They derive upper and lower bounds on the number of required labels based on ε-covers and ε-packings. Later in 2005, Kääriäinen [13] developed a semi-supervised learning strategy, which can save up to one half of the required labels. These results don't make use of extra assumptions that relate the target concept to the data distribution.…”
Section: Related Work · mentioning
confidence: 99%