Proceedings of the Eighth Annual Conference on Computational Learning Theory (COLT '95), 1995
DOI: 10.1145/225298.225348
Learning from a mixture of labeled and unlabeled examples with parametric side information

Cited by 65 publications (48 citation statements)
References 9 publications
“…As mentioned in Section 1, a typical assumption in a generative setting is that D is a mixture with the probability density function p(x|θ) = p_0 · p_0(x|θ_0) + p_1 · p_1(x|θ_1) (see for instance [Ratsaby and Venkatesh 1995; Castelli and Cover 1995]). In other words, the labeled examples are generated according to the following mechanism: a label y ∈ {0, 1} is drawn according to the distribution of classes {p_0, p_1} and then a corresponding random feature vector is drawn according to the class-conditional density p_y.…”
Section: Connections To Generative Models (mentioning)
confidence: 99%
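The generative mechanism described in this excerpt is straightforward to simulate. Below is a minimal sketch, not taken from the cited paper, assuming the unit-covariance Gaussian class-conditionals of Ratsaby and Venkatesh's setting; the function name, priors, and mean vectors are hypothetical.

```python
import numpy as np

def sample_mixture(n, priors, means, seed=None):
    """Draw n labeled examples from a two-class Gaussian mixture.

    A label y is drawn from the class priors {p_0, p_1}, and the feature
    vector is then drawn from the class-conditional density N(theta_y, I),
    matching the generative mechanism described in the excerpt.
    """
    rng = np.random.default_rng(seed)
    labels = rng.choice(len(priors), size=n, p=priors)
    features = np.stack([rng.normal(means[y], 1.0) for y in labels])
    return features, labels

# Hypothetical parameters: equal class priors and two means in R^2.
X, y = sample_mixture(
    500,
    priors=[0.5, 0.5],
    means=[np.array([-1.0, 0.0]), np.array([1.0, 0.0])],
)
```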
“…Essentially once the decision border is estimated, a small labeled sample suffices to learn (with high confidence and small error) the appropriate class labels associated with the two disjoint regions generated by the estimate of the Bayes decision border. To see how we can incorporate this setting in our model, consider for illustration the setting in [Ratsaby and Venkatesh 1995]; there they assume that p_0 = p_1, and that the class conditional densities are d-dimensional Gaussians with unit covariance and unknown mean vectors θ_i ∈ R^d. The algorithm used is the following: the unknown parameter vector θ = (θ_0, θ_1) is estimated from unlabeled data using a maximum likelihood estimate; this determines a hypothesis which is a linear separator that passes through the point (θ̂_0 + θ̂_1)/2 and is orthogonal to the vector θ̂_1 − θ̂_0; finally each of the two decision regions separated by the hyperplane is labeled according to the majority of the labeled examples in the region.…”
Section: Connections To Generative Models (mentioning)
confidence: 99%
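The two-stage procedure in this excerpt (maximum-likelihood estimation of the means from unlabeled data, a separating hyperplane through the midpoint of the estimated means, and majority labeling of the two half-spaces) can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' code: scikit-learn's EM-based GaussianMixture stands in for the maximum-likelihood step, and the function name and interface are hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_midpoint_separator(X_unlabeled, X_labeled, y_labeled):
    """Illustrative reconstruction of the two-stage procedure above.

    1. Estimate the two Gaussian means from unlabeled data by maximum
       likelihood (EM via GaussianMixture stands in for the MLE step).
    2. Take the hyperplane through (theta_0 + theta_1)/2, orthogonal to
       theta_1 - theta_0, as the hypothesis.
    3. Orient the two half-space labels by a majority vote of the
       labeled examples.
    """
    gm = GaussianMixture(n_components=2, covariance_type="spherical")
    gm.fit(X_unlabeled)
    theta0, theta1 = gm.means_
    w = theta1 - theta0               # normal vector of the separator
    b = -w @ (theta0 + theta1) / 2.0  # hyperplane passes through the midpoint
    side = (X_labeled @ w + b > 0).astype(int)
    # Majority label among labeled points on the positive side fixes the orientation.
    if np.any(side == 1):
        pos_label = int(round(float(np.mean(y_labeled[side == 1]))))
    else:
        pos_label = 1
    def predict(X):
        s = (X @ w + b > 0).astype(int)
        return np.where(s == 1, pos_label, 1 - pos_label)
    return predict
```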
“…For instance, we could search for pages pointed to with links having the phrase "my advisor" and use them as "probably positive" examples to further train a learning algorithm based on the words on the text page, and vice-versa. We call this type of bootstrapping co-training, and it has a close connection to bootstrapping from incomplete data in the Expectation-Maximization setting (see, for instance, [7, 15]). The question this raises is: is there any reason to believe co-training will help?…”
Section: Introduction (mentioning)
confidence: 99%
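The bootstrapping idea sketched in this excerpt, where each of two feature "views" labels its most confident unlabeled examples for the other, might look roughly like the loop below. This is a generic co-training sketch under assumed inputs (two view matrices X1 and X2, a label vector with -1 marking unlabeled examples), not the procedure from the cited paper; the classifier choice and the per-round quota are arbitrary.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def co_train(X1, X2, y, rounds=10, per_round=5):
    """Generic co-training loop over two feature views X1 and X2.

    y holds 0/1 labels for the small labeled seed and -1 for unlabeled
    examples. Each round, each view's classifier assigns labels to the
    unlabeled examples it is most confident about ("probably positive"
    or "probably negative"), and both classifiers are retrained on the
    enlarged labeled pool.
    """
    y = y.copy()
    for _ in range(rounds):
        labeled = y != -1
        clf1 = GaussianNB().fit(X1[labeled], y[labeled])
        clf2 = GaussianNB().fit(X2[labeled], y[labeled])
        for clf, X in ((clf1, X1), (clf2, X2)):
            unlabeled_idx = np.flatnonzero(y == -1)
            if unlabeled_idx.size == 0:
                return y
            proba = clf.predict_proba(X[unlabeled_idx])
            most_confident = np.argsort(proba.max(axis=1))[-per_round:]
            picked = unlabeled_idx[most_confident]
            y[picked] = clf.classes_[proba[most_confident].argmax(axis=1)]
    return y
```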
“…A substantial body of literature has investigated unlabeled data in the context of supervised learning, although not in the same way we have considered in this paper. Most work in this area adopts the perspective of parametric probability modeling and uses unlabeled data as part of a maximum likelihood (EM) or discriminative training procedure (Miller & Uyar, 1997; Castelli & Cover, 1996; Ratsaby & Venkatesh, 1995; Gutfinger & Sklansky, 1991; O'Neill, 1978). Another common idea is to supply artificial labels to unlabeled examples and use this data directly in a supervised training procedure (Blum & Mitchell, 1998; Towell, 1996).…”
Section: Results (mentioning)
confidence: 99%
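The "artificial labels" idea mentioned at the end of this excerpt (train on the labeled set, label the unlabeled examples the model is confident about, and retrain on the union) can be illustrated with a short self-training sketch. The classifier, confidence threshold, and function name here are assumptions for illustration, not taken from any of the cited works.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_labeled, y_labeled, X_unlabeled, threshold=0.95):
    """Assign artificial labels to confidently classified unlabeled
    examples, then retrain an ordinary supervised classifier on the
    augmented training set."""
    base = LogisticRegression().fit(X_labeled, y_labeled)
    proba = base.predict_proba(X_unlabeled)
    confident = proba.max(axis=1) >= threshold
    X_aug = np.vstack([X_labeled, X_unlabeled[confident]])
    y_aug = np.concatenate(
        [y_labeled, base.classes_[proba[confident].argmax(axis=1)]]
    )
    return LogisticRegression().fit(X_aug, y_aug)
```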