A discriminative model for semi-supervised learning

Balcan, Maria-Florina; Blum, Avrim

doi:10.1145/1706591.1706599

Cited by 100 publications

(88 citation statements)

References 63 publications

“…In contrast, unlabeled data are usually plentiful and inexpensive to acquire in large quantities. A key discovery has been that, under certain well‐specified assumptions, semi‐supervised models can use the potentially inexpensive unlabeled data to greatly improve classifier performance compared with supervised models alone (Balcan & Blum, ).…”

Section: Psychological and Machine Learning Models Of Categorization (mentioning)

confidence: 99%

Human Semi‐Supervised Learning

Gibson

Rogers

Zhu

2013

Topics in Cognitive Science

Most empirical work in human categorization has studied learning in either fully supervised or fully unsupervised scenarios. Most real-world learning scenarios, however, are semi-supervised: Learners receive a great deal of unlabeled information from the world, coupled with occasional experiences in which items are directly labeled by a knowledgeable source. A large body of work in machine learning has investigated how learning can exploit both labeled and unlabeled data provided to a learner. Using equivalences between models found in human categorization and machine learning research, we explain how these semi-supervised techniques can be applied to human learning. A series of experiments is described which shows that semi-supervised learning models prove useful for explaining human behavior when exposed to both labeled and unlabeled data. We then discuss some machine learning models that do not have familiar human categorization counterparts. Finally, we discuss some challenges yet to be addressed in the use of semi-supervised models for modeling human categorization.

Keywords: Category learning; Semi-supervised learning; Machine learning

Cognitive psychology has long had an interest in understanding human categorization: how we come to conceive of objects in the world as belonging to different categories, and how we use categories to draw inferences about the unobserved properties of objects. Toward this end, one of the most commonly used experimental paradigms has been supervised category learning: On each trial, the participant views a stimulus and must guess to which of a small number of categories it belongs. Feedback is provided that indicates either whether the guess was correct or what the correct answer was; the learning is supervised in this sense. The experimenter then measures how rapidly the participant learns to generate correct inferences about category membership, and how the acquired knowledge generalizes to novel stimuli.

Correspondence should be sent to Bryan R. Gibson,

“…Experiments show that in most cases our method outperforms similar methods. Balcan and Blum [27] present a general analysis of semi-supervised learning with discriminative classifiers (that do not try to model the distribution of the data). They point out that an assumption is required on the relation between the distribution of the data and of the classes.…”

Section: Conclusion and Discussion (mentioning)

confidence: 99%

Disagreement-Based Co-training

Tanha

van Someren

2011

2011 IEEE 23rd International Conference on Tools With Artificial Intelligence

Disagreement-based co-training. Tanha, J.; van Someren, M.W.; Afsarmanesh, H.

Abstract: Recently, semi-supervised learning algorithms such as co-training are used in many domains. In co-training, two classifiers based on different subsets of the features or on different learning algorithms are trained in parallel, and unlabeled data that are classified differently by the classifiers, but for which one classifier has large confidence, are labeled and used as training data for the other. In this paper, a new form of co-training, called Ensemble-Co-Training, is proposed that uses an ensemble of different learning algorithms. Based on a theorem by Angluin and Laird that relates noise in the data to the error of hypotheses learned from these data, we propose a criterion for finding a subset of high-confidence predictions and error rate for a classifier in each iteration of the training process. Experiments show that the new method in almost all domains gives better results than the state-of-the-art methods.

“…Algorithms such as manifold, entropy or co-regularization [6,13,18] follow this idea. Our formalization of this idea is inspired by Balcan and Blum [3] and allows for a similar sample complexity analysis.…”

Section: Introduction (mentioning)

confidence: 99%

“…Section 5 reviews the work from Balcan and Blum [3] and generalizes a sample complexity bound from their paper. We then show how this bound can be used to derive sample complexity bounds for the proposed framework, and thus in particular for MR.…”

Section: Introduction (mentioning)

confidence: 99%

A Distribution Dependent and Independent Complexity Analysis of Manifold Regularization

Mey

Viering

Loog

2020

Lecture Notes in Computer Science

Manifold regularization is a commonly used technique in semi-supervised learning. It guides the learning process by enforcing that the classification rule we find is smooth with respect to the data manifold. In this paper we present sample and Rademacher complexity bounds for this method. We first derive distribution-independent sample complexity bounds by analyzing the general framework of adding a data-dependent regularization term to a supervised learning process. We conclude that for these types of methods one can expect that the sample complexity improves at most by a constant, which depends on the hypothesis class. We then derive Rademacher complexity bounds, which allow for a distribution-dependent complexity analysis. We illustrate how our bounds can be used for choosing an appropriate manifold regularization parameter. With our proposed procedure there is no need to use an additional labeled validation set.
