We consider the problem of using a large unlabeled sample to boost performance of a learning algorithm when only a small set of labeled examples is available. In particular, we consider a setting in which the description of each example can be partitioned into two distinct views, motivated by the task of learning to classify web pages. For example, the description of a web page can be partitioned into the words occurring on that page, and the words occurring in hyperlinks that point to that page. We assume that either view of the example would be sufficient for learning if we had enough labeled data, but our goal is to use both views together to allow inexpensive unlabeled data to augment a much smaller set of labeled examples. Specifically, the presence of two distinct views of each example suggests strategies in which two learning algorithms are trained separately on each view, and then each algorithm's predictions on new unlabeled examples are used to enlarge the training set of the other. Our goal in this paper is to provide a PAC-style analysis for this setting, and, more broadly, a PAC-style framework for the general problem of learning from both labeled and unlabeled data. We also provide empirical results on real web-page data indicating that this use of unlabeled examples can lead to significant improvement of hypotheses in practice. As part of our analysis, we provide new results on learning with lopsided misclassification noise, which we believe may be of independent interest.
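A minimal sketch of this co-training loop, assuming word-count feature matrices for the two views and scikit-learn Naive Bayes as the base learners; the function name `cotrain`, the confidence heuristic, and the parameters `n_rounds` and `k` are illustrative choices, not the paper's exact procedure (which grows a shared labeled pool by adding fixed numbers of confidently predicted positive and negative examples per round):

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def cotrain(X1_l, X2_l, y_l, X1_u, X2_u, n_rounds=10, k=2):
    """Co-training sketch: each view's classifier labels its most
    confident unlabeled examples, which enlarge the shared training set."""
    for _ in range(n_rounds):
        h1 = MultinomialNB().fit(X1_l, y_l)   # view 1: words on the page
        h2 = MultinomialNB().fit(X2_l, y_l)   # view 2: words in inbound links
        if len(X1_u) == 0:
            break
        chosen = {}                           # unlabeled index -> assigned label
        for h, X_u in ((h1, X1_u), (h2, X2_u)):
            conf = h.predict_proba(X_u).max(axis=1)
            for i in np.argsort(conf)[-k:]:   # this view's k most confident picks
                chosen[int(i)] = h.predict(X_u[i:i + 1])[0]
        idx = np.array(sorted(chosen))
        # move the newly self-labeled examples from the unlabeled pool
        X1_l = np.vstack([X1_l, X1_u[idx]])
        X2_l = np.vstack([X2_l, X2_u[idx]])
        y_l = np.concatenate([y_l, [chosen[i] for i in idx]])
        keep = np.setdiff1d(np.arange(len(X1_u)), idx)
        X1_u, X2_u = X1_u[keep], X2_u[keep]
    return h1, h2
```

Using Naive Bayes here mirrors the web-page experiments' bag-of-words setting; any pair of classifiers exposing calibrated confidences would serve the same role.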
In this survey, we review work in machine learning on methods for handling data sets containing large amounts of irrelevant information. We focus on two key issues: the problem of selecting relevant features, and the problem of selecting relevant examples. We describe the advances that have been made on these topics in both empirical and theoretical work in machine learning, and we present a general framework that we use to compare different methods. We close with some challenges for future work in this area.
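As one concrete instance of the feature-selection problem discussed above, here is a minimal sketch of a "filter" method that scores each feature independently of the learner; the scorer (scikit-learn's mutual-information estimator), the function name, and the top-k rule are illustrative choices, not the survey's framework:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def select_relevant_features(X, y, k=10):
    """Keep the k features with the highest estimated mutual
    information with the class label; return the reduced data
    and the indices of the retained features."""
    scores = mutual_info_classif(X, y)
    keep = np.sort(np.argsort(scores)[-k:])   # indices of the k best features
    return X[:, keep], keep
```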
We demonstrate that, ignoring computational constraints, it is possible to release privacy-preserving databases that are useful for all queries over a discretized domain from any given concept class with polynomial VC-dimension. We show a new lower bound for releasing databases that are useful for halfspace queries over a continuous domain. Despite this, we give a privacy-preserving polynomial time algorithm that releases information useful for all halfspace queries, for a slightly relaxed definition of usefulness. Inspired by learning theory, we introduce a new notion of data privacy, which we call distributional privacy, and show that it is strictly stronger than the prevailing privacy notion, differential privacy.
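To make the privacy notions concrete, here is a minimal sketch of the standard Laplace mechanism for epsilon-differential privacy applied to a single counting query; this is textbook background rather than the paper's release algorithm, and the names are illustrative:

```python
import numpy as np

def private_count(db, predicate, epsilon):
    """Epsilon-differentially private count of records satisfying `predicate`.

    A counting query has sensitivity 1 (adding or removing one record
    changes the answer by at most 1), so Laplace noise with scale
    1/epsilon suffices for epsilon-differential privacy."""
    true_count = sum(1 for row in db if predicate(row))
    return true_count + np.random.laplace(scale=1.0 / epsilon)
```

The paper's contribution goes beyond answering one query at a time: it releases information useful for an entire concept class at once, and its distributional-privacy notion is shown to be strictly stronger than the differential-privacy guarantee sketched here.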