Jessa Bekker scite author profile

Learning from positive and unlabeled data: a survey

Learning from positive and unlabeled data or PU learning is the setting where a learner only has access to positive examples and unlabeled data. The assumption is that the unlabeled data can contain both positive and negative examples. This setting has attracted increasing interest within the machine learning literature as this type of data naturally arises in applications such as medical diagnosis and knowledge base completion. This article provides a survey of the current state of the art in PU learning. It proposes seven key research questions that commonly arise in this field and provides a broad overview of how the field has tried to address them.

Beyond the Selected Completely at Random Assumption for Learning from Positive and Unlabeled Data

Bekker

Robberechts

Davis

2020

Most positive and unlabeled data is subject to selection biases. The labeled examples can, for example, be selected from the positive set because they are easier to obtain or more obviously positive. This paper investigates how learning can be enabled in this setting. We propose and theoretically analyze an empirical-risk-based method for incorporating the labeling mechanism. Additionally, we investigate under which assumptions learning is possible when the labeling mechanism is not fully understood and propose a practical method to enable this. Our empirical analysis supports the theoretical results and shows that taking into account the possibility of a selection bias, even when the labeling mechanism is unknown, improves the trained classifiers.
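
One common way to incorporate a labeling mechanism into an empirical risk is propensity weighting. The sketch below illustrates that idea under stated assumptions and is not the authors' implementation: the function name, the choice of log loss, and the `propensities` argument (an assumed, known probability that each positive example gets labeled) are hypothetical. A labeled example counts as a positive with weight 1/e and as a negative with weight 1 - 1/e, while an unlabeled example counts as a negative.

```python
import math


def propensity_weighted_log_loss(probs, labeled, propensities):
    """Average propensity-weighted log loss over a PU data set.

    probs        -- predicted P(y=1 | x) for each example, strictly in (0, 1)
    labeled      -- 1 if the example is labeled positive, 0 if unlabeled
    propensities -- assumed P(labeled | y=1, x) in (0, 1]; only the values
                    of labeled examples are used
    """
    total = 0.0
    for p, s, e in zip(probs, labeled, propensities):
        loss_as_positive = -math.log(p)
        loss_as_negative = -math.log(1.0 - p)
        if s == 1:
            weight = 1.0 / e
            # The labeled example also stands in for the positives like it
            # that stayed unlabeled, hence the negative correction term.
            total += weight * loss_as_positive + (1.0 - weight) * loss_as_negative
        else:
            total += loss_as_negative
    return total / len(probs)


# With every propensity equal to 1, this reduces to ordinary log loss
# with unlabeled examples treated as negatives.
example_loss = propensity_weighted_log_loss([0.9, 0.2], [1, 0], [1.0, 1.0])
```

When the propensities are constant across examples, the weighting corresponds to the selected-completely-at-random case; letting them vary per example is what models a selection bias.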

Estimating the Class Prior in Positive and Unlabeled Data Through Decision Tree Induction

Bekker

Davis

2018

AAAI

For tasks such as medical diagnosis and knowledge base completion, a classifier may only have access to positive and unlabeled examples, where the unlabeled data consists of both positive and negative examples. One way that enables learning from this type of data is knowing the true class prior. In this paper, we propose a simple yet effective method for estimating the class prior, by estimating the probability that a positive example is selected to be labeled. Our key insight is that subdomains of the data give a lower bound on this probability. This lower bound gets closer to the real probability as the ratio of labeled examples increases. Finding such subsets can naturally be done via top-down decision tree induction. Experiments show that our method makes estimates which are equivalently accurate as those of the state of the art methods, and is an order of magnitude faster.
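
The lower-bound idea can be sketched as follows, assuming positives are labeled completely at random with probability c. Within any subset, the labeled fraction equals c times the fraction of positives in that subset, so it can never exceed c; the largest labeled fraction over a collection of subsets is therefore a lower bound on c, and the class prior follows as the overall labeled fraction divided by c. This is a simplified illustration, not the paper's algorithm: the function names are hypothetical, the subsets are supplied by hand rather than found through decision tree induction, and no correction for small subsets is applied.

```python
def label_frequency_lower_bound(labeled, subsets):
    """Largest labeled fraction over the given subsets.

    labeled -- list of 0/1 flags (1 = labeled positive, 0 = unlabeled)
    subsets -- list of index lists, each describing one subset of the data
    """
    best = 0.0
    for indices in subsets:
        if not indices:
            continue  # an empty subset carries no information
        fraction = sum(labeled[i] for i in indices) / len(indices)
        best = max(best, fraction)
    return best


def class_prior(labeled, label_frequency):
    """Class prior implied by a label frequency: P(labeled) / c."""
    return (sum(labeled) / len(labeled)) / label_frequency


# Toy data: 3 of 10 examples are labeled, all inside the first subset.
labeled = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
subsets = [[0, 1, 2, 3], [4, 5, 6, 7, 8, 9]]
c_hat = label_frequency_lower_bound(labeled, subsets)  # 3/4 from the first subset
alpha_hat = class_prior(labeled, c_hat)                # (3/10) / (3/4)
```

Because `c_hat` is a lower bound on c, the resulting `alpha_hat` is an upper bound on the class prior. Small subsets give noisy fractions, so a practical estimator needs some guard against them.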

Beyond the Selected Completely At Random Assumption for Learning from Positive and Unlabeled Data

Bekker

Robberechts

Davis

2018

Preprint

Positive and Unlabeled Relational Classification Through Label Frequency Estimation

Bekker

Davis

2018

Many applications, such as knowledge base completion and patient data, only have access to positive examples but lack negative examples which are required by standard ILP techniques and suffer under the closed-world assumption. The corresponding propositional problem is known as Positive and Unlabeled (PU) learning. In this field, it is known that using the label frequency (the fraction of true positive examples that are labeled) makes learning easier. This notion has not been explored yet in the relational domain. The goal of this work is twofold: 1) to explore if using the label frequency would also be useful when working with relational data and 2) to propose a method for estimating the label frequency from relational PU data. Our experiments confirm the usefulness of knowing the label frequency and of our estimate.
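
To see why the label frequency helps in the propositional case: under the selected-completely-at-random assumption, the probability that an example is labeled equals the label frequency times the probability that it is positive. A classifier trained to separate labeled from unlabeled examples can therefore be rescaled into a positive-versus-negative classifier. The sketch below shows only that rescaling; the function name is hypothetical, and it is not the relational method proposed in the paper.

```python
def positive_probability(p_labeled, label_frequency):
    """Convert P(labeled | x) into P(positive | x) by dividing by c.

    p_labeled       -- output of a labeled-vs-unlabeled classifier, in [0, 1]
    label_frequency -- assumed fraction of positives that are labeled, in (0, 1]
    """
    if not 0.0 < label_frequency <= 1.0:
        raise ValueError("label_frequency must lie in (0, 1]")
    # Clip at 1 because an imperfect classifier or an underestimated
    # label frequency can push the ratio above a valid probability.
    return min(1.0, p_labeled / label_frequency)


# If half of the positives are labeled, a 20% chance of being labeled
# corresponds to a 40% chance of being positive.
p_pos = positive_probability(0.2, 0.5)
```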
