2022
DOI: 10.1609/aaai.v36i6.20624

Recovering the Propensity Score from Biased Positive Unlabeled Data

Abstract: Positive-Unlabeled (PU) learning methods train a classifier to distinguish between the positive and negative classes given only positive and unlabeled data. While traditional PU methods require the labeled positive samples to be an unbiased sample of the positive distribution, in practice the labeled sample is often a biased draw from the true distribution. Prior work shows that if we know the likelihood that each positive instance will be selected for labeling, referred to as the propensity score, then the bi…
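Concretely, the correction that a known propensity score enables can be written as a reweighted risk. The sketch below is illustrative: it shows the standard propensity-weighted estimator from the SAR-PU literature in the oracle setting where e is given, not this paper's method for recovering it, and the logistic loss and function names are my assumptions.

import numpy as np

def logistic_loss(scores, y):                   # y is +1 or -1
    return np.log1p(np.exp(-y * scores))

def propensity_weighted_risk(scores, s, e):
    """scores: real-valued classifier outputs; s: 1 if labeled, else 0;
    e: propensity scores e_i = P(s=1 | y=1, x_i), used only where s == 1."""
    pos = logistic_loss(scores, +1.0)
    neg = logistic_loss(scores, -1.0)
    labeled = s == 1
    e_safe = np.where(labeled, e, 1.0)          # unused entries made safe
    per_point = np.where(labeled,
                         pos / e_safe + (1.0 - 1.0 / e_safe) * neg,
                         neg)                   # unlabeled enter as negatives
    return per_point.mean()

In expectation over the labeling process, a positive point contributes exactly its positive loss and a negative point its negative loss, so the estimator is unbiased for the fully supervised risk.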

Cited by 8 publications (6 citation statements)
References 19 publications

“…We also note that our proposed method and the SAR-EM method are similar in that both are Empirical Risk Minimization methods, so it is especially worthwhile to compare their performance. In the case of SAR-EM, we use the implementation provided by the authors⁶. For the algorithm to work, it needs a list of data attributes that are potential propensity features; in our case, all attributes are considered as such.…”
Section: Methods
confidence: 99%
“…Although the majority of research focuses on inferential approaches for PU data when the SCAR assumption is valid (see e.g. [2] for a review), some methods have already been developed that account for the more realistic scenario of biased selection of labeled items [7,3,6]. This is usually done by attempting to learn a propensity score, defined as the probability that an item from the positive class is labeled, given its feature vector x.…”
Section: Introduction
confidence: 99%
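For reference, the quantities discussed in this statement can be written out explicitly; this is standard notation from the PU literature rather than a formula taken from any single cited paper:

\[
e(x) = P(s = 1 \mid y = 1, x) \qquad \text{(propensity score)}
\]
\[
\text{SCAR: } e(x) \equiv c \in (0, 1] \text{ (a constant)}, \qquad
\text{SAR: } e(x) \text{ may vary with } x,
\]
\[
\text{so that under SCAR, } P(y = 1 \mid x) = \frac{P(s = 1 \mid x)}{c}.
\]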
“…For this reason, weakly supervised learning (WSL) has been increasingly explored in recent decades, as it allows training a classifier from less costly data. Previous work based on WSL includes, but is not limited to, semi-supervised learning (Chapelle, Schölkopf, and Zien 2006; Tarvainen and Valpola 2017; Miyato et al 2018; Oliver et al 2018; Izmailov et al 2020; Lucas, Weinzaepfel, and Rogez 2022), noisy-label learning (Menon et al 2015; Ghosh, Kumar, and Sastry 2017; Ma et al 2018; Kim et al 2019; Charoenphakdee, Lee, and Sugiyama 2019; Wang et al 2019; Han et al 2020), partial-label learning (Tang and Zhang 2017; Xie and Huang 2018; Wu and Zhang 2019; Lv et al 2020; Zhang et al 2021; Gong, Yuan, and Bao 2022), unlabeled-unlabeled learning (Du Plessis, Niu, and Sugiyama 2013; Golovnev, Pál, and Szorenyi 2019) and positive-unlabeled learning (du Plessis, Niu, and Sugiyama 2014, 2015; Sakai, Niu, and Sugiyama 2018; Chapel, Alaya, and Gasso 2020; Hu et al 2021; Su, Chen, and Xu 2021; Gerych et al 2022).…”
Section: Introduction
confidence: 99%
“…For the SCAR method we use the TIcE algorithm [14] and scale the output of the naive classifier; for SAR we used the LBE method [19]. A much more realistic assumption is SAR (Selected At Random), which states that the propensity score function depends solely on the observed feature vector [17,18,19,3,20,21]. Figure 1 shows the difference between the SCAR and SAR assumptions for artificially generated two-dimensional data.…”
Section: Introduction
confidence: 99%
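The rescaling mentioned in the first sentence is the classical SCAR correction: since P(s = 1 | x) = c · P(y = 1 | x) when the propensity is a constant c, a "naive" labeled-vs-unlabeled classifier only needs its probabilities divided by c. A minimal sketch, assuming c has already been estimated (e.g. by TIcE, whose implementation is not reproduced here) and with illustrative names:

import numpy as np
from sklearn.linear_model import LogisticRegression

def scar_posterior(X_train, s_train, X_test, c):
    """Train a naive classifier on labeled-vs-unlabeled targets s, then
    rescale: under SCAR, P(y=1 | x) = P(s=1 | x) / c."""
    naive = LogisticRegression().fit(X_train, s_train)
    p_s = naive.predict_proba(X_test)[:, 1]   # estimates P(s=1 | x)
    return np.clip(p_s / c, 0.0, 1.0)         # estimates P(y=1 | x)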
“…However, SAR-based algorithms are usually computationally more expensive, as they require a challenging estimation of the propensity score. Exceptions arise when we consider special cases of SAR, such as the Probabilistic Gap Assumption [18] or the invariance-of-order assumption [22], or when we impose additional assumptions such as knowledge of the prior probability of the positive class [23]. Most existing SAR algorithms are based on alternately fitting two models: one related to the posterior probability of the true class variable, and the other related to the propensity score [17,19,20].…”
Section: Introduction
confidence: 99%
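To make the alternating two-model scheme concrete, here is a rough EM-style sketch; it is an illustrative reconstruction of the general pattern, with names of my choosing, not the code of SAR-EM, LBE, or any other specific method cited above:

import numpy as np
from sklearn.linear_model import LogisticRegression

def alternating_sar_fit(X, s, n_iter=20):
    """X: features; s: 1 for labeled (hence positive) points, 0 for unlabeled."""
    post = LogisticRegression()                # models P(y = 1 | x)
    prop = LogisticRegression()                # models e(x) = P(s = 1 | y = 1, x)
    w = np.where(s == 1, 1.0, 0.5)             # soft positive weights, init 0.5
    n = len(X)
    for _ in range(n_iter):
        # M-step: fit the posterior on soft labels via weighted duplication.
        Xd = np.vstack([X, X])
        yd = np.concatenate([np.ones(n), np.zeros(n)])
        wd = np.concatenate([w, 1.0 - w])
        post.fit(Xd, yd, sample_weight=wd)
        # Fit the propensity model on (softly) positive points only.
        prop.fit(X, s, sample_weight=w)
        # E-step: update soft labels of unlabeled points using Bayes' rule,
        # P(y=1 | s=0, x) = p(x) (1 - e(x)) / (1 - p(x) e(x)).
        p = post.predict_proba(X)[:, 1]
        e = prop.predict_proba(X)[:, 1]
        w = np.where(s == 1, 1.0, p * (1 - e) / np.clip(1 - p * e, 1e-9, None))
    return post, prop

The E-step formula follows from P(y = 1, s = 0 | x) = p(x)(1 - e(x)) and P(s = 0 | x) = 1 - p(x)e(x); labeled points keep weight 1 since a label implies a positive class.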