2022
DOI: 10.1609/aaai.v36i6.20624

Recovering the Propensity Score from Biased Positive Unlabeled Data

Abstract: Positive-Unlabeled (PU) learning methods train a classifier to distinguish between the positive and negative classes given only positive and unlabeled data. While traditional PU methods require the labeled positive samples to be an unbiased sample of the positive distribution, in practice the labeled sample is often a biased draw from the true distribution. Prior work shows that if we know the likelihood that each positive instance will be selected for labeling, referred to as the propensity score, then the bi…
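Concretely, the correction that a known propensity score enables can be written as a reweighted risk. The sketch below is illustrative: it shows the standard propensity-weighted estimator from the SAR-PU literature in the oracle setting where e is given, not this paper's method for recovering it, and the logistic loss and function names are my assumptions.

import numpy as np

def logistic_loss(scores, y):                   # y is +1 or -1
    return np.log1p(np.exp(-y * scores))

def propensity_weighted_risk(scores, s, e):
    """scores: real-valued classifier outputs; s: 1 if labeled, else 0;
    e: propensity scores e_i = P(s=1 | y=1, x_i), used only where s == 1."""
    pos = logistic_loss(scores, +1.0)
    neg = logistic_loss(scores, -1.0)
    labeled = s == 1
    e_safe = np.where(labeled, e, 1.0)          # unused entries made safe
    per_point = np.where(labeled,
                         pos / e_safe + (1.0 - 1.0 / e_safe) * neg,
                         neg)                   # unlabeled enter as negatives
    return per_point.mean()

In expectation over the labeling process, a positive point contributes exactly its positive loss and a negative point its negative loss, so the estimator is unbiased for the fully supervised risk.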

Cited by 8 publications (6 citation statements)
References 19 publications

“…We also note that our proposed method and the SAR-EM method are similar in that both are Empirical Risk Minimization methods, so it is especially worthwhile to compare their performance. In the case of SAR-EM, we use the implementation provided by the authors⁶. For the algorithm to work, it needs a list of data attributes that are potential propensity features; in our case, all attributes are considered as such.…”
Section: Methods
confidence: 99%
“…Although the majority of research focuses on inferential approaches for PU data when the SCAR assumption is valid (see e.g. [2] for a review), some methods have already been developed that account for the more realistic scenario of biased selection of labeled items [7,3,6]. This is usually done by attempting to learn a propensity score, defined as the probability that an item from the positive class is labeled, given its feature vector x.…”
Section: Introduction
confidence: 99%
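For reference, the quantities discussed in this statement can be written out explicitly; this is standard notation from the PU literature rather than a formula taken from any single cited paper:

\[
e(x) = P(s = 1 \mid y = 1, x) \qquad \text{(propensity score)}
\]
\[
\text{SCAR: } e(x) \equiv c \in (0, 1] \text{ (a constant)}, \qquad
\text{SAR: } e(x) \text{ may vary with } x,
\]
\[
\text{so that under SCAR, } P(y = 1 \mid x) = \frac{P(s = 1 \mid x)}{c}.
\]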
“…For this reason, weakly supervised learning (WSL) has been increasingly explored in recent decades, as it allows training a classifier from less costly data. Previous work based on WSL includes, but is not limited to, semi-supervised learning (Chapelle, Schölkopf, and Zien 2006; Tarvainen and Valpola 2017; Miyato et al 2018; Oliver et al 2018; Izmailov et al 2020; Lucas, Weinzaepfel, and Rogez 2022), noisy-label learning (Menon et al 2015; Ghosh, Kumar, and Sastry 2017; Ma et al 2018; Kim et al 2019; Charoenphakdee, Lee, and Sugiyama 2019; Wang et al 2019; Han et al 2020), partial-label learning (Tang and Zhang 2017; Xie and Huang 2018; Wu and Zhang 2019; Lv et al 2020; Zhang et al 2021; Gong, Yuan, and Bao 2022), unlabeled-unlabeled learning (Du Plessis, Niu, and Sugiyama 2013; Golovnev, Pál, and Szorenyi 2019) and positive-unlabeled learning (du Plessis, Niu, and Sugiyama 2014, 2015; Sakai, Niu, and Sugiyama 2018; Chapel, Alaya, and Gasso 2020; Hu et al 2021; Su, Chen, and Xu 2021; Gerych et al 2022).…”
Section: Introduction
confidence: 99%
“…For the SCAR method we use the TIcE algorithm [14] and scale the output of the naive classifier; for SAR we used the LBE method [19]. A much more realistic assumption is SAR (Selected At Random), which states that the propensity score function depends solely on the observed feature vector [17,18,19,3,20,21]. Figure 1 shows the difference between the SCAR and SAR assumptions for artificially generated two-dimensional data.…”
Section: Introduction
confidence: 99%
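The rescaling mentioned in the first sentence is the classical SCAR correction: since P(s = 1 | x) = c · P(y = 1 | x) when the propensity is a constant c, a "naive" labeled-vs-unlabeled classifier only needs its probabilities divided by c. A minimal sketch, assuming c has already been estimated (e.g. by TIcE, whose implementation is not reproduced here) and with illustrative names:

import numpy as np
from sklearn.linear_model import LogisticRegression

def scar_posterior(X_train, s_train, X_test, c):
    """Train a naive classifier on labeled-vs-unlabeled targets s, then
    rescale: under SCAR, P(y=1 | x) = P(s=1 | x) / c."""
    naive = LogisticRegression().fit(X_train, s_train)
    p_s = naive.predict_proba(X_test)[:, 1]   # estimates P(s=1 | x)
    return np.clip(p_s / c, 0.0, 1.0)         # estimates P(y=1 | x)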
“…However, SAR-based algorithms are usually computationally more expensive, as they require a challenging estimation of the propensity score. Exceptions arise when we consider special cases of SAR, such as the Probabilistic Gap Assumption [18] or the invariance-of-order assumption [22], or when we impose additional assumptions such as knowledge of the prior probability of the positive class [23]. Most existing SAR algorithms are based on alternately fitting two models: one related to the posterior probability of the true class variable, and the other related to the propensity score [17,19,20].…”
Section: Introduction
confidence: 99%
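To make the alternating two-model scheme concrete, here is a rough EM-style sketch; it is an illustrative reconstruction of the general pattern, with names of my choosing, not the code of SAR-EM, LBE, or any other specific method cited above:

import numpy as np
from sklearn.linear_model import LogisticRegression

def alternating_sar_fit(X, s, n_iter=20):
    """X: features; s: 1 for labeled (hence positive) points, 0 for unlabeled."""
    post = LogisticRegression()                # models P(y = 1 | x)
    prop = LogisticRegression()                # models e(x) = P(s = 1 | y = 1, x)
    w = np.where(s == 1, 1.0, 0.5)             # soft positive weights, init 0.5
    n = len(X)
    for _ in range(n_iter):
        # M-step: fit the posterior on soft labels via weighted duplication.
        Xd = np.vstack([X, X])
        yd = np.concatenate([np.ones(n), np.zeros(n)])
        wd = np.concatenate([w, 1.0 - w])
        post.fit(Xd, yd, sample_weight=wd)
        # Fit the propensity model on (softly) positive points only.
        prop.fit(X, s, sample_weight=w)
        # E-step: update soft labels of unlabeled points using Bayes' rule,
        # P(y=1 | s=0, x) = p(x) (1 - e(x)) / (1 - p(x) e(x)).
        p = post.predict_proba(X)[:, 1]
        e = prop.predict_proba(X)[:, 1]
        w = np.where(s == 1, 1.0, p * (1 - e) / np.clip(1 - p * e, 1e-9, None))
    return post, prop

The E-step formula follows from P(y = 1, s = 0 | x) = p(x)(1 - e(x)) and P(s = 0 | x) = 1 - p(x)e(x); labeled points keep weight 1 since a label implies a positive class.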