2014
DOI: 10.1007/978-3-662-44845-8_5

Statistical Hypothesis Testing in Positive Unlabelled Data

Abstract: We propose a set of novel methodologies which enable valid statistical hypothesis testing when we have only positive and unlabelled (PU) examples. This type of problem, a special case of semi-supervised data, is common in text mining, bioinformatics, and computer vision. Focusing on a generalised likelihood ratio test, we have 3 key contributions: (1) a proof that assuming all unlabelled examples are negative cases is sufficient for independence testing, but not for power analysis activities; (2) a ne…

Cited by 10 publications (11 citation statements)
References 13 publications
“…However, the power of the test differs with a constant correction factor $\frac{1-\alpha}{\alpha}\,\frac{\Pr(s=0)}{1-\Pr(s=0)}$. Because the correction factor is a constant that depends on the amount of labeled data, one can calculate how much more data is required to get the desired power [90]. The conditional test of independence, which was used for learning the PTAN trees, has similar properties [9,88].…”
Section: Hypothesis Testing
Citation type: mentioning
confidence: 99%
“…3.4 and 5 in Sechidis and Brown (2015), while parts of Sect. 3.3 in Sechidis et al (2014). Those two previous works focused only on feature selection through hypothesis testing.…”
Section: Results On Semi-supervised Feature Ranking
Citation type: mentioning
confidence: 99%
“…number of labelled examples) needed, following the same procedure as in sample size determination. In our previous work (Sechidis et al 2014), we presented a complete methodology for sample/labelled size determination in positive-unlabelled scenarios by using the $\kappa_{Y_0}$ correction factor and surrogate $Y_0$.…”
Section: Theorem 4 (Mar-c: Informed Surrogate Approaches) In Mar-c On
Citation type: mentioning
confidence: 99%
“…Building upon this assumption, Sechidis et al [19] proved that we can test independence between a feature X and the unobservable variable Y, by simply testing the independence between X and the observable variable $S_P$, which can be seen as a surrogate version of Y. While this assumption is sufficient for testing independence and guarantees the same probability of false positives, it leads to a less powerful test, and the probability of committing a false negative error is increased by a factor which can be calculated using prior knowledge over $p(y^+)$.…”
Section: Positive-unlabelled Data
Citation type: mentioning
confidence: 98%
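
The surrogate idea in the statement above can be illustrated with a small sketch. This is not the authors' code: the function name `g_test_binary` and the toy counts are hypothetical, the example assumes a binary feature X and binary labels, and it assumes positives are labelled independently of X. It runs a G-test (a generalised likelihood ratio test of independence) once against the true class Y and once against the observed label indicator S, where every unlabelled example is treated as a negative.

```python
import math
from collections import Counter


def g_test_binary(x, s):
    """G-test of independence for two binary sequences (1 degree of freedom).

    Returns (G statistic, p-value). Hypothetical helper for illustration only.
    """
    n = len(x)
    if n == 0 or n != len(s):
        raise ValueError("x and s must be non-empty and of equal length")
    joint = Counter(zip(x, s))
    x_counts = Counter(x)
    s_counts = Counter(s)
    g = 0.0
    for (xv, sv), observed in joint.items():
        expected = x_counts[xv] * s_counts[sv] / n
        g += 2.0 * observed * math.log(observed / expected)
    g = max(g, 0.0)  # guard against tiny negative values from rounding
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(g / 2.0))
    return g, p_value


# Toy data (assumed counts): 40 positives, 60 negatives, X depends on Y.
# Each tuple is (x, y, s); half of the positives are labelled (s = 1),
# in the same proportion for x = 1 and x = 0.
rows = (
    [(1, 1, 1)] * 15 + [(1, 1, 0)] * 15    # positives with x = 1
    + [(0, 1, 1)] * 5 + [(0, 1, 0)] * 5    # positives with x = 0
    + [(1, 0, 0)] * 15 + [(0, 0, 0)] * 45  # negatives are never labelled
)
x = [r[0] for r in rows]
y = [r[1] for r in rows]
s = [r[2] for r in rows]

g_xy, p_xy = g_test_binary(x, y)  # test against the (normally unobservable) class
g_xs, p_xs = g_test_binary(x, s)  # test against the observable surrogate

print(f"X vs Y: G = {g_xy:.2f}, p = {p_xy:.4f}")
print(f"X vs S: G = {g_xs:.2f}, p = {p_xs:.4f}")
```

On this toy data both tests reject independence, but the statistic against the surrogate S is smaller than the one against Y, which matches the quoted claim that the surrogate test is valid yet less powerful.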