2020
DOI: 10.48550/arxiv.2001.05497
Preprint

Noise-tolerant, Reliable Active Classification with Comparison Queries

Abstract: With the explosion of massive, widely available unlabeled data in the past years, finding label and time efficient, robust learning algorithms has become ever more important in theory and in practice. We study the paradigm of active learning, in which algorithms with access to large pools of data may adaptively choose what samples to label in the hope of exponentially increasing efficiency. By introducing comparisons, an additional type of query comparing two points, we provide the first time and query efficie…

Cited by 3 publications (5 citation statements)
References 12 publications
“…It would also be interesting to investigate whether our algorithmic insights can find applications for learning halfspaces under the challenging Tsybakov noise model (Hanneke, 2011). Finally, it would be interesting to extend our ideas to actively learn more general classes such as low degree polynomials, perhaps using additional comparison queries as explored in recent works (Kane et al., 2017; Xu et al., 2017; Hopkins et al., 2020).…”
Section: Conclusion and Discussion; citation type: mentioning
confidence: 95%
“…Comparison queries consider four records (say $v_1, v_2, v_3, v_4$) as input and compare the relative distance between $(v_1, v_2)$ with that of $(v_3, v_4)$. Such queries have been used to study correlation clustering [5,59], classification [38,57], top-$k$ selection [13,17,19,34,43,44,54,61], skyline computation [62] and many other machine learning tasks. Many empirical crowdsourcing studies have shown the ability of crowd members to answer such queries accurately [5].…”
Section: Related Work; citation type: mentioning
confidence: 99%
“…Such comparisons reveal the local hierarchical structure with respect to the queried records and can be answered without the knowledge of other records in the dataset. These oracle models have been widely popular to study fairness metrics [40], correlation clustering [59] and classification [38,57], identify maximum elements [34,61], top-$k$ elements [13,17,19,43,44,54], information retrieval [42], skyline computation [62], and so on. In order to minimize the oracle workload, our framework prioritizes records to optimize the number of triplet comparisons.…”
Citation type: mentioning
confidence: 99%
“…Distance based comparison oracles have been used to study a wide range of problems and we list a few of them: learning fairness metrics [34], top-down hierarchical clustering with a different objective [11,17,24], correlation clustering [49] and classification [32,48], identify maximum [30,53], top-$k$ elements [14-16, 38, 40, 45], information retrieval [35], skyline computation [54]. To the best of our knowledge, there is no work that considers quadruplet comparison oracle queries to perform $k$-center clustering and single/complete linkage based hierarchical clustering.…”
Section: Other Related Work; citation type: mentioning
confidence: 99%
“…Motivated by the aforementioned observations, we consider a quadruplet comparison oracle that compares the relative distance between two pairs of points $(u_1, u_2)$ and $(v_1, v_2)$ and outputs the pair with smaller distance between them, breaking ties arbitrarily. Such oracle models have been studied extensively in the literature [11,17,24,32,34,48,49]. Even though quadruplet queries are easier than binary optimal queries, some oracle queries may be harder than the rest.…”
Section: Introduction; citation type: mentioning
confidence: 99%
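
The quadruplet comparison oracle described in the last citation statement can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function name `quadruplet_oracle` is hypothetical, Euclidean distance stands in for whatever notion of distance the oracle (for example, a crowd worker) actually uses, and ties are resolved in favor of the first pair as one arbitrary choice. It is not taken from any of the cited papers.

```python
import math
from typing import Sequence, Tuple

Point = Sequence[float]
Pair = Tuple[Point, Point]


def quadruplet_oracle(pair_a: Pair, pair_b: Pair) -> Pair:
    """Return the pair whose two points are closer together.

    Assumption: distance is Euclidean. Ties go to pair_a, which is
    one arbitrary tie-breaking rule among many.
    """
    dist_a = math.dist(pair_a[0], pair_a[1])
    dist_b = math.dist(pair_b[0], pair_b[1])
    return pair_a if dist_a <= dist_b else pair_b


# Usage: (u1, u2) are 1 apart, (v1, v2) are 5 apart.
u1, u2 = (0.0, 0.0), (1.0, 0.0)
v1, v2 = (0.0, 0.0), (3.0, 4.0)
closer = quadruplet_oracle((u1, u2), (v1, v2))
```

In a crowdsourcing setting the body of the function would be replaced by a question put to a person, so the caller sees only which pair was chosen and never the distances themselves.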