Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing 2020
DOI: 10.1145/3357713.3384337
Efficiently learning structured distributions from untrusted batches

Abstract: We study the problem, introduced by Qiao and Valiant, of learning from untrusted batches. Here, we assume m users, all of whom have samples from some underlying distribution p over {1, …, n}. Each user sends a batch of k i.i.d. samples from this distribution; however, an ϵ-fraction of users are untrustworthy and can send adversarially chosen responses. The goal of the algorithm is to learn p in total variation distance. When k = 1 this is the standard robust univariate density estimation setting and it is well-…
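The sampling model described in the abstract is easy to simulate. The sketch below is illustrative only, not the paper's algorithm: the function name and the toy adversary are our own, and the adversary here simply replaces its batches with arbitrary responses, as the model allows.

```python
import numpy as np

def generate_untrusted_batches(p, m, k, eps, adversary, rng):
    """Simulate the untrusted-batches model: m batches of k i.i.d. draws
    from a distribution p over {0, ..., n-1}; an eps-fraction of batches
    is replaced by adversarially chosen responses."""
    n = len(p)
    # Honest users each draw k i.i.d. samples from p.
    batches = [rng.choice(n, size=k, p=p) for _ in range(m)]
    # The adversary controls an eps-fraction of the users.
    num_bad = int(eps * m)
    bad = rng.choice(m, size=num_bad, replace=False)
    for i in bad:
        batches[i] = adversary(n, k)
    return batches, set(bad)

# Toy adversary: every corrupted batch reports only symbol 0,
# biasing a naive empirical estimate of p toward 0.
rng = np.random.default_rng(0)
p = np.array([0.5, 0.3, 0.2])
batches, bad = generate_untrusted_batches(
    p, m=100, k=10, eps=0.1,
    adversary=lambda n, k: np.zeros(k, dtype=int), rng=rng)
```

Averaging the empirical distributions of all batches would be off by up to ϵ in ℓ1; the point of the robust algorithms discussed below is to do better by exploiting that honest batches of size k concentrate around p.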

Cited by 9 publications (4 citation statements) · References 59 publications
“…Our setting is related to the untrusted batches setting of Qiao and Valiant [QV18], in which many batches of samples are drawn from a distribution, but a constant fraction of batches may be adversarially corrupted; see also the follow-up works by Jain and Orlitsky [JO20b, JO20a, JO21] and Chen, Li, and Moitra [CLM20a, CLM20b]. This is somewhat similar to our setting, where each batch is the set of edges connected to a node.…”
Section: Related Work
confidence: 97%
“…More precisely, they show that with no privacy but under contamination, the minimax risk of estimation under ℓ1 loss from n batches of size k, with an ϵ-fraction of the batches adversarially corrupted, scales as √(d/N) + ϵ/√k, where N = nk. Qiao and Valiant (2017) provide both an information-theoretic lower bound and a minimax-optimal algorithm, which unfortunately runs in time exponential in either k or d. Polynomial-time algorithms were later proposed by Chen et al. (2020) and Jain and Orlitsky (2020) and were shown to reach the information-theoretic lower bound up to an extra log(1/ϵ) factor. In this specific setting, it is not known if this extra factor represents a computational gap between polynomial-time and exponential-time algorithms.…”
Section: Related Work
confidence: 99%
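For reference, the rate quoted in the statement above can be written out in LaTeX (our notation: d is the domain size, n the number of batches, k the batch size, N = nk the total sample count, ϵ the corruption fraction):

```latex
\[
  \min_{\hat p}\ \max_{p}\ \mathbb{E}\,\lVert \hat p - p \rVert_1
  \;\asymp\; \sqrt{\frac{d}{N}} \;+\; \frac{\epsilon}{\sqrt{k}},
  \qquad N = nk .
\]
```

Read this way, the first term is the usual parametric error from N pooled samples, and the second is the price of the corrupted batches, which shrinks as the batch size k grows.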
“…In [26] a similar route is followed, but the proposed algorithm requires access to a reference set that is guaranteed to be free of data manipulations. In [8, 21, 40] robust multisource learning is addressed using tools from robust statistics in the context of discrete density estimation. Note, though, that all of these works are tailored to ensuring the accuracy of the learned classifiers or estimators.…”
Section: Multisource Learning
confidence: 99%