Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing 2020
DOI: 10.1145/3357713.3384337
Efficiently learning structured distributions from untrusted batches

Abstract: We study the problem, introduced by Qiao and Valiant, of learning from untrusted batches. Here, we assume m users, all of whom have samples from some underlying distribution p over {1, …, n}. Each user sends a batch of k i.i.d. samples from this distribution; however, an ϵ-fraction of users are untrustworthy and can send adversarially chosen responses. The goal of the algorithm is to learn p in total variation distance. When k = 1 this is the standard robust univariate density estimation setting and it is well-…
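The sampling model described in the abstract is easy to simulate. The sketch below is illustrative only, not the paper's algorithm: the function name and the toy adversary are our own, and the adversary here simply replaces its batches with arbitrary responses, as the model allows.

```python
import numpy as np

def generate_untrusted_batches(p, m, k, eps, adversary, rng):
    """Simulate the untrusted-batches model: m batches of k i.i.d. draws
    from a distribution p over {0, ..., n-1}; an eps-fraction of batches
    is replaced by adversarially chosen responses."""
    n = len(p)
    # Honest users each draw k i.i.d. samples from p.
    batches = [rng.choice(n, size=k, p=p) for _ in range(m)]
    # The adversary controls an eps-fraction of the users.
    num_bad = int(eps * m)
    bad = rng.choice(m, size=num_bad, replace=False)
    for i in bad:
        batches[i] = adversary(n, k)
    return batches, set(bad)

# Toy adversary: every corrupted batch reports only symbol 0,
# biasing a naive empirical estimate of p toward 0.
rng = np.random.default_rng(0)
p = np.array([0.5, 0.3, 0.2])
batches, bad = generate_untrusted_batches(
    p, m=100, k=10, eps=0.1,
    adversary=lambda n, k: np.zeros(k, dtype=int), rng=rng)
```

Averaging the empirical distributions of all batches would be off by up to ϵ in ℓ1; the point of the robust algorithms discussed below is to do better by exploiting that honest batches of size k concentrate around p.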

Cited by 9 publications (4 citation statements) · References 59 publications
“…Our setting is related to the untrusted batches setting of Qiao and Valiant [QV18], in which many batches of samples are drawn from a distribution, but a constant fraction of batches may be adversarially corrupted; see also the follow-up works by Jain and Orlitsky [JO20b, JO20a, JO21] and Chen, Li, and Moitra [CLM20a, CLM20b]. This is somewhat similar to our setting, where each batch is the set of edges connected to a node.…”
Section: Related Work
confidence: 97%
“…More precisely, they show that with no privacy but under contamination, the minimax risk of estimation under ℓ1 loss from n batches of size k, with an ϵ-fraction of the batches adversarially corrupted, scales as √(d/N) + ϵ/√k, where N = nk. Qiao and Valiant (2017) provide both an information-theoretic lower bound and a minimax-optimal algorithm, which unfortunately runs in time exponential in either k or d. Polynomial-time algorithms were later proposed by Chen et al. (2020) and Jain and Orlitsky (2020) and were shown to reach the information-theoretic lower bound up to an extra log(1/ϵ) factor. In this specific setting, it is not known if this extra factor represents a computational gap between polynomial-time and exponential-time algorithms.…”
Section: Related Work
confidence: 99%
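For reference, the rate quoted in the statement above can be written out in LaTeX (our notation: d is the domain size, n the number of batches, k the batch size, N = nk the total sample count, ϵ the corruption fraction):

```latex
\[
  \min_{\hat p}\ \max_{p}\ \mathbb{E}\,\lVert \hat p - p \rVert_1
  \;\asymp\; \sqrt{\frac{d}{N}} \;+\; \frac{\epsilon}{\sqrt{k}},
  \qquad N = nk .
\]
```

Read this way, the first term is the usual parametric error from N pooled samples, and the second is the price of the corrupted batches, which shrinks as the batch size k grows.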
“…In [26] a similar route is followed, but the proposed algorithm requires access to a reference set that is guaranteed to be free of data manipulations. In [8, 21, 40] robust multisource learning is addressed using tools from robust statistics in the context of discrete density estimation. Note, though, that all of these works are tailored to ensuring the accuracy of the learned classifiers or estimators.…”
Section: Multisource Learning
confidence: 99%