To study the resilience of distributed learning, the "Byzantine" literature considers a strong threat model where workers can report arbitrary gradients to the parameter server. While this model has helped obtain several fundamental results, it has sometimes been considered unrealistic when the workers are mostly trustworthy machines. In this paper, we show a surprising equivalence between this model and data poisoning, a threat considered much more realistic. More specifically, we prove that every gradient attack can be reduced to data poisoning in any personalized federated learning system with PAC guarantees (which we show are both desirable and realistic). This equivalence makes it possible to obtain new impossibility results on resilience to data poisoning as corollaries of existing impossibility theorems on Byzantine machine learning. Moreover, using our equivalence, we derive a practical attack that we show, both theoretically and empirically, can be very effective against classical personalized federated learning models.
Many modern Internet applications, like content moderation and recommendation on social media, require reviewing and scoring a large number of alternatives. In such a context, voting can only be sparse, as the number of alternatives is too large for any individual to review a significant fraction of them. Moreover, in critical applications, malicious players might seek to hack the voting process by entering dishonest reviews or creating fake accounts. Classical voting methods are unfit for this task, as they usually (a) require each reviewer to assess all available alternatives and (b) can be easily manipulated by malicious players. This paper precisely defines the problem of robust sparse voting, highlights its underlying technical challenges, and presents Mehestan, a novel voting mechanism that solves the problem. Namely, we prove that, using Mehestan, no (malicious) voter can have more than a small, parametrizable effect on any alternative's score, and we identify conditions on voter comparability under which any unanimous preferences can be recovered, even when these preferences are expressed by voters on very different scales.
Today's large-scale machine learning algorithms harness massive amounts of user-generated data to train large models. However, especially in the context of content recommendation, with its enormous social, economic, and political incentives to promote specific views, products, or ideologies, strategic users might be tempted to fabricate or mislabel data in order to bias algorithms in their favor. Unfortunately, today's learning schemes strongly incentivize such strategic data misreporting. This is a major concern, as it endangers the trustworthiness of entire training datasets and calls into question the safety of any algorithm trained on them. In this paper, we show that, perhaps surprisingly, incentivizing data misreporting is not inevitable. We propose LICCHAVI, the first personalized collaborative learning framework with provable strategyproofness guarantees, obtained through a careful design of the underlying loss function. Interestingly, we also prove that LICCHAVI is Byzantine resilient: it tolerates a minority of users that provide arbitrary data.
The geometric median of a tuple of vectors is the vector that minimizes the sum of Euclidean distances to the vectors of the tuple. Also known as the solution to the Fermat-Weber problem and often applied to facility location, the geometric median is an appealing tool for voting in high dimension, which may be applied, e.g., to collaborative content moderation on social media. Interestingly, the geometric median can also be viewed as the equilibrium of a process where each vector of the tuple pulls a common decision point toward itself with a unit force, embodying the "one voter, one unit force" fairness principle. In this paper, we analyze the strategyproofness of the geometric median as a voting system. Assuming that voters want to minimize the Euclidean distance between their preferred vector and the outcome of the vote, we first prove that, in the general case, the geometric median is not even α-strategyproof. However, in the limit of a large number of voters, assuming that voters' preferred vectors are drawn i.i.d. from a distribution of preferred vectors, we also prove that the geometric median is asymptotically α-strategyproof. The bound α describes the most a voter can gain by deviating from truthfulness. We show how to compute this bound as a function of the distribution followed by the vectors. We then generalize our results to the case where each voter cares more about some dimensions than about others, and we determine how this impacts the strategyproofness bound α. Roughly, we show that if some dimensions are more polarized and regarded as more important, then the geometric median becomes less strategyproof. Interestingly, we also show how skewed geometric medians can be used to improve strategyproofness. Nevertheless, if voters care differently about different dimensions, we prove that no skewed geometric median can achieve strategyproofness for all of them. Overall, our results provide insight into the extent to which the (skewed) geometric median is a suitable approach to aggregating high-dimensional disagreements.
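As a point of reference (not part of the abstract above), the geometric median it studies is typically approximated with Weiszfeld's classical iterative re-weighting algorithm. The minimal Python sketch below, with a hypothetical helper `geometric_median`, illustrates the definition: it searches for the point minimizing the sum of Euclidean distances to the voters' preferred vectors.

```python
import numpy as np

def geometric_median(points, tol=1e-7, max_iter=1000):
    """Approximate the geometric median of d-dimensional points
    using Weiszfeld's iterative re-weighting scheme."""
    points = np.asarray(points, dtype=float)
    # Start from the coordinate-wise mean.
    median = points.mean(axis=0)
    for _ in range(max_iter):
        dists = np.linalg.norm(points - median, axis=1)
        # Ignore points that coincide with the current estimate
        # to avoid division by zero.
        nonzero = dists > tol
        if not nonzero.any():
            return median
        weights = 1.0 / dists[nonzero]
        new_median = (weights[:, None] * points[nonzero]).sum(axis=0) / weights.sum()
        if np.linalg.norm(new_median - median) < tol:
            return new_median
        median = new_median
    return median

# Example: three voters' preferred vectors in 2D.
votes = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(geometric_median(votes))
```

In the voting interpretation of the abstract, each row of `votes` is a voter's preferred vector, and the returned point is the outcome whose strategyproofness the paper analyzes.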