2020
DOI: 10.48550/arxiv.2008.06220
Preprint

Kernel Methods for Cooperative Multi-Agent Contextual Bandits

Abhimanyu Dubey,
Alex Pentland

Abstract: Cooperative multi-agent decision making involves a group of agents cooperatively solving learning problems while communicating over a network with delays. In this paper, we consider the kernelised contextual bandit problem, where the reward obtained by an agent is an arbitrary linear function of the contexts' images in the related reproducing kernel Hilbert space (RKHS), and a group of agents must cooperate to collectively solve their unique decision problems. For this problem, we propose COOP-KERNELUCB, an al…
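
As a point of reference for the setting described in the abstract, below is a minimal single-agent KernelUCB selection step, in the spirit of the kernelised UCB family the paper builds on. This is a sketch under stated assumptions, not the paper's COOP-KERNELUCB: the cooperative estimator and the delayed network communication are omitted, and the RBF kernel, regulariser lam, and exploration weight beta are illustrative choices rather than the paper's settings.

import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Pairwise RBF kernel matrix between the rows of X and the rows of Y.
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * sq)

def kernel_ucb_scores(X_hist, y_hist, X_arms, lam=1.0, beta=1.0, gamma=1.0):
    # Upper-confidence score for each candidate arm context:
    #   mean(x)  = k(x)^T (K + lam I)^{-1} y
    #   width(x) = sqrt(k(x, x) - k(x)^T (K + lam I)^{-1} k(x)) / sqrt(lam)
    K = rbf_kernel(X_hist, X_hist, gamma)
    A = K + lam * np.eye(len(X_hist))
    k_arm = rbf_kernel(X_hist, X_arms, gamma)        # shape (t, n_arms)
    mean = k_arm.T @ np.linalg.solve(A, y_hist)
    sol = np.linalg.solve(A, k_arm)
    var = np.diag(rbf_kernel(X_arms, X_arms, gamma)) - np.sum(k_arm * sol, axis=0)
    return mean + beta * np.sqrt(np.maximum(var, 0.0) / lam)

# Usage: the agent plays the arm whose context maximises the score, e.g.
# chosen = np.argmax(kernel_ucb_scores(X_hist, y_hist, X_arms))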

Cited by 3 publications (4 citation statements)
References 23 publications
“…In addition to the theoretical bounds on regret, experiments on both synthetic data and real data also verify the feasibility of the proposed gossiping approach of federated bandit. Future work may include extending this framework to contextual bandits [46] with local features or bandits with continuous arms [53].…”
Section: Discussion
confidence: 99%
“…A naive agent, which uses a standard centralized bandit algorithm, may not solve the problem without exchanging information with other agents. The heterogeneous reward structure is ready for extension to a contextual case [46] by considering the feature or local feature v [47,48] of each sample, where the regret could be R…”
Section: Problem Formulation
confidence: 99%
“…The idea of using kernel mean embeddings (KME) for adaptive domain generalization was proposed in the work of Blanchard et al [7]. Kernel mean embeddings have also been used for personalized learning in both multi-task [11] and multi-agent learning [13] bandit problems. A rigorous treatment of domain-adaptive generalization in the context of KME approaches is provided in Deshmukh et al [12].…”
Section: Related Work
confidence: 99%
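
For readers unfamiliar with the term in the quoted passage: the kernel mean embedding of a distribution P is mu_P = E_{x~P}[k(x, .)], estimated from a sample by averaging feature maps, and two domains can then be compared via the maximum mean discrepancy ||mu_P - mu_Q||. The sketch below computes the simple biased MMD estimator between two samples; the RBF kernel and bandwidth gamma are illustrative assumptions, not choices taken from the cited works.

import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Pairwise RBF kernel matrix between the rows of X and the rows of Y.
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * sq)

def mmd_squared(X, Y, gamma=1.0):
    # Squared distance between the empirical kernel mean embeddings of two
    # samples: ||mu_X - mu_Y||^2 = mean k(X,X) - 2 mean k(X,Y) + mean k(Y,Y).
    return (rbf_kernel(X, X, gamma).mean()
            - 2.0 * rbf_kernel(X, Y, gamma).mean()
            + rbf_kernel(Y, Y, gamma).mean())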
“…The contextual bandit problem, however, is a very interesting candidate for private methods, since the involved contexts and rewards both typically contain sensitive user information [38]. There is an increasing body of work on online learning and multi-armed bandits in cooperative settings [13,31,39], and private single-agent learning [41,38], but methods for private federated bandit learning are still elusive, despite their immediate applicability.…”
Section: Introduction
confidence: 99%