There has been recent interest in collaborative multi-agent bandits, where groups of agents share recommendations to decrease per-agent regret. However, these works assume that each agent always recommends its individual best-arm estimate to other agents, which is unrealistic in envisioned applications (e.g., machine faults in distributed computing or spam in social recommendation systems). Hence, we generalize the setting to include honest and malicious agents, who recommend best-arm estimates and arbitrary arms, respectively. We show that even with a single malicious agent, existing collaboration-based algorithms fail to improve regret guarantees over a single-agent baseline. We propose a scheme where honest agents learn who is malicious and dynamically reduce communication with them, i.e., "blacklist" them. We show that collaboration indeed decreases regret for this algorithm when the number of malicious agents is small compared to the number of arms, and crucially without assumptions on the malicious agents' behavior. Thus, our algorithm is robust against any malicious recommendation strategy.
We propose two algorithms for episodic stochastic shortest path problems with linear function approximation. The first is computationally expensive but provably obtains Õ(√(B³d³K/c_min)) regret, where B is a (known) upper bound on the optimal cost-to-go function, d is the feature dimension, K is the number of episodes, and c_min is the minimal cost of non-goal state-action pairs (assumed to be positive). The second is computationally efficient in practice, and we conjecture that it obtains the same regret bound. Both algorithms are based on an optimistic least-squares version of value iteration analogous to the finite-horizon backward induction approach from Jin et al. [2020]. To the best of our knowledge, these are the first regret bounds for stochastic shortest path that are independent of the size of the state and action spaces.
Many natural and man-made systems can be represented as graphs: sets of objects (called nodes) and pairwise relations between these objects (called edges). Examples include the brain, which contains neurons (nodes) that exchange signals through chemical pathways (edges), and the Internet, which contains websites (nodes) that are connected via hyperlinks (edges). To study graphs, researchers in diverse domains have used Personalized PageRank (PPR) [6]. Informally, PPR assigns to each node v a vector π_v, where π_v(w) describes the importance or relevance of node w from the perspective of v. PPR has proven useful in many applications, both practical and graph-theoretic. Examples include recommending who a user should follow on Twitter [7] (v may wish to follow w if π_v(w) is large) and local graph partitioning [2] (the set of nodes w with large π_v(w) can be viewed as a community surrounding v).
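As an illustration of the vector π_v described above, the following is a minimal sketch of PPR computed by power iteration. It assumes the standard random-walk-with-restart definition; the function name, the restart probability alpha = 0.15, and the choice to send walks at dangling nodes back to v are assumptions made for this example and are not taken from the abstract.

```python
import numpy as np

def personalized_pagerank(adj, v, alpha=0.15, tol=1e-10, max_iter=1000):
    """Approximate pi_v for source node v by power iteration.

    adj   : (n, n) adjacency matrix, adj[i, j] > 0 if there is an edge i -> j
    alpha : restart probability (assumed value, not from the source text)
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]

    # Indicator vector of the source node v.
    e_v = np.zeros(n)
    e_v[v] = 1.0

    # Row-normalize to get the random-walk transition matrix P.
    out_deg = adj.sum(axis=1)
    has_out = out_deg > 0
    P = np.zeros_like(adj)
    P[has_out] = adj[has_out] / out_deg[has_out, None]
    # Assumption: a walk at a node with no out-edges restarts at v.
    P[~has_out] = e_v

    # Iterate pi <- alpha * e_v + (1 - alpha) * pi P until it stabilizes.
    pi = e_v.copy()
    for _ in range(max_iter):
        new_pi = alpha * e_v + (1 - alpha) * (pi @ P)
        if np.abs(new_pi - pi).sum() < tol:
            return new_pi
        pi = new_pi
    return pi

# Directed 3-cycle 0 -> 1 -> 2 -> 0, viewed from node 0.
cycle = [[0, 1, 0],
         [0, 0, 1],
         [1, 0, 0]]
pi_0 = personalized_pagerank(cycle, v=0)
```

On this cycle, nodes closer to the source along the walk receive larger entries, which matches the reading of π_v(w) as the relevance of w from the perspective of v.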
Cascading bandits model the task of learning to rank K out of L items over n rounds of partial feedback. For this model, the minimax (i.e., gap-free) regret is poorly understood; in particular, the best known lower and upper bounds are Ω(√(nL/K)) and Õ(√(nLK)), respectively. We improve the lower bound to Ω(√(nL)) and show CascadeKL-UCB (which ranks items by their KL-UCB indices) attains it up to log terms. Surprisingly, we also show CascadeUCB1 (which ranks via UCB1) can suffer suboptimal Ω(√(nLK)) regret. This sharply contrasts with standard L-armed bandits, where the corresponding algorithms both achieve the minimax regret √(nL) (up to log terms), and the main advantage of KL-UCB is only to improve constants in the gap-dependent bounds. In essence, this contrast occurs because Pinsker's inequality is tight for hard problems in the L-armed case but loose (by a factor of K) in the cascading case.
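To make "ranks items by their KL-UCB indices" concrete, here is a minimal sketch of a Bernoulli KL-UCB index computed by bisection, followed by a top-K ranking. The function names are hypothetical, and the exploration budget log(t) / pulls is a common simplification; the exact index used by CascadeKL-UCB may include additional log-log terms that the abstract does not specify.

```python
import math

def bernoulli_kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clamped away from 0 and 1."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(mean, pulls, t, iters=50):
    """Largest q in [mean, 1] with pulls * KL(mean, q) <= log(t).

    The budget log(t) is an assumed, simplified exploration term.
    """
    if pulls == 0:
        return 1.0  # unobserved items get the most optimistic index
    budget = math.log(max(t, 2)) / pulls
    lo, hi = mean, 1.0
    # KL(mean, q) is increasing in q on [mean, 1], so bisection applies.
    for _ in range(iters):
        mid = (lo + hi) / 2
        if bernoulli_kl(mean, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo

def rank_top_k(means, pulls, t, k):
    """Return the k item indices with the largest KL-UCB indices, best first."""
    scores = [kl_ucb_index(m, n, t) for m, n in zip(means, pulls)]
    return sorted(range(len(means)), key=lambda i: -scores[i])[:k]

# Three items with empirical click rates 0.9, 0.1, 0.5, each observed 100 times.
recommended = rank_top_k([0.9, 0.1, 0.5], [100, 100, 100], t=300, k=2)
```

Replacing `kl_ucb_index` with the UCB1 index, mean + sqrt(c · log(t) / pulls) for some constant c, gives the ranking rule of the UCB1-style variant that the abstract contrasts against.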