Daniel Vial scite author profile

Shakkottai

Srikant

2021

There has been recent interest in collaborative multi-agent bandits, where groups of agents share recommendations to decrease per-agent regret. However, these works assume that each agent always recommends their individual best-arm estimates to other agents, which is unrealistic in envisioned applications (machine faults in distributed computing or spam in social recommendation systems). Hence, we generalize the setting to include honest and malicious agents who recommend best-arm estimates and arbitrary arms, respectively. We show that even with a single malicious agent, existing collaboration-based algorithms fail to improve regret guarantees over a single-agent baseline. We propose a scheme where honest agents learn who is malicious and dynamically reduce communication with them, i.e., "blacklist" them. We show that collaboration indeed decreases regret for this algorithm, when the number of malicious agents is small compared to the number of arms, and crucially without assumptions on the malicious agents' behavior. Thus, our algorithm is robust against any malicious recommendation strategy.

Regret Bounds for Stochastic Shortest Path Problems with Linear Function Approximation

Parulekar²,

Shakkottai³

et al. 2021

Preprint

We propose two algorithms for episodic stochastic shortest path problems with linear function approximation. The first is computationally expensive but provably obtains $\tilde{O}(\sqrt{B^3 d^3 K / c_{\min}})$ regret, where $B$ is a (known) upper bound on the optimal cost-to-go function, $d$ is the feature dimension, $K$ is the number of episodes, and $c_{\min}$ is the minimal cost of non-goal state-action pairs (assumed to be positive). The second is computationally efficient in practice, and we conjecture that it obtains the same regret bound. Both algorithms are based on an optimistic least-squares version of value iteration analogous to the finite-horizon backward induction approach from Jin et al. [2020]. To the best of our knowledge, these are the first regret bounds for stochastic shortest path that are independent of the size of the state and action spaces.

A Structural Result for Personalized PageRank and its Algorithmic Consequences

SIGMETRICS Perform. Eval. Rev.

Subramanian

2019

Many natural and man-made systems can be represented as graphs, sets of objects (called nodes) and pairwise relations between these objects (called edges). These include the brain, which contains neurons (nodes) that exchange signals through chemical pathways (edges), the Internet, which contains websites (nodes) that are connected via hyperlinks (edges), etc. To study graphs, researchers in diverse domains have used Personalized PageRank (PPR) [6]. Informally, PPR assigns to each node v a vector $\pi_v$, where $\pi_v(w)$ describes the importance or relevance of node w from the perspective of v. PPR has proven useful in many applications, both practical and graph-theoretic. Examples include recommending who a user should follow on Twitter [7] (v may wish to follow w if $\pi_v(w)$ is large) and local graph partitioning [2] (the set of nodes w with large $\pi_v(w)$ can be viewed as a community surrounding v).
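As an illustration of the PPR vector described in this abstract, the following is a minimal sketch of the standard random-walk-with-restart power iteration. It is not the paper's algorithm; the function name `personalized_pagerank`, the restart probability `alpha = 0.15`, the iteration count, and the toy graph are all assumptions chosen for the example.

```python
def personalized_pagerank(adj, source, alpha=0.15, iters=100):
    """Approximate the PPR vector of `source` by power iteration.

    adj:   dict mapping each node to a list of out-neighbors; every node
           must appear as a key and have at least one out-neighbor.
    alpha: restart probability (an assumed value, not from the abstract).
    """
    nodes = list(adj)
    pi = {v: 0.0 for v in nodes}
    pi[source] = 1.0  # all probability mass starts at the source
    for _ in range(iters):
        nxt = {v: 0.0 for v in nodes}
        # With probability alpha the walk restarts at the source.
        # The entries of pi sum to 1, so the restart mass is alpha.
        nxt[source] += alpha
        # Otherwise it moves to a uniformly random out-neighbor.
        for u in nodes:
            share = (1 - alpha) * pi[u] / len(adj[u])
            for w in adj[u]:
                nxt[w] += share
        pi = nxt
    return pi


# Hypothetical toy graph: "a" links to "b" and "c", which both link back.
toy_graph = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
ppr_from_a = personalized_pagerank(toy_graph, "a")
```

In this toy graph, the entry for "a" is the largest, and "b" and "c" receive equal scores by symmetry, which matches the informal reading of $\pi_v(w)$ as the relevance of w from the perspective of v.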

Minimax Regret for Cascading Bandits

Vial¹,

Sanghavi²,

Shakkottai³

et al. 2022

Preprint

Cascading bandits model the task of learning to rank K out of L items over n rounds of partial feedback. For this model, the minimax (i.e., gap-free) regret is poorly understood; in particular, the best known lower and upper bounds are $\Omega(\sqrt{nL/K})$ and $\tilde{O}(\sqrt{nLK})$, respectively. We improve the lower bound to $\Omega(\sqrt{nL})$ and show CascadeKL-UCB (which ranks items by their KL-UCB indices) attains it up to log terms. Surprisingly, we also show CascadeUCB1 (which ranks via UCB1) can suffer suboptimal $\Omega(\sqrt{nLK})$ regret. This sharply contrasts with standard L-armed bandits, where the corresponding algorithms both achieve the minimax regret $\sqrt{nL}$ (up to log terms), and the main advantage of KL-UCB is only to improve constants in the gap-dependent bounds. In essence, this contrast occurs because Pinsker's inequality is tight for hard problems in the L-armed case but loose (by a factor of K) in the cascading case.
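To make "ranks items by their KL-UCB indices" concrete, here is a minimal sketch of a Bernoulli KL-UCB index computed by bisection, followed by a top-K ranking. It is an illustration under stated assumptions, not the paper's exact procedure: the exploration level `log(t)`, the helper names `kl_bernoulli`, `kl_ucb_index`, and `rank_top_k`, and the sample statistics are all hypothetical.

```python
import math


def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped for stability."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def kl_ucb_index(mean, pulls, t, tol=1e-9):
    """Largest q in [mean, 1] with pulls * KL(mean, q) <= log(t).

    The log(t) exploration level is an assumed simplification.
    """
    if pulls == 0:
        return 1.0  # unobserved items get the maximal index
    budget = math.log(max(t, 1)) / pulls
    lo, hi = mean, 1.0
    # KL(mean, q) is increasing in q on [mean, 1], so bisection applies.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if kl_bernoulli(mean, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo


def rank_top_k(means, pulls, t, k):
    """Return the indices of the k items with the largest KL-UCB index."""
    scores = [kl_ucb_index(m, n, t) for m, n in zip(means, pulls)]
    return sorted(range(len(means)), key=lambda i: -scores[i])[:k]


# Hypothetical statistics for L = 3 items after t = 1000 rounds, with K = 2.
recommended = rank_top_k([0.9, 0.1, 0.5], [100, 100, 100], t=1000, k=2)
```

With equal observation counts, the ranking follows the empirical means, so items 0 and 2 are shown. The index shrinks toward the empirical mean as an item accumulates observations.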

Robust Multi-Agent Multi-Armed Bandits

Shakkottai

Srikant

2020

Preprint