“…Distributed MABs, which are extensions of basic MABs, have recently been studied extensively in different settings [6,15,19,20,27,28,31,33,36]. Distributed bandits are well motivated by a broad range of application scenarios, such as (1) large-scale learning systems [13] in domains such as online advertising and recommendation systems; (2) cooperative search by multiple robots [18,25]; (3) applications in wireless cognitive radio [5,12,26,27]; and (4) distributed learning in geographically distributed communication systems, such as a set of IoT devices learning about the underlying environment [3,7,14,29,37]. Most prior work on multi-agent MABs assumes that agents are homogeneous: all agents have full access to the set of all arms and hence solve the same instance of a MAB problem, with the aim of minimizing the agents' aggregate regret either in a competition setting [2,5,6,8,9,12,26,27,36], where agents receive a degraded reward or no reward when multiple agents pull the same arm, or in a collaboration/cooperation setting [20,22,23,28,31,36], where agents pulling the same arm observe independent rewards and can communicate their observations to one another to improve their learning performance.…”