“…The Upper Confidence Bound (UCB) algorithm and its variants have proven effective in tackling multi-armed bandit problems (e.g., see [5], [6]). Various generalizations of the classical bandit problem have been studied, considering nonstationary reward functions [7], [8], restless arms [9], satisficing reward objectives [10], risk-averse decision-makers [11], heavy-tailed reward distributions [12], and multiple players [13]. Recently, increasing attention has also been paid to solving bandit problems in a distributed fashion (e.g., see [14]–[17]).…”
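For concreteness, a minimal sketch of one such strategy, the classical UCB1 index rule, is given below: at round $t$, play the arm $i$ maximizing $\bar{x}_i(t) + \sqrt{2\ln t / n_i(t)}$, where $\bar{x}_i(t)$ is the empirical mean reward of arm $i$ and $n_i(t)$ its pull count. This is an illustrative sketch assuming Bernoulli rewards, not the specific algorithm of any cited work; the environment class `BernoulliBandit` is hypothetical.

```python
import math
import random

class BernoulliBandit:
    """Hypothetical test environment: arm i pays 1 with probability probs[i], else 0."""
    def __init__(self, probs):
        self.probs = probs

    def pull(self, arm):
        return 1.0 if random.random() < self.probs[arm] else 0.0

def ucb1(bandit, n_arms, horizon):
    """UCB1 sketch: play the arm maximizing empirical mean + sqrt(2 ln t / n_i)."""
    counts = [0] * n_arms     # number of pulls per arm
    means = [0.0] * n_arms    # empirical mean reward per arm
    total_reward = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1       # initialization: pull each arm once
        else:
            arm = max(
                range(n_arms),
                key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]),
            )
        r = bandit.pull(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # incremental mean update
        total_reward += r
    return total_reward

if __name__ == "__main__":
    random.seed(0)
    env = BernoulliBandit([0.2, 0.5, 0.8])
    print(ucb1(env, n_arms=3, horizon=10_000))
```

The logarithmic exploration bonus shrinks as an arm is pulled more often, which is what drives the trade-off between exploring uncertain arms and exploiting the empirically best one.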