“…As a classic model for sequential decision-making problems, the contextual bandit has been widely used in a variety of real-world applications, including recommender systems (Li et al, 2010a), display advertisement (Li et al, 2010b), and clinical trials (Durand et al, 2018). While most existing bandit solutions are designed for a centralized setting (i.e., data is readily available at a central server), growing application scale and public concern over privacy have recently spurred research on federated bandit learning (Dubey and Pentland, 2020; Shi et al, 2021; Huang et al, 2021; Li and Wang, 2021), where N clients collaborate under limited communication bandwidth to minimize the overall cumulative regret incurred over a finite time horizon T, while keeping each client's raw data local. Compared with standard federated learning (McMahan et al, 2017; Kairouz et al, 2019), which works with fixed datasets, federated bandit learning is characterized by its online interactions with the environment, which continuously provides new data samples to the clients over time.…”