We introduce a number of privacy definitions for the multi-armed bandit problem, based on differential privacy. We relate them through a unifying graphical model representation and connect them to existing definitions. We then derive and contrast lower bounds on the regret of bandit algorithms satisfying these definitions. We show that for all of them, the learner's regret is increased by a multiplicative factor dependent on the privacy level ε, but that the dependency is weaker when we do not require local differential privacy for the rewards.
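For reference, the base ε-differential-privacy constraint that such definitions adapt can be stated for a bandit policy π as follows; this is a minimal textbook formulation, and the choice of neighboring relation (here, reward histories differing in a single round) is exactly what the paper's variants refine:

```latex
% \epsilon-differential privacy for a bandit policy \pi:
% for every round t, every action sequence a_{1:t}, and every pair of
% reward histories r, r' that differ in the rewards of a single round,
\Pr\!\left[\pi \text{ selects } a_{1:t} \mid r\right]
  \;\le\; e^{\epsilon}\,
\Pr\!\left[\pi \text{ selects } a_{1:t} \mid r'\right].
```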
We study model-based reinforcement learning in an unknown finite communicating Markov decision process. We propose a simple algorithm that leverages variance-based confidence intervals. We show that the proposed algorithm, UCRL-V, achieves the optimal regret Õ(√(DSAT)) up to logarithmic factors, and so our work closes a gap with the lower bound without additional assumptions on the MDP. We perform experiments in a variety of environments that validate the theoretical bounds and show that UCRL-V outperforms state-of-the-art algorithms.
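As an illustration of the variance-based confidence intervals mentioned above, the sketch below implements the empirical-Bernstein bound of Maurer and Pontil (2009) in Python. This is one standard bound of that type, not UCRL-V's exact confidence set:

```python
import numpy as np

def bernstein_ucb(rewards, delta):
    """Empirical-Bernstein upper confidence bound on the mean of
    [0, 1]-valued samples. The sqrt term scales with the observed
    variance, so low-variance arms get tighter intervals."""
    n = len(rewards)
    assert n >= 2, "sample variance needs at least two observations"
    mean = np.mean(rewards)
    var = np.var(rewards, ddof=1)  # unbiased sample variance
    log_term = np.log(2.0 / delta)
    return mean + np.sqrt(2.0 * var * log_term / n) \
                + 7.0 * log_term / (3.0 * (n - 1))
```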
We present differentially private algorithms for the stochastic Multi-Armed Bandit (MAB) problem. This problem arises in applications such as adaptive clinical trials, experiment design, and user-targeted advertising, where private information is tied to individual rewards. Our major contribution is to show that there exist (ε, δ)-differentially private variants of Upper Confidence Bound algorithms with optimal regret, O(ε⁻¹ + log T). This is a significant improvement over previous results, which only achieve poly-log regret O(ε⁻² log² T), and it rests on our use of a novel interval-based mechanism. We also substantially improve the bounds of a previous family of algorithms that use a continual release mechanism. Experiments clearly validate our theoretical bounds.
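To show where the privacy noise enters this family of algorithms, here is a deliberately simplified Python sketch that perturbs a UCB index with Laplace noise. It is a generic Laplace-mechanism baseline, not the paper's interval-based mechanism, and a full (ε, δ) accounting over all T rounds would require a continual-release scheme rather than fresh noise per query:

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_ucb_index(reward_sum, n, t, eps):
    """UCB index computed from a Laplace-perturbed reward sum.
    With rewards in [0, 1], one user's reward changes the sum by at
    most 1, so noise of scale 1/eps masks any single contribution."""
    private_sum = reward_sum + rng.laplace(scale=1.0 / eps)
    return private_sum / n + np.sqrt(2.0 * np.log(t) / n)
```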
Smartphones are a key enabling technology in the Internet of Things (IoT) for gathering crowd-sensed data. However, collecting crowd-sensed data for research is not simple: issues related to device heterogeneity, security, and privacy have prevented the rise of crowd-sensing platforms for scientific data collection. For this reason, we implemented VIVO, an open framework for gathering crowd-sensed Big Data for IoT services, where security and privacy are managed within the framework. VIVO introduces the enrolled crowd-sensing model, which allows the deployment of multiple simultaneous experiments on the mobile phones of volunteers. The collected data can be accessed both at the end of the experiment, as in traditional testbeds, and in real time, as required by many Big Data applications. We present the VIVO architecture, highlighting its advantages over existing solutions, and four relevant real-world applications running on top of VIVO.
We present a simple set of algorithms based on Thompson Sampling for stochastic bandit problems with graph feedback. Thompson Sampling is generally applicable, without the need to construct complicated upper confidence bounds. As we show in this paper, it has excellent performance in problems with graph feedback, even when the graph structure itself is unknown and/or changing. We provide theoretical guarantees on the Bayesian regret of the algorithm, as well as extensive experimental results on real and simulated networks. More specifically, we test our algorithms on power-law, planted-partition, and Erdős–Rényi graphs, as well as on graphs derived from Facebook and Flixster data, and show that they clearly outperform related methods that employ upper confidence bounds.
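A minimal sketch of Thompson Sampling with graph feedback for Bernoulli arms, assuming a known, fixed feedback graph; `neighbors` and `pull` are hypothetical interfaces standing in for the environment:

```python
import numpy as np

rng = np.random.default_rng(0)

def ts_graph_feedback(neighbors, pull, horizon):
    """neighbors[i]: arms whose rewards are revealed when arm i is
    played (including i itself); pull(j): draws arm j's Bernoulli
    reward. Beta(1, 1) priors; every revealed reward updates its arm,
    which is what the side observations buy us over plain TS."""
    k = len(neighbors)
    alpha, beta = np.ones(k), np.ones(k)
    for _ in range(horizon):
        theta = rng.beta(alpha, beta)   # one posterior draw per arm
        i = int(np.argmax(theta))       # play the best sampled arm
        for j in neighbors[i]:          # side observations from the graph
            r = pull(j)
            alpha[j] += r
            beta[j] += 1 - r
    return alpha, beta
```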