Optimality of Myopic Policy for Restless Multiarmed Bandit with Imperfect Observation

We study information gathering in a social network in which sources are dynamically available and the quality of their information varies over time. We formulate the problem as a restless multi-armed bandit (RMAB), where the information quality of a source corresponds to the state of an arm. The decision-making agent does not know the quality of information from the sources a priori; instead, it maintains a belief about the quality of information from each source, which makes this an RMAB with partially observable states. The agent's objective is to gather relevant information efficiently by contacting sources. We formulate this as an infinite-horizon discounted-reward problem in which the reward depends on the quality of information. We study Whittle's index policy, which determines the sequence in which arms are played with the aim of maximizing the long-term cumulative reward. Through numerical simulation, we illustrate the performance of the index policy and the myopic policy and compare both with a uniform random policy.
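The belief the agent maintains about a source can be illustrated with a small Bayesian update. The sketch below is only one possible reading of the setup: it assumes a two-state arm (bad = 0, good = 1), a binary noisy signal, and hypothetical parameter names (`p01`, `p11`, `rho_good`, `rho_bad`) that do not come from the abstract.

```python
def update_belief(belief, signal, p01, p11, rho_good, rho_bad):
    """One-step belief update for a hypothetical two-state arm.

    belief:   prior probability that the arm is in the good state.
    signal:   1 for a positive observation, 0 for a negative one,
              None if the arm was not played (no observation).
    p01, p11: probability of being good at the next step, given that
              the arm is bad / good now (assumed Markov dynamics).
    rho_good, rho_bad: probability of a positive signal in the good /
              bad state (assumed observation model; the evidence term
              must be non-zero for the chosen values).
    """
    if signal is None:
        # No observation: the posterior equals the prior.
        posterior = belief
    else:
        like_good = rho_good if signal == 1 else 1.0 - rho_good
        like_bad = rho_bad if signal == 1 else 1.0 - rho_bad
        evidence = belief * like_good + (1.0 - belief) * like_bad
        posterior = belief * like_good / evidence  # Bayes' rule
    # Every arm evolves whether or not it is played (restless).
    return posterior * p11 + (1.0 - posterior) * p01
```

Under these assumptions, a positive signal pulls the belief toward the good state before the Markov transition is applied, while an unplayed arm's belief simply drifts according to the transition probabilities.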

Wang

2016

2016 IEEE Global Communications Conference (GLOBECOM)

Abstract: We consider a scheduling problem involving N projects. Each project evolves as a multi-state Markov process. At each time instant, one project is scheduled, and a reward that depends on the state of the chosen project is obtained. The objective is to design a scheduling policy that maximizes the expected accumulated discounted reward over a finite or infinite horizon. The problem can be cast as a restless multi-armed bandit (RMAB) problem, which is of fundamental importance in decision theory. It is well known that solving the RMAB problem is PSPACE-hard, and the optimal policy is usually intractable because of its exponential computational complexity. A natural alternative is the easily implementable myopic policy, which maximizes the immediate reward. In this paper, we perform an analytical study of the considered RMAB problem and establish a set of closed-form conditions that guarantee the optimality of the myopic policy.
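The myopic policy described above can be sketched in a few lines. This is an illustrative sketch rather than the paper's model: it assumes that all projects share one transition matrix `P`, that the state of the scheduled project is revealed exactly when it is played, and that initial states are known; the function names are hypothetical.

```python
import numpy as np


def myopic_choice(beliefs, rewards):
    """Index of the project with the largest expected immediate reward.

    beliefs: (N, K) array; row n is a distribution over the K states
             of project n.
    rewards: (K,) array of per-state rewards.
    """
    return int(np.argmax(beliefs @ rewards))


def simulate_myopic(P, rewards, init_states, horizon, beta, seed=0):
    """Discounted reward collected by the myopic policy on one sample path."""
    rng = np.random.default_rng(seed)
    n_states = len(rewards)
    states = np.array(init_states)
    # Assumption: initial states are known, so beliefs start as point masses.
    beliefs = np.eye(n_states)[states]
    total = 0.0
    for t in range(horizon):
        a = myopic_choice(beliefs, rewards)
        total += (beta ** t) * rewards[states[a]]
        # Assumption: playing a project reveals its current state exactly.
        beliefs[a] = np.eye(n_states)[states[a]]
        # Restless dynamics: every project moves, played or not.
        states = np.array([rng.choice(n_states, p=P[s]) for s in states])
        beliefs = beliefs @ P
    return total
```

The policy looks only one step ahead: it ranks projects by `beliefs @ rewards` and ignores how the current choice affects future information, which is why conditions for its optimality are of interest.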

Cited by 4 publications

References 24 publications

Myopic Policy for Opportunistic Scheduling: Homogeneous Multistate Channels

Rested and Restless Bandits With Constrained Arms and Hidden States: Applications in Social Networks and 5G Networks
