We propose a general framework, dubbed Stochastic Processing under Imperfect Information (SPII), to study the impact of information constraints and memory on dynamic resource allocation. The framework involves a Stochastic Processing Network (SPN) scheduling problem in which the decision maker may access the system state only through a noisy channel, and resource allocation decisions must be carried out through the interaction between an encoding policy (which observes the state) and an allocation policy (which chooses the allocation). Applications in the management of large-scale data centers and human-in-the-loop service systems are among our chief motivations. We quantify the degree to which information constraints reduce the size of the capacity region in general SPNs, and how this reduction depends on the amount of memory available. Using a novel metric, the capacity factor, our main theorem characterizes the reduction in capacity region (under "optimal" policies) for all nondegenerate channels and across almost all combinations of memory sizes. Notably, the theorem demonstrates, in substantial generality, that (1) the presence of a noisy channel always reduces capacity, (2) more memory for the allocation policy always improves capacity, and (3) more memory for the encoding policy has little to no effect on capacity. Finally, all of our positive (achievability) results are established through constructive, implementable policies. Our proof program develops a host of new techniques by combining ideas from information theory, learning, and queueing theory. We create a simple yet powerful generalization of the Max-Weight policy, in which individual Markov chains are selected dynamically, in a manner analogous to how schedules are used in a conventional Max-Weight policy.
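For readers unfamiliar with the baseline being generalized, here is a minimal Python sketch of a conventional Max-Weight scheduler over a finite schedule set. The toy instance and all names are illustrative; the paper's generalization (dynamically selecting Markov chains rather than schedules) is not reproduced here.

```python
import numpy as np

def max_weight_schedule(queues, schedules):
    # Conventional Max-Weight: pick the feasible schedule that maximizes
    # the queue-length-weighted service rate (ties broken by first index).
    weights = [float(np.dot(queues, s)) for s in schedules]
    return schedules[int(np.argmax(weights))]

# Toy instance: two queues, three feasible service vectors.
queues = np.array([5.0, 2.0])
schedules = [np.array([1.0, 0.0]),
             np.array([0.0, 1.0]),
             np.array([0.5, 0.5])]
print(max_weight_schedule(queues, schedules))  # -> [1. 0.]: serve the longer queue
```

Under perfect state information, running this rule at every decision epoch stabilizes any arrival rate inside the capacity region; the paper's question is what survives of that guarantee when `queues` is seen only through a noisy channel with limited memory.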
We study the effect of imperfect memory on decision making in the context of a stochastic sequential action-reward problem. An agent chooses a sequence of actions that generate discrete rewards at different rates. She is allowed to make new choices at rate β, while past rewards disappear from her memory at rate μ. We focus on a family of decision rules in which the agent makes a new choice by randomly selecting an action with probability approximately proportional to the amount of past reward associated with each action in her memory. We provide closed-form formulae for the agent's steady-state choice distribution in the regime where the memory span is large (μ → 0), and show that the agent's success critically depends on how quickly she updates her choices relative to the speed of memory decay. If β ≫ μ, the agent almost always chooses the best action, i.e., the one with the highest reward rate. Conversely, if β ≪ μ, the agent chooses an action with probability roughly proportional to its reward rate.
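As a rough illustration of the mechanism (not the paper's formal model), the following Python sketch simulates an agent whose remembered rewards vanish independently at rate μ and who re-selects an action at rate β. The +1 smoothing in the choice probabilities is our assumption standing in for the abstract's "approximately proportional" rule, and the parameter values are toy choices; the paper's results are asymptotic as μ → 0.

```python
import numpy as np

rng = np.random.default_rng(1)

def occupancy(reward_rates, beta, mu, dt=0.01, steps=500_000):
    # Discrete-time approximation: rewards for the current action arrive
    # as a Poisson stream at that action's rate; each remembered reward
    # vanishes independently at rate mu (binomial thinning); at rate beta
    # the agent re-selects an action with probability (approximately)
    # proportional to its remembered reward count.
    k = len(reward_rates)
    memory = np.zeros(k, dtype=np.int64)   # remembered reward counts
    current = rng.integers(k)
    time_spent = np.zeros(k)
    for _ in range(steps):
        memory[current] += rng.poisson(reward_rates[current] * dt)
        memory = rng.binomial(memory, 1.0 - mu * dt)
        if rng.random() < beta * dt:
            p = memory + 1.0               # +1 smoothing: our assumption
            current = rng.choice(k, p=p / p.sum())
        time_spent[current] += dt
    return time_spent / time_spent.sum()

rates = np.array([1.0, 2.0, 4.0])
# beta >> mu: occupancy should concentrate on the best action (rate 4).
print(occupancy(rates, beta=1.0, mu=0.01))
# beta << mu: occupancy should be roughly increasing in the reward rates.
print(occupancy(rates, beta=0.02, mu=1.0))
```

With these moderate parameters the two regimes separate only qualitatively: fast updating lets accumulated memory reinforce the best action, while fast forgetting leaves the agent choosing from a nearly empty memory, so her occupancy tracks the reward rates only loosely.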