Consider a stochastic system with a finite state space and a finite action space. Between actions, the waiting time to transition is a random variable with a continuous distribution function depending only on the current state and the action taken. There are positive costs of taking actions, and the system earns at a rate depending on the state of the system and the action taken. We allow actions to be taken between transitions. A policy for which there is a positive probability of an action between transitions involves "hesitation." A form of the long-range average income is the criterion for comparing different policies. It is shown that there exists a nonrandomized stationary policy that is optimal in the class of all policies for which the actions taken form a sequence. "Hesitation" can be eliminated if the waiting-time distributions are exponential. Howard's policy-improvement method can be used to obtain an optimal policy.
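Howard's policy-improvement method, cited in the abstract above, alternates policy evaluation and greedy improvement. A minimal sketch on an ordinary finite discounted MDP (an illustration only — not the semi-Markov, average-income model of the abstract; the transition and reward data here are hypothetical):

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP (illustration only).
P = np.array([              # P[a, s, s'] transition probabilities
    [[0.9, 0.1], [0.2, 0.8]],   # action 0
    [[0.5, 0.5], [0.6, 0.4]],   # action 1
])
r = np.array([              # r[a, s] expected one-step reward
    [1.0, 0.0],
    [0.5, 2.0],
])
beta = 0.9                  # discount factor

def policy_iteration(P, r, beta):
    n_actions, n_states, _ = P.shape
    policy = np.zeros(n_states, dtype=int)
    while True:
        # Policy evaluation: solve (I - beta * P_pi) v = r_pi exactly.
        P_pi = P[policy, np.arange(n_states)]
        r_pi = r[policy, np.arange(n_states)]
        v = np.linalg.solve(np.eye(n_states) - beta * P_pi, r_pi)
        # Policy improvement: act greedily with respect to v.
        q = r + beta * P @ v            # q[a, s]
        new_policy = q.argmax(axis=0)
        if (new_policy == policy).all():
            return policy, v            # no change: policy is optimal
        policy = new_policy

pi, v = policy_iteration(P, r, beta)
print(pi, v)
```

Because improvement strictly increases the value unless the policy is already greedy with respect to its own value, the loop terminates in finitely many iterations over the finite set of stationary policies.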
We consider a denumerable-state Markovian sequential control process. It is well known that when we consider the expected total discounted income as a criterion, there exists a nonrandomized stationary policy that is optimal. It is also well known that when we consider the expected average income as a criterion, an optimal nonrandomized stationary policy exists when a certain system of equations has a solution. The problem considered here is: if there exist two optimal nonrandomized stationary policies, will a randomization of these two policies be optimal? It is shown that in the discounted case the answer is always yes, but in the average-income case the answer is yes only under certain additional conditions.
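The discounted-case claim can be checked on a toy example. In the hypothetical 2-state MDP below, both actions in state 0 attain the same optimal discounted value, so every stationary policy randomizing between them is also optimal; the numbers (rewards, discount factor) are illustrative choices, not from the paper:

```python
beta = 0.5   # discount factor (hypothetical)

# State 1 is absorbing and earns reward 2 each period.
# In state 0 there are two actions with equal optimal Q-value:
#   action a: reward 1, stay in state 0
#   action b: reward 0, move to state 1

def value_of_mixture(p):
    """Discounted value in state 0 when action a is chosen with prob. p."""
    v1 = 2.0 / (1.0 - beta)   # value of the absorbing state
    # v0 = p*(1 + beta*v0) + (1 - p)*(0 + beta*v1); solve linearly for v0.
    return (p * 1.0 + (1.0 - p) * beta * v1) / (1.0 - p * beta)

for p in (0.0, 0.25, 0.5, 1.0):
    print(p, value_of_mixture(p))   # same optimal value for every p
```

Algebraically, the value is (2 - p)/(1 - p/2) = 2 for every mixing probability p, so any randomization of the two optimal pure policies remains optimal — consistent with the discounted-case result stated above.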
We consider the costly surveillance of a stochastic system with a finite state space and a finite number of actions in each state. There is a positive cost of observing the system, and the system earns at a rate depending on the state of the system and the action taken. A policy for controlling such a system specifies the action to be taken and the time to the next observation, both possibly random and depending on the past history of the system. A form of the long-range average income is the criterion for comparing different policies. If R_A denotes the class of policies for which the times between successive observations of the system are random variables with cumulative distribution functions on [0, A], A < ∞, we show that there exists a nonrandomized stationary policy that is optimal in R_A. Furthermore, for sufficiently large A, this optimal policy is independent of A.