Contents

5 Monotonicity of Value Function for POMDPs
  5.1 Model and Assumptions
  5.2 Main Result: Monotone Value Function
  5.3 Example 1: Monotone Policies for 2-state POMDPs
  5.4 Example 2: POMDP Multi-armed Bandits Structural Results
  5.5 Complements and Sources
6 Structural Results for Stopping Time POMDPs
Appendix to Chapter 8
  8.A POMDP Numerical Examples
References

[Figure 1.1 Terminology of HMMs, MDPs and POMDPs: HMM, noisy measurements; MDP, controlled transitions; POMDP, controlled transitions and noisy observations.]

The aim is to choose the actions $u_k$ so as to minimize the expected cumulative cost $\mathbb{E}\{\sum_{k=0}^{N} c(x_k, u_k)\}$, where $\mathbb{E}$ denotes mathematical expectation. The optimal choice of actions is determined by a policy (strategy) $u_k = \mu_k(\cdot)$, where the optimal policy $\mu_k^*$ satisfies Bellman's stochastic dynamic programming equation; a numerical sketch of this backward recursion, applied to the stopping problem described next, is given at the end of this section.

Suppose a decision maker records measurements of a finite-state Markov chain corrupted by noise. The goal is to decide when the Markov chain hits a specific target state. The decision maker can choose from a finite set of sampling intervals to pick the next time to look at the Markov chain. The aim is to optimize an objective comprising a false alarm cost, a delay cost, and a cumulative measurement sampling cost. Taking more frequent measurements yields more accurate estimates but incurs a higher measurement cost. Making an erroneous decision too soon incurs a false alarm penalty, while waiting too long to declare the target state incurs a delay penalty. What is the optimal sequential strategy for the decision maker?
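To make these trade-offs concrete, here is a minimal Python sketch of the belief-state view of the problem: the decision maker runs an HMM filter to track the probability that the chain has reached the target state, pays a measurement cost each time it samples, and declares the target with a simple threshold rule. The transition matrix P, observation likelihoods B, cost values, and the threshold rule are illustrative assumptions, not parameters or a policy taken from the text.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative model (assumed values): state 0 is the absorbing target state.
P = np.array([[1.0, 0.0],            # P[i, j] = P(x_{k+1} = j | x_k = i)
              [0.1, 0.9]])
B = np.array([[0.8, 0.2],            # B[i, y] = P(observation y | state i)
              [0.3, 0.7]])
sampling_intervals = {1: 1.0, 4: 0.3}     # interval -> per-measurement cost (assumed)
false_alarm_cost, delay_cost = 10.0, 0.5

def hmm_filter(belief, y, P_interval):
    """One measurement update: predict over the sampling interval, then correct with y."""
    predicted = P_interval.T @ belief
    posterior = B[:, y] * predicted
    return posterior / posterior.sum()

# Simulate one run with a simple (assumed, not optimal) rule: sample slowly while
# P(target) is small, sample every step once it grows, and declare the target
# once P(target) exceeds 0.95.
x, belief, k, total_cost = 1, np.array([0.0, 1.0]), 0, 0.0
while True:
    u = 1 if belief[0] > 0.5 else 4            # chosen sampling interval
    P_u = np.linalg.matrix_power(P, u)
    x = rng.choice(2, p=P_u[x])                # chain evolves over u time steps
    y = rng.choice(2, p=B[x])                  # noisy measurement at the sample time
    belief = hmm_filter(belief, y, P_u)
    k += u
    total_cost += sampling_intervals[u] + delay_cost * u * belief[0]
    if belief[0] > 0.95:                       # declare that the target was reached
        total_cost += false_alarm_cost * (1.0 - belief[0])
        break

print(f"declared at time {k}, P(target) = {belief[0]:.3f}, cost = {total_cost:.2f}")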
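The expected cumulative cost objective and Bellman's dynamic programming equation mentioned earlier can likewise be sketched for this stopping problem. Since the belief for a 2-state chain is a single number p = P(target), the backward recursion below runs on a discretised belief grid, comparing the cost of stopping (a false alarm penalty when the target has not yet been reached) against the expected cost of continuing under each sampling interval. The grid approximation and all numerical values are again assumptions made for illustration, not the text's own algorithms or numerical examples.

import numpy as np

# Backward (Bellman) dynamic-programming recursion for the stopping problem,
# on a discretised belief grid. All parameters are illustrative assumptions.
P = np.array([[1.0, 0.0],            # state 0 = absorbing target state
              [0.1, 0.9]])
B = np.array([[0.8, 0.2],            # B[i, y] = P(observation y | state i)
              [0.3, 0.7]])
sampling_cost = {1: 1.0, 4: 0.3}     # per-measurement cost for each interval
false_alarm_cost, delay_cost = 10.0, 0.5
N = 100                              # horizon length
grid = np.linspace(0.0, 1.0, 401)    # grid for p = P(target state)

stop_cost = false_alarm_cost * (1.0 - grid)    # penalty if target not yet reached
V = stop_cost.copy()                           # at k = N the decision maker must stop

for k in range(N - 1, -1, -1):
    V_next = V
    best_continue = np.full_like(grid, np.inf)
    for u, meas_cost in sampling_cost.items():
        P_u = np.linalg.matrix_power(P, u)
        continue_cost = np.empty_like(grid)
        for i, p in enumerate(grid):
            predicted = P_u.T @ np.array([p, 1.0 - p])
            expected_future = 0.0
            for y in (0, 1):                   # average over the next observation
                sigma_y = float(B[:, y] @ predicted)
                if sigma_y > 0.0:
                    p_post = B[0, y] * predicted[0] / sigma_y
                    expected_future += sigma_y * np.interp(p_post, grid, V_next)
            running = meas_cost + delay_cost * u * p      # illustrative running cost
            continue_cost[i] = running + expected_future
        best_continue = np.minimum(best_continue, continue_cost)
    V = np.minimum(stop_cost, best_continue)   # Bellman equation: min(stop, continue)

# Smallest belief at which stopping is no more costly than continuing.
threshold_idx = np.argmax(stop_cost <= best_continue)
print(f"stop once P(target) exceeds about {grid[threshold_idx]:.2f}")

With these assumed parameters the recursion produces a threshold rule, declare the target once the belief exceeds the printed value, which is the kind of structure the stopping time POMDP chapters establish under suitable assumptions.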