This paper presents advances in Kullback-Leibler-Quadratic (KLQ) optimal control: a stochastic control framework for Markovian models. The motivation is distributed control of large networked systems. The objective function is composed of a control cost in the form of Kullback-Leibler divergence plus a quadratic cost on the sequence of marginal distributions. With this choice of objective function, the optimal probability distribution of a population of agents over a finite time horizon is shown to be an exponential tilting of the nominal probability distribution. The same is true for the controlled transition matrices that induce the optimal probability distribution.

However, one limitation of the previous work is that randomness can only be introduced via the control policy; all uncontrolled processes must be modeled as deterministic to render them immutable under an exponential tilting. In this work, only the controlled dynamics are subject to tilting, allowing for more general probabilistic models.

Numerical experiments are conducted in the context of power networks. The distributed control techniques described in this paper can transform a large collection of flexible loads into a 'virtual battery' capable of delivering the same grid services as traditional batteries. Additionally, quality of service to the load owner is guaranteed, privacy is preserved, and computation and communication requirements are reduced relative to alternative centralized control techniques.
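The exponential tilting named above can be illustrated on a single pmf. The sketch below is not the KLQ solution itself, which involves a quadratic cost on a sequence of marginals over the time horizon. It only shows the generic operation: given a nominal pmf `p0` on a finite set and a function `h` (both hypothetical values chosen for illustration), the tilted pmf is proportional to `p0(x) * exp(h(x))`, and it minimizes `D(p || p0) - <p, h>` over all pmfs `p`, with minimum value equal to minus the log of the normalizing constant.

```python
import math

def exponential_tilt(p0, h):
    """Return the pmf proportional to p0(x) * exp(h(x)) on a finite set."""
    m = max(h)  # subtract max(h) before exponentiating for numerical stability
    w = [p * math.exp(v - m) for p, v in zip(p0, h)]
    z = sum(w)
    return [x / z for x in w]

def kl_divergence(p, q):
    """D(p || q) for pmfs on a common finite set, with 0 log 0 = 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def objective(p, p0, h):
    """KL control cost minus a linear reward: D(p || p0) - <p, h>."""
    return kl_divergence(p, p0) - sum(a * v for a, v in zip(p, h))

# Hypothetical nominal pmf and tilting function, for illustration only.
p0 = [0.5, 0.3, 0.2]
h = [0.0, 1.0, -1.0]
p_tilted = exponential_tilt(p0, h)
```

Evaluating `objective(p, p0, h)` at any other pmf `p`, including `p0` itself, gives a value no smaller than at `p_tilted`, which is the sense in which the tilted pmf is optimal for this simplified one-step criterion.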
I. INTRODUCTION

The setting of this paper is optimal control of Markov Decision Processes (MDPs). The state space S and input space U are assumed to be finite. A finite time horizon is considered, indexed by $\{k : 1 \le k \le K\}$. The controlled transition matrix $T_k$ defines the statistics of the state process $\{S_k\}$ with input process $\{U_k\}$:

The policies $\{\phi_k\}$ are assumed to be Markovian:

As in [1], [2], the Kullback-Leibler-Quadratic (KLQ) optimization criterion is based on convex functions of the marginal probability mass functions (pmfs) of the joint state-input process $X_k = (S_k, U_k)$: