In this chapter we describe several recent results on the problem of coordination among agents when they have partial information about a state which affects their utility, payoff, or reward function. The state is not controlled; rather, it evolves according to an independent and identically distributed (i.i.d.) random process. This random process may represent various phenomena. In control, it may represent a perturbation or model uncertainty. In the context of smart grids, it may represent forecasting noise [1]. In wireless communications, it may represent the state of the global communication channel. The approach is to exploit Shannon theory to characterize the achievable long-term utility region. Two scenarios are described. In the first scenario, the number of agents is arbitrary and the agents have causal knowledge about the state. In the second scenario, there are only two agents and the agents have some knowledge about the future of the state, which makes their knowledge of the state non-causal.
Chapter overview

This chapter concerns the problem of coordination among agents. Technically, the problem is as follows. We consider a set of $K \geq 2$ agents. Agent $k$ has a utility, payoff, or reward function $u_k(x_0, x_1, \dots, x_K)$, where $x_k$, $k \geq 1$, is the action of Agent $k$, while $x_0$ is the action of an agent called Nature. Nature's actions correspond to the system state, which is assumed to be uncontrolled; more precisely, Nature corresponds to an independent and identically distributed (i.i.d.) random process. The problem studied in this chapter is to characterize the long-term utility region, that is, the set of achievable vectors of long-term utilities

$$U_k(\sigma_1, \dots, \sigma_K) = \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}\left[ u_k\big(X_0(t), X_1(t), \dots, X_K(t)\big) \right], \quad k \in \{1, \dots, K\},$$

where $\sigma_k = (\sigma_{k,t})_{t \geq 1}$ is a sequence of functions which represents the strategy of Agent $k$, and $x_k(t)$ is the action chosen by Agent $k$ at time or stage $t \geq 1$, $t$ being the time or stage index. Concerning notation, capital letters stand for random variables whereas lowercase letters stand for their realizations. Note that, implicitly, we assume sufficient conditions (such as utility boundedness) under which the above limit exists. The functions $\sigma_{k,t}$, $k \in \{1, \dots, K\}$, map the available knowledge to the action of the considered agent. The available knowledge depends on the information assumptions made (e.g., the knowledge of the state can be causal or non-causal).

We will distinguish between two scenarios. In the first scenario, agents are assumed to have some causal knowledge (in the wide sense) about the state whereas, in the second scenario, non-causal knowledge about the state (i.e., some knowledge about its future) is assumed. The second scenario is the more difficult one technically, which is why only two agents will be assumed. Remarkably, the long-term utility region, whenever available, can be characterized in terms of elegant information constraints.
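To make the notion of long-term utility concrete, the following is a minimal Python sketch that estimates the time-averaged utility of each agent along one simulated trajectory of an i.i.d. state. It is an illustration under stated assumptions, not the method developed in the chapter: the function `average_utilities`, the toy utility `coordination_utility`, and the two strategies are hypothetical names introduced here, the alphabets are binary, and a single sample-path average is used as a stand-in for the expected average.

```python
import random


def average_utilities(utilities, strategies, state_probs, horizon, seed=0):
    """Return the empirical averages (1/T) * sum_t u_k(x_0(t), ..., x_K(t)).

    utilities   -- list of functions u_k(x0, x1, ..., xK)
    strategies  -- list of functions sigma_k(t, state_history, action_history)
    state_probs -- probabilities of the states 0, 1, ... (i.i.d. across stages)
    """
    rng = random.Random(seed)
    states = list(range(len(state_probs)))
    totals = [0.0] * len(utilities)
    state_history = []   # x_0(1), ..., x_0(t)
    action_history = []  # action profiles of stages 1, ..., t-1
    for t in range(1, horizon + 1):
        # Nature draws the state independently from the same distribution.
        x0 = rng.choices(states, weights=state_probs)[0]
        state_history.append(x0)
        # Each strategy maps the stage index and the histories to an action.
        # Assumption: a strategy only reads the entries it is allowed to know.
        profile = tuple(s(t, state_history, action_history) for s in strategies)
        for k, u in enumerate(utilities):
            totals[k] += u(x0, *profile)
        action_history.append(profile)
    return [total / horizon for total in totals]


def coordination_utility(x0, x1, x2):
    """Hypothetical common utility: 1 if both agents match the state."""
    return 1.0 if x1 == x0 and x2 == x0 else 0.0


def play_current_state(t, state_history, action_history):
    """Causal knowledge of the state: play the current state."""
    return state_history[-1]


def play_previous_state(t, state_history, action_history):
    """Strictly causal knowledge: play the previous state (0 at stage 1)."""
    return state_history[-2] if t > 1 else 0


if __name__ == "__main__":
    common = [coordination_utility, coordination_utility]
    uniform = [0.5, 0.5]
    print(average_utilities(common, [play_current_state] * 2, uniform, 20000))
    print(average_utilities(
        common, [play_current_state, play_previous_state], uniform, 20000))
```

In this toy setting, two agents that both observe the current state coordinate perfectly, whereas an agent that only knows the previous state of a uniform binary process matches the current state about half of the time. The gap illustrates how the information structure limits the achievable long-term utility.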
For instance, in the scenario of non-causal state information, determining the long-term utility region amounts to solving a convex optimization problem whose non-trivial constraints are the derived information-the...