We consider infinite-state turn-based stochastic games of two players, Max and Min, who aim at maximizing and minimizing the expected total reward accumulated along a run, respectively. Since the total accumulated reward is unbounded, the determinacy of such games cannot be deduced directly from Martin's determinacy result for Blackwell games. Nevertheless, we show that these games are determined both for unrestricted (i.e., history-dependent and randomized) strategies and for deterministic strategies, and the equilibrium value is the same in both cases. Further, we show that these games are in general not determined for memoryless strategies. Then, we consider the subclass of Min-finitely-branching games and show that they are determined for all of the considered strategy types, where the equilibrium value is always the same. We also examine the existence and type of (ε-)optimal strategies for both players.
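For concreteness, determinacy with the total-reward payoff can be phrased as follows (the notation is a standard choice of ours, not fixed by the text: Σ and Π denote the considered classes of strategies of Max and Min, E^{σ,π}_v the expectation under strategies σ, π started in vertex v, and Acc the total reward accumulated along a run):

\[
  \mathrm{val}(v) \;=\; \sup_{\sigma \in \Sigma}\, \inf_{\pi \in \Pi}\, \mathbb{E}^{\sigma,\pi}_{v}\!\left[\mathrm{Acc}\right] \;=\; \inf_{\pi \in \Pi}\, \sup_{\sigma \in \Sigma}\, \mathbb{E}^{\sigma,\pi}_{v}\!\left[\mathrm{Acc}\right].
\]

The common value val(v) is the equilibrium value of v; non-determinacy for a restricted strategy class (e.g., memoryless strategies) means that the sup-inf and inf-sup above may differ when Σ and Π range only over that class.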
Introduction

Turn-based stochastic games of two players are a standard model of discrete systems that exhibit both non-deterministic and randomized choice. One player (called Max in this paper) corresponds to the controller, who wishes to achieve or maximize some desirable property of the system, and the other player (called Min) models the environment, which aims at spoiling that property. Randomized choice is used to model events such as system failures, bit-flips, or coin-tossing in randomized algorithms.

Technically, a turn-based stochastic game (SG) is defined as a directed graph where every vertex is either stochastic or belongs to one of the two players. Further, there is a fixed probability distribution over the outgoing transitions of every stochastic vertex. A play of the game is initiated by putting a token on some vertex. Then the token is moved from vertex to vertex, either by the players or randomly. A strategy specifies how a player should play. In general, a strategy may depend on the sequence of vertices visited so far (we then say that the strategy is history-dependent (H)), and it may specify a probability distribution over the outgoing transitions of the currently visited vertex rather than a single outgoing transition (we say that the strategy is randomized (R)). Strategies that do not depend on the history of a play are called memoryless (M), and strategies that do not randomize (i.e., always select a single outgoing transition) are called deterministic (D). Thus we obtain the four strategy classes MD, MR, HD, and HR, where HR strategies are unrestricted and MD strategies are the most restricted memoryless deterministic ones; a formal sketch of these classes is given below.
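As a minimal formal sketch (the notation below is our assumption and is not fixed by the text), write V_Max for the set of vertices of Max, V* for the set of finite sequences of vertices, and D(V) for the set of probability distributions over V. A general (HR) strategy of Max, together with the memoryless and deterministic restrictions, can then be written as:

\[
\begin{array}{ll}
  \sigma \colon V^{*}V_{\mathit{Max}} \to \mathcal{D}(V) & \text{(HR: may use the whole history and randomize)}\\
  \sigma(wv) = \sigma(v) \text{ for all } w \in V^{*},\ v \in V_{\mathit{Max}} & \text{(M: depends only on the current vertex)}\\
  \sigma(wv) \text{ is Dirac for every history } wv & \text{(D: selects a single outgoing transition)}
\end{array}
\]

where σ(wv) may assign positive probability only to vertices reachable from v by an outgoing transition. The classes MD, MR, HD, and HR arise by combining these restrictions, and the strategies of Min are defined analogously with V_Min in place of V_Max.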