Smoothing Techniques for Computing Nash Equilibria of Sequential Games

Hoda, Samid; Gilpin, Andrew R.; Peña, J. M.; Sandholm, Tuomas

doi:10.1287/moor.1100.0452

Cited by 100 publications

(159 citation statements)

References 14 publications

Mentioning: 154

“…In the case of zero-sum extensive-form games with perfect recall, there are efficient techniques for finding an equilibrium, such as linear programming [Koller et al 1994]. An $\epsilon$-equilibrium can be found in even larger games via algorithms such as generalizations of the excessive gap technique [Hoda et al 2010] and counterfactual regret minimization. The latter two algorithms scale to games with approximately $10^{12}$ game tree states, while the most scalable current general-purpose linear programming technique (CPLEX's barrier method) scales to games with around $10^7$ or $10^8$ states.…”

Section: Nash Equilibria (mentioning)

confidence: 99%

Safe opponent exploitation

Ganzfried

Sandholm

2012

Proceedings of the 13th ACM Conference on Electronic Commerce

Self Cite


We consider the problem of playing a finitely-repeated two-player zero-sum game safely, that is, guaranteeing at least the value of the game per period in expectation regardless of the strategy used by the opponent. Playing a stage-game equilibrium strategy at each time step clearly guarantees safety, and prior work has conjectured that it is impossible to simultaneously deviate from a stage-game equilibrium (in hope of exploiting a suboptimal opponent) and to guarantee safety. We show that such profitable deviations are indeed possible, specifically in games where certain types of 'gift' strategies exist, which we define formally. We show that the set of strategies constituting such gifts can be strictly larger than the set of iteratively weakly-dominated strategies; this disproves another recent conjecture which states that all non-iteratively-weakly-dominated strategies are best responses to each equilibrium strategy of the other player. We present a full characterization of safe strategies, and develop efficient algorithms for exploiting suboptimal opponents while guaranteeing safety. We also provide analogous results for sequential perfect and imperfect-information games, and present safe exploitation algorithms and full characterizations of safe strategies for those settings as well. We present experimental results in Kuhn poker, a canonical test problem for game-theoretic algorithms. Our experiments show that 1) aggressive safe exploitation strategies significantly outperform adjusting the exploitation within equilibrium strategies and 2) all the safe exploitation strategies significantly outperform a (non-safe) best response strategy against strong dynamic opponents.


“…(Nesterov 2005a, 2005b) If the prox-function's conjugate and the conjugate's gradient are computable quickly (and the prox-function is continuous, strongly convex, and differentiable), we say that the prox-function is nice (Hoda et al 2010). With nice prox-functions the overall algorithm is fast.…”

Section: Winter 2010 23 (mentioning)

confidence: 99%

The State of Solving Large Incomplete‐Information Games, and Application to Poker

Sandholm

2010

AI Magazine

Self Cite


Game-theoretic solution concepts prescribe how rational parties should act in multiagent settings. This is nontrivial because an agent's utility-maximizing strategy generally depends on the other agents' strategies. The most famous solution concept for this is a Nash equilibrium: a strategy profile (one strategy for each agent) where no agent has incentive to deviate from her strategy given that others do not deviate from theirs.

In this article I will focus on incomplete-information games, that is, games where the agents do not entirely know the state of the game at all times. The usual way to model them is a game tree where the nodes (that is, states) are further grouped into information sets. In an information set, the player whose turn it is to move cannot distinguish between the states in the information set, but knows that the actual state is one of them. Incomplete-information games encompass most games of practical importance, including most negotiations, auctions, and many applications in information security and physical battle.

Such games are strategically challenging. A player has to reason about what others' actions signal about their knowledge. Conversely, the player has to be careful about not signaling too much about her own knowledge to others through her actions. Such games cannot be solved using methods for complete-information games like checkers, chess, or Go. Instead, I will review new game-independent algorithms for solving them.

Poker has emerged as a standard benchmark in this space (Shi and Littman 2002; Billings et al. 2002) for a number of reasons: (1) it exhibits the richness of reasoning about a probabilistic future, how to interpret others' actions as signals, and information hiding through careful action selection, (2) the game is unambiguously specified, (3) the game can be scaled to the desired complexity, (4) humans of a broad range of skill exist for comparison, (5) the game is fun, and (6) computers find interesting strategies automatically. For example, time-tested behaviors such as bluffing and slow play arise from the game-theoretic algorithms automatically rather than having to be explicitly programmed.
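
The "nice" prox-function condition quoted from this article (conjugate and conjugate gradient both quick to compute) can be illustrated with a small sketch. The example below assumes the entropy function on the probability simplex, a standard textbook choice that the quoted passage does not itself name; all function names are hypothetical. For this function the conjugate has the closed form log-sum-exp minus ln n, and its gradient is the softmax.

```python
import math

def entropy_prox(x):
    """Assumed prox-function: d(x) = sum_i x_i ln x_i + ln n on the simplex (0 ln 0 := 0)."""
    n = len(x)
    return sum(xi * math.log(xi) for xi in x if xi > 0) + math.log(n)

def conjugate(s):
    """d*(s) = max over the simplex of <s, x> - d(x), which equals logsumexp(s) - ln n."""
    m = max(s)  # shift by the max for numerical stability
    return m + math.log(sum(math.exp(si - m) for si in s)) - math.log(len(s))

def conjugate_gradient(s):
    """Gradient of d*, which is also the maximizer in the definition above: softmax(s)."""
    m = max(s)
    weights = [math.exp(si - m) for si in s]
    total = sum(weights)
    return [w / total for w in weights]

# Check the closed form on one arbitrary vector: the objective evaluated at
# softmax(s) should match the closed-form conjugate value.
s = [1.0, 2.0, 0.5]
x_star = conjugate_gradient(s)
value_at_x_star = sum(si * xi for si, xi in zip(s, x_star)) - entropy_prox(x_star)
print(abs(value_at_x_star - conjugate(s)) < 1e-9)
```

Both the conjugate and its gradient cost one pass over the vector, which is the sense in which such a prox-function would count as quick to evaluate.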


“…This has motivated the design of iterative algorithms that converge to a Nash equilibrium in the limit. Such algorithms are mainly categorized as first-order methods (FOMs) [Hoda et al 2010] and regret-based [Zinkevich et al 2007] approaches. The current state-of-the-art for practical game solving is a regret-based stochastic algorithm [Lanctot et al 2009], with an $O(1/\epsilon^2)$ convergence rate.…”

Section: Introduction (mentioning)

confidence: 99%

“…The current state-of-the-art for practical game solving is a regret-based stochastic algorithm [Lanctot et al 2009], with an $O(1/\epsilon^2)$ convergence rate. Hoda et al [2010] have studied first-order methods (FOMs) with an $O(1/\epsilon)$ rate of convergence. While such approaches have more desirable theoretical guarantees, they have yet to become the norm in practice.…”

Section: Introduction (mentioning)

confidence: 99%

“…On the theoretical side, we investigate a class of distance-generating functions, namely the dilated entropy function over the class of treeplexes, a convex polytope that generalizes the strategy spaces of the players in perfect-recall EFGs. Hoda et al [2010] developed a generic scheme for constructing such functions for EFGs based on standard d.g.f.s used for the simplex domain. However, the generic scheme from Hoda et al [2010] leads to very weak strong convexity parameters, resulting in slow convergence rates.…”

Section: Introduction (mentioning)

confidence: 99%


Faster First-Order Methods for Extensive-Form Game Solving

Kroer

Waugh

Kılınç-Karzan

et al. 2015

Proceedings of the Sixteenth ACM Conference on Economics and Computation

Self Cite


We study the problem of computing a Nash equilibrium in large-scale two-player zero-sum extensive-form games. While this problem can be solved in polynomial time, first-order or regret-based methods are usually preferred for large games. Regret-based methods have largely been favored in practice, in spite of their theoretically inferior convergence rates. In this paper we investigate the acceleration of first-order methods both theoretically and experimentally. An important component of many first-order methods is a distance-generating function. Motivated by this, we investigate a specific distance-generating function, namely the dilated entropy function, over treeplexes, which are convex polytopes that encompass the strategy spaces of perfect-recall extensive-form games. We develop significantly stronger bounds on the associated strong convexity parameter. In terms of extensive-form game solving, this improves the convergence rate of several first-order methods by a factor of O() where M is the maximum value of the $\ell_1$ norm over the treeplex encoding the strategy spaces.

Experimentally, we investigate the performance of three first-order methods (the excessive gap technique, mirror prox, and stochastic mirror prox) and compare their performance to the regret-based algorithms. In order to instantiate stochastic mirror prox, we develop a class of gradient sampling schemes for game trees. Equipped with our distance-generating function and sampling scheme, we find that mirror prox and the excessive gap technique outperform the prior regret-based methods for finding medium accuracy solutions.
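
The theoretical gap between regret-based and first-order methods mentioned in this abstract and the citation statements can be made concrete with a rough calculation. The sketch below assumes the two rates are $O(1/\epsilon^2)$ and $O(1/\epsilon)$ and sets every hidden constant to 1, which is purely an illustrative assumption; the function name is hypothetical and the numbers say nothing about per-iteration cost.

```python
def iterations_needed(inv_eps: int, order: int) -> int:
    """Iterations to reach accuracy eps = 1/inv_eps under a rate O(1/eps**order).

    The hidden constant is assumed to be 1, so only the scaling is meaningful.
    """
    return inv_eps ** order

for inv_eps in (10, 100, 1000):
    regret_style = iterations_needed(inv_eps, 2)  # assumed O(1/eps^2) rate
    first_order = iterations_needed(inv_eps, 1)   # assumed O(1/eps) rate
    print(f"eps = 1/{inv_eps}: regret-style ~{regret_style}, "
          f"first-order ~{first_order}, ratio {regret_style // first_order}")
```

Under these assumptions the ratio grows as 1/ε, so the advantage of the faster rate shows up mainly at tight accuracy targets. Constants and per-iteration cost can outweigh it in practice, which fits the abstract's remark that regret-based methods have largely been favored.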

