“…Unfortunately, these algorithms involve searching the whole strategy space, so their convergence time is exponential. Another algorithm that uses stages to provide a stable learning environment is the ESRL algorithm for coordinated exploration (Verbeeck, Nowé, Parent, & Tuyls, 2007). Marden, Arslan, and Shamma (2007b) and Marden, Young, Arslan, and Shamma (2009) use an algorithm with experimentation and best replies but without explicit stages that converges for weakly acyclic games, where best-reply dynamics converge when agents move one at a time, rather than moving all at once, as we assume here.…”
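The difference between one-at-a-time and all-at-once moves can be illustrated with a minimal sketch. It uses plain best-reply dynamics only, not the experimentation-based algorithm of Marden et al., and the two-player coordination game, the function names, and the tie-breaking rule (keep the current action on ties) are all assumptions made for illustration. Coordination games are potential games and therefore weakly acyclic.

```python
# Hypothetical 2-player coordination game (an illustrative assumption):
# each player earns 1 if both actions match, and 0 otherwise.
ACTIONS = (0, 1)


def payoff(player, profile):
    """Payoff to `player` under the joint action `profile` (same for both players)."""
    return 1 if profile[0] == profile[1] else 0


def best_reply(player, profile):
    """Return a payoff-maximizing action for `player`, keeping the current action on ties."""
    best_action = profile[player]
    best_value = payoff(player, profile)
    for action in ACTIONS:
        candidate = list(profile)
        candidate[player] = action
        value = payoff(player, tuple(candidate))
        if value > best_value:
            best_action, best_value = action, value
    return best_action


def is_nash(profile):
    """A profile is a pure Nash equilibrium if no player wants to deviate."""
    return all(best_reply(i, profile) == profile[i] for i in range(len(profile)))


def run_sequential(start, max_rounds=10):
    """Best-reply dynamics in which agents move one at a time (round-robin order)."""
    profile = start
    for t in range(max_rounds):
        if is_nash(profile):
            return profile
        mover = t % len(profile)
        updated = list(profile)
        updated[mover] = best_reply(mover, profile)
        profile = tuple(updated)
    return profile


def run_simultaneous(start, steps):
    """Best-reply dynamics in which all agents move at once; returns the visited path."""
    path = [start]
    for _ in range(steps):
        current = path[-1]
        path.append(tuple(best_reply(i, current) for i in range(len(current))))
    return path


if __name__ == "__main__":
    print(run_sequential((0, 1)))       # one-at-a-time moves settle on a matching profile
    print(run_simultaneous((0, 1), 4))  # all-at-once moves alternate between mismatches
```

Starting from a mismatched profile, the one-at-a-time version reaches an equilibrium after a single move, while the all-at-once version has both players switch together and miscoordinate again on every step. This is only a toy instance of why convergence results for weakly acyclic games do not carry over directly to the simultaneous-move setting.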