“…Roughly speaking, in a Markov perfect equilibrium, each agent's policy is a best reply to the joint policy of all other agents, i.e., it maximizes her own total (discounted) reward given that joint policy. The main streams of research include analyzing the hardness of computing the equilibria (e.g., Daskalakis [17], Daskalakis et al. [19], Garg et al. [33]), approximating and analyzing the equilibria (e.g., Adsul et al. [1], Boodaghians et al. [10], Brânzei et al. [11]), and designing algorithms to find the equilibria either with knowledge of the transitions and rewards (e.g., Hansen et al. [35], Hu and Wellman [39]) or without such knowledge (e.g., Arslan and Yüksel [3]).…”