“…Apart from such vulnerabilities, Q-learning dynamics can also lead to peculiar outcomes in game settings by producing tacit collusion that can undermine the competitive nature of markets [Calvano et al., 2020, Klein, 2021, Hansen et al., 2021, Banchio and Mantegazza, 2022]. For example, Banchio and Mantegazza [2022] study the collusive behavior of Q-learners in the widely studied prisoner's dilemma game (where agents have two actions: 'cooperate' and 'defect'). They observe that Q-learners can learn to collude on cooperation even though 'cooperate' is always an irrational choice, as 'defect' dominates the 'cooperate' strategy.…”
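
The dominance argument in the excerpt can be made concrete with a small sketch. The payoff values below (5, 3, 1, 0) are a common textbook choice and are an assumption here, since the excerpt gives no numbers. The learner is likewise a hypothetical stateless epsilon-greedy Q-learner with illustrative parameters (`alpha`, `gamma`, `epsilon`), not necessarily the setup used in the cited work. The function names `defect_dominates` and `run_q_learners` are invented for this illustration.

```python
import random

COOPERATE, DEFECT = 0, 1

# Assumed payoffs (row player, column player); not taken from the source.
PAYOFF = {
    (COOPERATE, COOPERATE): (3, 3),
    (COOPERATE, DEFECT): (0, 5),
    (DEFECT, COOPERATE): (5, 0),
    (DEFECT, DEFECT): (1, 1),
}


def defect_dominates():
    """True if 'defect' pays strictly more than 'cooperate' against every opponent action."""
    return all(
        PAYOFF[(DEFECT, opp)][0] > PAYOFF[(COOPERATE, opp)][0]
        for opp in (COOPERATE, DEFECT)
    )


def run_q_learners(steps=10_000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Simulate two independent stateless Q-learners in the repeated game.

    Returns q, where q[player][action] is that player's learned value.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(steps):
        actions = []
        for player in range(2):
            if rng.random() < epsilon:
                # Explore: pick an action uniformly at random.
                actions.append(rng.randrange(2))
            else:
                # Exploit: pick the action with the highest current Q-value.
                actions.append(max((COOPERATE, DEFECT), key=lambda a: q[player][a]))
        rewards = PAYOFF[(actions[0], actions[1])]
        for player in range(2):
            a = actions[player]
            # Standard Q-learning update with a single (trivial) state.
            target = rewards[player] + gamma * max(q[player])
            q[player][a] += alpha * (target - q[player][a])
    return q
```

Calling `defect_dominates()` confirms the one-shot dominance relation under these assumed payoffs, while `run_q_learners()` returns the learned values so that one can inspect which action each learner ends up preferring. Whether the learners settle on cooperation depends on the chosen parameters and initialization, so this sketch makes no claim about reproducing the reported results.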