We study a game between liquidity provider (LP) and liquidity taker agents interacting in an over-the-counter market, of which foreign exchange is the typical example. We show how a suitable design of parameterized families of reward functions, coupled with shared policy learning, constitutes an efficient approach to learning in this game. By playing against each other, our deep-reinforcement-learning-driven agents exhibit emergent behaviors across a wide spectrum of objectives encompassing profit-and-loss, optimal execution, and market share. In particular, we find that LPs naturally learn to balance hedging and skewing, where skewing refers to setting their buy and sell prices asymmetrically as a function of their inventory. We further introduce a novel RL-based calibration algorithm, which we find performs well at imposing constraints on the game equilibrium. On the theoretical side, we show convergence rates for our multi-agent policy gradient algorithm under a transitivity assumption closely related to generalized ordinal potential games.
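To make the notion of skewing concrete, the following Python sketch shows one simple, hypothetical quoting rule in which an LP shifts both its bid and ask linearly against its inventory. The function `skewed_quote`, the linear form, and the parameters `half_spread` and `skew_coef` are illustrative assumptions rather than the method described here; the agents in this work learn their pricing behavior instead of following a fixed rule like this one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quote:
    bid: float  # price at which the LP buys from clients
    ask: float  # price at which the LP sells to clients


def skewed_quote(mid: float, half_spread: float, inventory: float, skew_coef: float) -> Quote:
    """Hypothetical linear skewing rule (illustrative assumption only).

    Both prices are shifted by -skew_coef * inventory. A long LP (inventory > 0)
    lowers its bid and ask, so clients are more inclined to buy from it and less
    inclined to sell to it, which tends to reduce the position. A short LP does
    the opposite. The bid-ask spread stays equal to 2 * half_spread.
    """
    if half_spread < 0:
        raise ValueError("half_spread must be non-negative")
    shift = -skew_coef * inventory
    return Quote(bid=mid - half_spread + shift, ask=mid + half_spread + shift)


if __name__ == "__main__":
    # A flat LP quotes symmetrically around the mid; a long LP skews both prices down.
    print(skewed_quote(mid=100.0, half_spread=1.0, inventory=0.0, skew_coef=0.5))
    print(skewed_quote(mid=100.0, half_spread=1.0, inventory=2.0, skew_coef=0.5))
```

With a flat inventory the quotes are symmetric around the mid (bid 99.0, ask 101.0); with an inventory of 2.0 both prices drop by 1.0 (bid 98.0, ask 100.0), which is the asymmetry the abstract calls skewing.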