“…We tuned the weights through human-in-the-loop experiments to imitate naturalistic behaviors. The competitive two-player game can be formulated as a zero-sum game [26]. We thus formulate the ego robot's reward as the negative of the opponent's reward (i.e., the opponent's cost):…”
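
The zero-sum coupling described in the excerpt can be illustrated with a minimal sketch. The excerpt elides the actual reward expression, so everything concrete here is an assumption made for illustration: the linear weighted-sum form, the function names `weighted_reward` and `ego_reward`, and the weight and feature values are all hypothetical and are not taken from the paper.

```python
from typing import Sequence


def weighted_reward(weights: Sequence[float], features: Sequence[float]) -> float:
    """Weighted sum of feature values (assumed linear form, hypothetical)."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    return sum(w * f for w, f in zip(weights, features))


def ego_reward(opponent_reward: float) -> float:
    """Zero-sum coupling: the ego's reward is the negative of the opponent's."""
    return -opponent_reward


# Hypothetical weights and feature values, for illustration only.
opponent_weights = [1.0, 0.5, -2.0]
opponent_features = [0.8, 0.2, 0.1]

r_opp = weighted_reward(opponent_weights, opponent_features)
r_ego = ego_reward(r_opp)

# In a zero-sum game the two rewards cancel at every state.
assert r_ego + r_opp == 0.0
```

Under this formulation, any gain for the opponent is an equal loss for the ego robot, so maximizing the ego's reward is the same as minimizing the opponent's.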