“…A reward-sharing agent $i$ learns an individual orientation function, $w^i_{\eta_i}\colon \mathcal{O}^i \times \mathcal{A}^{-i} \to \mathbb{R}^N$, parameterized by $\eta_i$, that maps its own observation $o^i$ and all other agents' actions $a^{-i}$ to a vector of reward-sharing ratios over all $N$ agents. Unlike existing methods based on the SVO (Peysakhovich and Lerer, 2018b; Baker, 2020; Gemp et al., 2022; Yi et al., 2021) or sanction (Koster et al., 2020; Lupu and Precup, 2020; Yang et al., 2020; Vinitsky et al., 2021; Dong et al., 2021) mechanisms, the orientation function in RESVO (1) allows an agent to share reward with itself, and (2) does not require the sharing ratios to sum to 1. This is one of the reasons why reward sharing, a mechanism already used in prior work, can in RESVO encourage the division of labor and solve ISDs.…”
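A minimal sketch of such an orientation function is given below; it is not the authors' implementation, only an illustration under assumed details. The class name `OrientationNet`, the MLP architecture, and the choice of a softplus output are all hypothetical; the sketch only shows how a network can map $(o^i, a^{-i})$ to $N$ sharing ratios that include the agent itself and are not constrained to sum to 1.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OrientationNet(nn.Module):
    """Hypothetical orientation network w^i_{eta_i}: O^i x A^{-i} -> R^N.

    Maps agent i's observation and the other agents' actions to a vector
    of N reward-sharing ratios, one per agent (including agent i).
    """

    def __init__(self, obs_dim: int, n_agents: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.n_actions = n_actions
        # Other agents' actions are one-hot encoded and concatenated to the observation.
        in_dim = obs_dim + (n_agents - 1) * n_actions
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_agents),  # one ratio per agent, including agent i itself
        )

    def forward(self, obs_i: torch.Tensor, actions_minus_i: torch.Tensor) -> torch.Tensor:
        # obs_i:           (batch, obs_dim) float observation of agent i
        # actions_minus_i: (batch, n_agents - 1) integer actions of all other agents
        a = F.one_hot(actions_minus_i, self.n_actions).float().flatten(start_dim=1)
        logits = self.net(torch.cat([obs_i, a], dim=-1))
        # Softplus keeps the ratios non-negative but, unlike a softmax over the
        # other agents only, (1) includes agent i in the output and (2) does not
        # force the ratios to sum to 1. This output choice is an assumption made
        # here for illustration, not a detail taken from the paper.
        return F.softplus(logits)
```

In contrast, an SVO-style parameterization would typically normalize the weights (e.g., via a softmax) and exclude the agent's own index, which is exactly the restriction the excerpt says RESVO's orientation function removes.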