“…Tasks such as game playing have simple, natural reward functions that enable learning of effective policies (Tesauro, 1995; Mnih et al, 2013; Silver et al, 2017); however, many tasks, such as those involving human interaction, multiple competing objectives, or high-dimensional state spaces, do not have easy-to-define reward functions (Knox et al, 2021; Holland et al, 2013; Cabi et al, 2019). In these cases, reward functions can be learned from demonstrations, explicit preferences, or other forms of reward supervision (Ziebart et al, 2008; Christiano et al, 2017; Cabi et al, 2019). However, evaluating the quality of learned reward functions, which may be too complicated to manually inspect, remains an open challenge.…”