“…Preference learning. Much recent work has learned preferences from different sources of data, such as demonstrations (Ziebart et al, 2010; Ramachandran and Amir, 2007; Ho and Ermon, 2016; Fu et al, 2017; Finn et al, 2016), comparisons (Christiano et al, 2017; Sadigh et al, 2017; Wirth et al, 2017), ratings (Daniel et al, 2014), human reinforcement signals (Knox and Stone, 2009; Warnell et al, 2017; MacGlashan et al, 2017), proxy rewards (Hadfield-Menell et al, 2017), etc. We suggest preference learning with a new source of data: the state of the environment when the robot is first deployed.…”