“…Additional work on safety in MDPs has focused on obtaining high-confidence bounds on the performance of a policy before that policy is deployed (Thomas, Theocharous, and Ghavamzadeh 2015b; Hanna, Stone, and Niekum 2017), as well as on methods for high-confidence policy improvement (Thomas, Theocharous, and Ghavamzadeh 2015a). Our work draws inspiration from these approaches; however, we provide bounds on policy performance that apply when learning from demonstrations, i.e., when rewards are not observed.…”