“…Methods in unknown MDP estimation and inverse reinforcement learning aim to learn an optimal policy while estimating an unknown quantity of the MDP, such as the transition law (Burnetas & Katehakis, 1997), secondary parameters (Budhiraja et al., 2012), and the reward function (Ng & Russell, 2000). The maximum entropy IRL framework has proved successful at learning reward functions from expert demonstrations (Ziebart et al., 2008; Boularias et al., 2011; Kalakrishnan et al., 2013).…”