We consider the task of controlling a quadrotor to hover in front of a freely moving user, using input data from an onboard camera. On this task, we compare two widely used learning paradigms: a mediated approach, which learns to estimate a high-level state from the input and then uses that state to derive control signals; and an end-to-end approach, which skips high-level state estimation altogether. We show that, despite their fundamental difference, both approaches yield equivalent performance on this task. Finally, we qualitatively analyze the behavior of a quadrotor implementing these approaches.
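The structural difference between the two paradigms can be illustrated with a minimal sketch. This is not the authors' implementation. The names `UserPose`, `Control`, `mediated_controller`, and `end_to_end_controller` are hypothetical. So are the choice of high-level state (the user's position and relative heading in the drone's body frame), the proportional control law, and the 1.5 m target distance. In the mediated pipeline, a learned estimator maps a frame to a state, and a separate controller turns that state into commands. In the end-to-end pipeline, a single learned policy maps the frame directly to commands.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical stand-in for a camera frame; a real system would use an image array.
Image = Sequence[float]


@dataclass
class UserPose:
    """Hypothetical high-level state: user pose in the drone's body frame."""
    x: float    # distance ahead of the drone (m)
    y: float    # lateral offset, positive to the left (m)
    z: float    # vertical offset, positive upward (m)
    yaw: float  # relative heading of the user (rad)


@dataclass
class Control:
    """Hypothetical velocity command sent to the quadrotor."""
    vx: float
    vy: float
    vz: float
    yaw_rate: float


def mediated_controller(
    estimate_pose: Callable[[Image], UserPose],
    target_distance: float = 1.5,  # assumed hover distance (m), illustrative only
    gain: float = 1.0,             # assumed proportional gain
) -> Callable[[Image], Control]:
    """Mediated approach: learned estimator -> explicit state -> hand-written controller."""

    def control(frame: Image) -> Control:
        pose = estimate_pose(frame)  # learned step: frame -> high-level state
        # Proportional law: drive the pose error toward zero.
        return Control(
            vx=gain * (pose.x - target_distance),
            vy=gain * pose.y,
            vz=gain * pose.z,
            yaw_rate=gain * pose.yaw,
        )

    return control


def end_to_end_controller(policy: Callable[[Image], Control]) -> Callable[[Image], Control]:
    """End-to-end approach: one learned policy maps the frame straight to commands."""

    def control(frame: Image) -> Control:
        return policy(frame)  # no intermediate state is exposed

    return control


if __name__ == "__main__":
    # Fake "learned" models, standing in for trained networks.
    fake_estimator = lambda frame: UserPose(2.0, 0.5, -0.25, 0.1)
    fake_policy = lambda frame: Control(0.1, 0.0, 0.0, 0.0)

    print(mediated_controller(fake_estimator)([0.0]))
    print(end_to_end_controller(fake_policy)([0.0]))
```

Because the mediated pipeline exposes an interpretable intermediate state, its control law can be inspected or retuned without retraining. The end-to-end pipeline leaves the whole mapping to the learned model.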
VIDEOS, DATASETS, AND CODE
Videos, data, and code to reproduce our results are available at: https://github.com/idsia-robotics/proximity-quadrotor-learning.
arXiv:1809.08881v2 [cs.RO]