Abstract: Robot table tennis is a challenging domain spanning robotics, artificial intelligence, and machine learning. In terms of robotics, it requires fast and reliable perception and control; in terms of artificial intelligence, it requires fast decision making to determine the best motion to hit the ball; in terms of machine learning, it requires the ability to accurately estimate where and when the ball will be so that it can be hit. The use of sophisticated perception (relying, for example, on multi-camera vision systems) and state-of-the-art robot manipulators significantly alleviates concerns with perception and control, leaving room for the exploration of novel approaches that focus on estimating where, when, and how to hit the ball. In this paper, we move away from the hardware setup commonly used in this domain (typically robotic manipulators combined with an array of fixed cameras) and take the first steps towards autonomous aerial table tennis robotic players. Specifically, we focus on the task of hitting a ping pong ball thrown at a commercial drone equipped with a light cardboard racket and an onboard camera. We adopt a general framework for learning complex robot tasks and show that, in spite of the perceptual and actuation limitations of our system, the overall approach enables the quadrotor system to successfully respond to balls served by a human user.