Abstract-Three advanced natural interaction modalities for mobile robot guidance in an indoor environment were developed and compared using two tasks and quantitative metrics to measure performance and workload. The first interaction modality is based on direct physical interaction requiring the human user to push the robot in order to displace it. The second and third interaction modalities exploit a 3D vision-based human-skeleton tracking allowing the user to guide the robot by either walking in front of it or by pointing towards a desired location. In the first task, the participants were asked to guide the robot between different rooms in a simulated physical apartment requiring rough movement of the robot through designated areas. The second task evaluated robot guidance in the same environment through a set of waypoints, which required accurate movements. The three interaction modalities were implemented on a generic differential drive mobile platform equipped with a pan-tilt system and a Kinect camera. Task completion time and accuracy were used as metrics to assess the users' performance, while the NASA-TLX questionnaire was used to evaluate the users' workload. A study with 24 participants indicated that choice of interaction modality had significant effect on completion time (F(2,61)=84.874, p<0.001), accuracy (F(2,29)=4.937, p=0.016), and workload (F(2,68)=11.948, p<0.001). The direct physical interaction required less time, provided more accuracy and less workload than the two contactless interaction modalities. Between the two contactless interaction modalities, the person-following interaction modality was systematically better than the pointing-control one: the participants completed the tasks faster with less workload. Index Terms-Human-robot interaction, mobile robots, direct physical interaction, person tracking, person following, gesture recognition