This paper presents an implementation of RoSA, a Robot System Assistant, for safe and intuitive human-machine interaction. The interaction modalities were chosen and previously reviewed using a Wizard of Oz study emphasizing a strong propensity for speech and pointing gestures. Based on these findings, we design and implement a new multi-modal system for contactless human-machine interaction based on speech, facial, and gesture recognition. We evaluate our proposed system in an extensive study with multiple subjects to examine the user experience and interaction efficiency. It reports that our method achieves similar usability scores compared to the entirely human remote-controlled robot interaction in our Wizard of Oz study. Furthermore, our framework’s implementation is based on the Robot Operating System (ROS), allowing modularity and extendability for our multi-device and multi-user method.