This study proposes a novel hands-free interaction method for human-robot interaction (HRI) in mixed reality (MR) environments, using multimodal gestures such as eye gazing and head gestures together with deep learning. Since human operators often hold objects while performing tasks, there are many constrained situations in which they cannot use their hands for HRI. To provide more effective and intuitive task assistance, the proposed hands-free method supports coarse-to-fine interactions. Eye gazing-based interaction is used for coarse interactions, such as searching for and previewing target objects, while head gesture interactions are used for fine interactions, such as selection and 3D manipulation. In addition, deep learning-based object detection is applied to estimate the initial positions of the physical objects to be manipulated by the robot. The detection results are then combined with 3D spatial mapping in the MR environment to support accurate initial object positioning. Furthermore, virtual object-based indirect manipulation is proposed to support more intuitive and efficient control of the robot compared with traditional direct manipulation (e.g., joint-based and end effector-based manipulation). In particular, a digital twin, a virtual robot synchronized with the real robot, provides a preview and simulation of the real robot so that it can be manipulated more effectively and accurately. Two case studies were conducted to confirm the originality and advantages of the proposed hands-free HRI: (1) a performance evaluation of initial object positioning and (2) a comparative analysis with traditional direct robot manipulation. The deep learning-based initial positioning greatly reduces the effort required to manipulate the robot using eye gazing and head gestures, and the virtual object-based indirect manipulation supports more effective HRI than previous direct interaction methods.

INDEX TERMS Deep learning, eye gazing, hands-free interaction, head gestures, human-robot interaction, mixed reality, object detection.
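As an illustration of the initial-positioning step described above, the following minimal sketch shows one way a 2D object detection could be combined with the MR spatial-mapping mesh to estimate a 3D object position. It is not the authors' implementation; the names `detector`, `spatial_mesh.raycast`, and the camera inputs are hypothetical placeholders assumed for the example.

```python
# Minimal sketch (assumed names, not the paper's code): estimate an object's
# initial 3D position by back-projecting the center of a 2D detection into a
# world-space ray and intersecting that ray with the MR spatial-mapping mesh.
import numpy as np

def pixel_to_ray(u, v, intrinsics, cam_pose):
    """Back-project pixel (u, v) into a world-space ray.

    intrinsics: (fx, fy, cx, cy); cam_pose: 4x4 camera-to-world transform.
    """
    fx, fy, cx, cy = intrinsics
    d_cam = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])       # ray in camera frame
    d_world = cam_pose[:3, :3] @ d_cam                           # rotate into world frame
    return cam_pose[:3, 3], d_world / np.linalg.norm(d_world)    # origin, unit direction

def initial_object_position(rgb_frame, intrinsics, cam_pose, detector, spatial_mesh):
    """Detect the target object in 2D, then query the spatial-mapping mesh
    along the detection ray to obtain an initial 3D position for the robot."""
    detections = detector(rgb_frame)    # hypothetical: [(label, x1, y1, x2, y2, score), ...]
    if not detections:
        return None
    label, x1, y1, x2, y2, _ = max(detections, key=lambda d: d[-1])
    u, v = (x1 + x2) / 2.0, (y1 + y2) / 2.0                      # center of best detection
    origin, direction = pixel_to_ray(u, v, intrinsics, cam_pose)
    hit = spatial_mesh.raycast(origin, direction)                 # hypothetical MR mesh query
    return (label, hit) if hit is not None else None
```

In this sketch, the deep learning detector supplies only the 2D location of the target, while the spatial-mapping geometry supplies depth, which is the combination the abstract refers to for accurate initial positioning.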