This study develops a comprehensive robotic system, termed the robot cognitive system, for complex environments, integrating three models: the engagement model, the intention model, and the human–robot interaction (HRI) model. The system aims to enhance the naturalness and comfort of HRI by enabling robots to accurately detect human behaviors, intentions, and emotions. A novel dual-arm-hand mobile robot, Mobi, was designed to demonstrate the system’s efficacy. The engagement model uses eye gaze, head pose, and action recognition to determine an appropriate moment to initiate interaction while accounting for potential eye-contact anxiety. The intention model employs sentiment analysis and emotion classification to infer the interactor’s intentions. The HRI model, integrated with Google Dialogflow, generates appropriate robot responses based on user feedback. The system’s performance was validated in a retail-environment scenario, demonstrating its potential to improve the user experience in human–robot interaction.
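As a rough illustration of how the three models might be composed into a single perception-to-response pipeline, the Python sketch below uses hypothetical class and field names (RobotCognitiveSystem, EngagementModel, IntentionModel, HRIModel, Observation) and simple placeholder heuristics; the actual models described in this work are learned components, and the response step would be handled by Google Dialogflow rather than the lookup table stubbed here.

```python
from dataclasses import dataclass

# Illustrative sketch only: names and decision rules are hypothetical,
# not taken from the paper's implementation.


@dataclass
class Observation:
    gaze_on_robot: bool      # eye gaze directed at the robot
    head_toward_robot: bool  # head pose oriented toward the robot
    action: str              # recognized action label, e.g. "approaching"
    utterance: str           # transcribed speech from the interactor


class EngagementModel:
    """Decides whether now is a suitable moment to initiate interaction,
    based on eye gaze, head pose, and the recognized action."""

    def should_engage(self, obs: Observation) -> bool:
        attending = obs.gaze_on_robot or obs.head_toward_robot
        return attending and obs.action in {"approaching", "browsing"}


class IntentionModel:
    """Infers the interactor's intention from the utterance.
    Placeholder keyword heuristic standing in for sentiment analysis
    and emotion classification."""

    def infer_intention(self, obs: Observation) -> str:
        text = obs.utterance.lower()
        if any(w in text for w in ("where", "find", "looking for")):
            return "seek_product"
        if any(w in text for w in ("thanks", "great")):
            return "positive_feedback"
        return "unknown"


class HRIModel:
    """Maps an inferred intention to a robot response. In the full system
    this step is integrated with Google Dialogflow; here it is a stub."""

    RESPONSES = {
        "seek_product": "I can guide you to that aisle. Please follow me.",
        "positive_feedback": "Glad I could help!",
        "unknown": "Could you tell me a bit more about what you need?",
    }

    def respond(self, intention: str) -> str:
        return self.RESPONSES.get(intention, self.RESPONSES["unknown"])


class RobotCognitiveSystem:
    """Composes the three models into one perception-to-response step."""

    def __init__(self) -> None:
        self.engagement = EngagementModel()
        self.intention = IntentionModel()
        self.hri = HRIModel()

    def step(self, obs: Observation) -> str | None:
        if not self.engagement.should_engage(obs):
            return None  # wait for a more suitable moment to interact
        return self.hri.respond(self.intention.infer_intention(obs))


if __name__ == "__main__":
    system = RobotCognitiveSystem()
    obs = Observation(gaze_on_robot=True, head_toward_robot=True,
                      action="approaching",
                      utterance="Where can I find coffee beans?")
    print(system.step(obs))  # prints a guidance response
```

The example mirrors the retail scenario: the engagement check gates interaction initiation, the intention step interprets the shopper's request, and the response step returns the robot's reply.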