This paper designed a voice interactive robot system that can conveniently execute assigned service tasks in real-life scenarios. It is equipped without a microphone where users can control the robot with spoken commands; the voice commands are then recognized by a well-trained deep neural network model of automatic speech recognition (ASR), which enables the robot to execute and complete the command based on the navigation of a real-time simultaneous localization and mapping (SLAM) algorithm. The voice interaction recognition model is divided into two parts: (1) speaker separation and (2) ASR. The speaker separation is applied by a deep-learning system consisting of eight convolution layers, one LSTM layer, and two fully connected (FC) layers to separate the speaker’s voice. This model recognizes the speaker’s voice as a referrer that separates and holds the required voiceprint and removes noises from other people’s voiceprints. Its automatic speech recognition uses the novel sandwich-type conformer model with a stack of three layers, and combines convolution and self-attention to capture short-term and long-term interactions. Specifically, it contains a multi-head self-attention module to directly convert the voice data into text for command realization. The RGB-D vision-based camera uses a real-time appearance-based mapping algorithm to create the environment map and replace the localization with a visional odometer to allow the robot to navigate itself. Finally, the proposed ASR model was tested to check if the desired results will be obtained. Performance analysis was applied to determine the robot’s environment isolation and voice recognition abilities. The results showed that the practical robot system successfully completed the interactive service tasks in a real environment. This experiment demonstrates the outstanding performance with other ASR methods and voice control mobile robot systems. It also verified that the designed voice interaction recognition system enables the mobile robot to execute tasks in real-time, showing that it is a convenient way to complete the assigned service applications.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.