The robotics field, driven by a new generation of information technology, is moving into a new stage of human-machine collaborative operation. Unlike traditional robots, which are kept at a safe distance from people by isolation rails, the new generation of human-machine collaboration systems can work side by side with humans in a shared workspace; by intelligently assigning operational tasks, such systems exploit the complementary strengths of humans and machines and improve work patterns to increase efficiency. Efficient and accurate recognition of human actions has therefore become a key measure of robot performance. The data used for action recognition are usually videos, which are inherently time series. A time series describes a system's responses at different times, so studying it can reveal the system's structural characteristics and its laws of operation.

Accordingly, this paper proposes a time series-based action recognition model with multimodal information fusion and applies it to a robot to realize friendly human-robot interaction. Because multiple features characterize the data more comprehensively, this study extracts the spatial-stream and motion-stream features of the dataset separately and feeds each into its own bidirectional long short-term memory (BiLSTM) network. A confidence fusion method then combines the two streams' outputs into the final action recognition result. Experimental results on the publicly available NTU RGB+D and MSR Action 3D datasets show that the proposed method improves action recognition accuracy.
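To make the two-stream architecture concrete, the following is a minimal PyTorch sketch of the described pipeline: one BiLSTM per feature stream, a per-stream classifier producing class confidences, and a fusion step over those confidences. The feature dimensions, hidden size, and the simple averaging fusion rule are illustrative assumptions, not the paper's exact settings.

```python
import torch
import torch.nn as nn

class TwoStreamBiLSTM(nn.Module):
    """Sketch: spatial and motion streams, each through a BiLSTM,
    fused at the confidence (class-probability) level."""

    def __init__(self, spatial_dim, motion_dim, hidden_dim, num_classes):
        super().__init__()
        # One BiLSTM per feature stream; inputs are (batch, time, feat)
        self.spatial_lstm = nn.LSTM(spatial_dim, hidden_dim,
                                    batch_first=True, bidirectional=True)
        self.motion_lstm = nn.LSTM(motion_dim, hidden_dim,
                                   batch_first=True, bidirectional=True)
        # Per-stream classifiers (2 * hidden_dim: forward + backward states)
        self.spatial_fc = nn.Linear(2 * hidden_dim, num_classes)
        self.motion_fc = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, spatial_seq, motion_seq):
        # Take each BiLSTM's output at the last time step
        s_out, _ = self.spatial_lstm(spatial_seq)
        m_out, _ = self.motion_lstm(motion_seq)
        s_conf = torch.softmax(self.spatial_fc(s_out[:, -1]), dim=-1)
        m_conf = torch.softmax(self.motion_fc(m_out[:, -1]), dim=-1)
        # Confidence fusion: averaging the per-class confidences here;
        # the paper's exact fusion rule may weight the streams differently
        return (s_conf + m_conf) / 2

# Usage with hypothetical dimensions (60 classes matches NTU RGB+D)
model = TwoStreamBiLSTM(spatial_dim=2048, motion_dim=2048,
                        hidden_dim=256, num_classes=60)
fused = model(torch.randn(4, 30, 2048), torch.randn(4, 30, 2048))
pred = fused.argmax(dim=-1)  # predicted action class per clip
```

Fusing at the confidence level, rather than concatenating raw features, lets each stream be trained and calibrated independently before their predictions are combined.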