To alleviate bottlenecks in children’s concentration training, such as the shortage of professional teachers, inattention during training, and low training effectiveness, we propose an immersive human–robot interactive (HRI) game framework based on deep learning and demonstrate its use through HRI games based on gesture recognition. The framework comprises four functional modules: video data acquisition, image recognition modeling, a deep learning algorithm (YOLOv5), and information feedback. First, we trained a gesture recognition model with the YOLOv5 algorithm on a dataset of 10,000 pictures of children’s gestures. The average accuracy in recognition training was 98.7%. Second, we recruited 120 children with attention deficits (aged 9 to 12 years; 60 girls and 60 boys) to play the HRI games. The HRI game experiment yielded 8640 data samples, which were normalized and processed. The results showed that the girls had better visual short-term memory and shorter response times than the boys. They also indicated that the HRI games offered high efficacy, convenience, and full freedom, making them appropriate for children’s concentration training.
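
The abstract does not state how the 8640 samples were normalized. As a minimal sketch, assuming min-max scaling of response times to [0, 1] followed by a comparison of group means, the processing step might look like the following. The function names, the record layout, and the sample values are hypothetical and are not taken from the study.

```python
from statistics import mean


def min_max_normalize(values):
    """Scale values linearly to [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def group_means(records):
    """Return the mean normalized response time per group.

    records: list of (group, response_time) pairs (hypothetical layout).
    """
    normalized = min_max_normalize([rt for _, rt in records])
    by_group = {}
    for (group, _), value in zip(records, normalized):
        by_group.setdefault(group, []).append(value)
    return {group: mean(vals) for group, vals in by_group.items()}


# Illustrative values in seconds; NOT data from the study.
example = [("girl", 1.0), ("girl", 2.0), ("boy", 3.0), ("boy", 2.0)]
print(group_means(example))
```

Normalizing across all participants before grouping keeps the two groups on a common scale, so a lower group mean corresponds to a shorter response time relative to the full sample.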