Traditional contact and non-contact methods for estimating visual interaction forces and recognizing behavior have significant drawbacks regarding biocompatibility, sensor size, material fragility, and the trade-off between algorithm accuracy and speed. To address these limitations, this study proposes a lightweight, regularized transformer-based (LRe Trans) visual interaction behavior recognition method. The method comprises three parts: image input and slice preprocessing, global semantic representation based on a deep lightweight vision Transformer, and regularized interaction behavior recognition. In addition, the model collects and analyzes image data of preschool children through a dynamic window, thereby realizing the visual interaction process between preschool children and the machine. Experiments show that the new method achieves 97.6% accuracy and a 97.5% F1 score for interaction behavior recognition on a large-scale robot interaction dataset, with an average single-inference time of only 0.18 seconds. These results indicate that the LRe Trans-based visual interaction behavior recognition method holds advantages for the specific problem of robots interacting with preschoolers. The method not only contributes to the theoretical basis of this field but also offers potential for future applications.
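The slice-preprocessing stage mentioned above corresponds to the standard ViT-style partition of an image into non-overlapping patches that are flattened into token vectors for the transformer encoder. A minimal NumPy sketch is given below; the 224x224 input resolution and 16x16 patch size are illustrative assumptions, not values stated in this section.

```python
import numpy as np

def slice_into_patches(image, patch_size):
    """Split an H x W x C image into non-overlapping patches and
    flatten each patch into a token vector (illustrative sketch)."""
    H, W, C = image.shape
    p = patch_size
    assert H % p == 0 and W % p == 0, "image must tile evenly into patches"
    # Reshape into a grid of patches, then flatten each patch.
    patches = (image.reshape(H // p, p, W // p, p, C)
                    .transpose(0, 2, 1, 3, 4)      # (rows, cols, p, p, C)
                    .reshape(-1, p * p * C))       # (num_patches, patch_dim)
    return patches

# Assumed example resolution and patch size (not specified in the text):
img = np.zeros((224, 224, 3), dtype=np.float32)
tokens = slice_into_patches(img, 16)
print(tokens.shape)  # (196, 768): 14*14 patches, each 16*16*3 values
```

The resulting token sequence would then be fed to the lightweight vision Transformer for global semantic representation.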