Although various studies on monitoring dog behavior have been conducted, methods that can minimize or compensate for data noise are still required. This paper proposes multimodal dog behavior recognition that fuses video data from a camera with sensor data from a wearable device. The video data capture the region in which a dog is moving and are used to detect the dog, while the sensor data capture the dog's movement and are used to extract statistical features relevant to behavior recognition. Seven behavior classes were recognized, and the outputs of the two modalities were combined through a deep learning-based fusion model. Experiments showed that, among Faster R-CNN, YOLOv3, and YOLOv4, the object detection rate and behavior recognition accuracy were highest with YOLOv4. In addition, the sensor data performed best when all statistical features were selected. Finally, the multimodal fusion models outperformed the single-modality models, and the CNN-LSTM-based fusion model achieved the best performance. The method presented in this study can be applied to dog treatment or health monitoring and is expected to provide a simple way to estimate a dog's activity level.
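To make the fusion idea concrete, the sketch below shows one plausible CNN-LSTM arrangement, assuming PyTorch: a small CNN branch summarizes cropped video frames (e.g., from YOLOv4 detections), an LSTM branch models the sequence of statistical sensor features, and the two embeddings are concatenated before a 7-class behavior classifier. The layer sizes, input shapes, and class names (e.g., `CnnLstmFusion`) are illustrative assumptions, not the authors' exact architecture.

```python
# Minimal sketch of a CNN-LSTM fusion model for video + wearable-sensor
# behavior classification. All dimensions here are assumptions for illustration.
import torch
import torch.nn as nn

class CnnLstmFusion(nn.Module):
    def __init__(self, sensor_features=9, num_classes=7, hidden=64):
        super().__init__()
        # CNN branch: per-frame features from cropped dog detections (e.g., YOLOv4 boxes)
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),  # -> (batch*frames, 32)
        )
        # LSTM branch: temporal model over per-timestep statistical sensor features
        self.lstm = nn.LSTM(input_size=sensor_features, hidden_size=hidden,
                            batch_first=True)
        # Fusion head: concatenate both modality embeddings and classify behaviors
        self.classifier = nn.Linear(32 + hidden, num_classes)

    def forward(self, frames, sensor_seq):
        # frames: (batch, T, 3, H, W); sensor_seq: (batch, T, sensor_features)
        b, t = frames.shape[:2]
        video_feat = self.cnn(frames.flatten(0, 1)).view(b, t, -1).mean(dim=1)
        _, (h_n, _) = self.lstm(sensor_seq)
        fused = torch.cat([video_feat, h_n[-1]], dim=1)
        return self.classifier(fused)

# Example: one 16-frame clip with a synchronized 16-step sensor window
model = CnnLstmFusion()
logits = model(torch.randn(1, 16, 3, 64, 64), torch.randn(1, 16, 9))
print(logits.shape)  # torch.Size([1, 7])
```

Averaging the per-frame CNN features and taking the final LSTM hidden state is only one simple late-fusion choice; the paper's comparison of single-modality and fusion models suggests the gain comes from combining the two embeddings, however they are pooled.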