Temporal modeling is key to action recognition in videos, but traditional 2D CNNs do not capture temporal relationships well. 3D CNNs can achieve good performance but are computationally intensive and difficult to deploy on existing devices. To address these problems, we design a generic and effective module called the spatio-temporal motion network (SMNet). SMNet maintains the complexity of a 2D network and reduces the computational cost of the algorithm while achieving performance comparable to 3D CNNs. SMNet contains a spatio-temporal excitation module (SE) and a motion excitation module (ME). The SE module uses group convolution to fuse temporal information and reduce the number of parameters in the network, and uses spatial attention to extract spatial information. The ME module uses the difference between adjacent frames to extract feature-level motion patterns, which effectively encodes motion features and helps identify actions efficiently. We use ResNet-50 as the backbone network and insert SMNet into the residual blocks to form a simple and effective action recognition network. Experimental results on three datasets, Something-Something V1, Something-Something V2, and Kinetics-400, show that it outperforms state-of-the-art action recognition networks.
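The frame-difference idea behind the ME module can be summarized in a short sketch. The following is a minimal PyTorch illustration, assuming a channel-reduction ratio of 16 and a simple per-channel 3x3 transform before the subtraction; the layer names and sizes are our own assumptions for illustration, not the exact configuration reported in the paper.

import torch
import torch.nn as nn

class MotionExcitation(nn.Module):
    """Minimal sketch of a motion-excitation block: features of adjacent
    frames are subtracted to obtain feature-level motion, which is turned
    into a channel attention map applied back to the input."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        mid = max(channels // reduction, 1)  # reduced channel width (assumed ratio)
        self.squeeze = nn.Conv2d(channels, mid, kernel_size=1, bias=False)
        self.transform = nn.Conv2d(mid, mid, kernel_size=3, padding=1,
                                   groups=mid, bias=False)  # per-channel spatial transform
        self.expand = nn.Conv2d(mid, channels, kernel_size=1, bias=False)
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, x, num_frames):
        # x: (N*T, C, H, W), where T = num_frames
        nt, c, h, w = x.shape
        n = nt // num_frames
        feat = self.squeeze(x).view(n, num_frames, -1, h, w)
        later = feat[:, 1:].reshape(-1, feat.size(2), h, w)
        earlier = feat[:, :-1].reshape(-1, feat.size(2), h, w)
        # difference between adjacent frames approximates motion at feature level
        diff = self.transform(later) - earlier
        diff = diff.view(n, num_frames - 1, -1, h, w)
        # pad the last time step with zeros so the temporal length is preserved
        diff = torch.cat([diff, torch.zeros_like(diff[:, :1])], dim=1)
        attn = torch.sigmoid(self.expand(self.pool(diff.reshape(nt, -1, h, w))))
        return x + x * attn  # residual excitation of the original features

    # usage sketch: me = MotionExcitation(256); out = me(features, num_frames=8)

Because the block only adds 1x1 and per-channel 3x3 convolutions, it keeps the parameter and FLOP overhead close to that of a 2D residual block, which is consistent with the abstract's claim of 2D-level complexity.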
Power line inspection is an important part of the smart grid. Efficient real-time detection of power devices on the power line is a challenging problem in power line inspection. In recent years, deep learning methods have achieved remarkable results in image classification and object detection. However, for computer-vision-based power line inspection, datasets have a significant impact on deep learning, and the lack of public high-quality power scene data hinders its application. To address this problem, we built a dataset for power line inspection scenes, named RSIn-Dataset. RSIn-Dataset contains 4 categories and 1887 images with abundant backgrounds. We then used mainstream object detection methods to build a benchmark, providing a reference for insulator detection. In addition, to address the detection inefficiency caused by large model parameters, an improved YoloV4 is proposed, named YoloV4++. It uses a lightweight network, MobileNetv1, as the backbone and employs depthwise separable convolutions in place of standard convolutions. Meanwhile, focal loss is used in the loss function to mitigate the impact of sample imbalance. The experimental results show the effectiveness of YoloV4++: the mAP reaches 94.24% and the inference speed reaches 53.82 FPS.
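The two efficiency measures named above, depthwise separable convolution and focal loss, can be sketched generically. The following PyTorch snippet assumes MobileNet-style blocks and the commonly used focal-loss defaults (alpha=0.25, gamma=2.0); the exact layer layout and hyperparameters of YoloV4++ are not taken from the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DepthwiseSeparableConv(nn.Module):
    """Depthwise + pointwise convolution used to replace a standard 3x3
    convolution; a generic MobileNet-style building block, not the authors'
    exact layer configuration."""

    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, x):
        return F.relu(self.bn(self.pointwise(self.depthwise(x))))

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss for object/background classification; alpha and gamma
    follow common defaults, which may differ from the paper's settings."""
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = p * targets + (1 - p) * (1 - targets)            # probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()      # down-weight easy, well-classified samples

Splitting a k x k convolution into a depthwise and a pointwise step reduces its cost by roughly a factor of k^2 for wide layers, which is the main source of the parameter and latency savings, while the focal term suppresses the loss contribution of the abundant easy background samples.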