Graph convolutional networks (GCNs) have attracted increasing interest in action recognition in recent years. GCNs model human skeleton sequences as spatio-temporal graphs, and attention mechanisms are often used jointly with them to highlight important frames or body joints in a sequence. However, the parameters of attention modules are learned offline and then fixed, so they may not adapt well to varied action samples. In this paper, we propose a simple but effective motion-driven spatial and temporal adaptation strategy to dynamically strengthen the features of important frames and joints for skeleton-based action recognition. The rationale is that joints and frames with dramatic motion are generally more informative and discriminative. We decouple and combine the spatial and temporal refinements using a two-branch structure in which the joint-wise and frame-wise feature refinements are performed in parallel. Such a structure also helps the model learn more complementary feature representations. Moreover, we propose using fully connected graph convolution to learn long-range spatial dependencies. In addition, we investigate two high-resolution skeleton graphs built by creating virtual joints, aiming to improve the representation of skeleton features. Combining these proposals, we develop a novel motion-driven spatial and temporal adaptive high-resolution GCN. Experimental results demonstrate that the proposed model achieves state-of-the-art (SOTA) results on the challenging large-scale Kinetics-Skeleton and UAV-Human datasets, and is on par with SOTA methods on the NTU-RGB+D 60 and NTU-RGB+D 120 datasets. Additionally, our motion-driven adaptation method shows encouraging performance when compared with attention mechanisms.
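To make the motion-driven idea in the abstract above concrete, the following is a minimal sketch, not the authors' implementation. It assumes that motion is measured as the Euclidean displacement of each joint between consecutive frames, that joint and frame weights are obtained by simple sum-normalization, and that the two branches are merged by addition. The abstract specifies none of these choices, and all function names are hypothetical.

```python
import math


def motion_magnitude(seq):
    """seq[t][v] is a joint coordinate tuple.

    Returns m[t][v], the displacement of joint v between frames t-1 and t.
    Assumption: the first frame has zero motion.
    """
    T, V = len(seq), len(seq[0])
    m = [[0.0] * V for _ in range(T)]
    for t in range(1, T):
        for v in range(V):
            m[t][v] = math.dist(seq[t][v], seq[t - 1][v])
    return m


def normalize(scores):
    """Scale non-negative scores to sum to 1 (uniform if all are zero)."""
    total = sum(scores)
    if total == 0:
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]


def motion_weights(seq):
    """Per-joint and per-frame importance derived from motion alone."""
    m = motion_magnitude(seq)
    T, V = len(m), len(m[0])
    joint_w = normalize([sum(m[t][v] for t in range(T)) for v in range(V)])
    frame_w = normalize([sum(m[t]) for t in range(T)])
    return joint_w, frame_w


def refine(features, joint_w, frame_w):
    """Two parallel branches, merged by addition (an assumption).

    features[t][v] is a scalar feature for joint v at frame t.
    The spatial branch scales by (1 + joint_w[v]); the temporal branch
    scales by (1 + frame_w[t]).
    """
    return [
        [f * (1 + joint_w[v]) + f * (1 + frame_w[t]) for v, f in enumerate(row)]
        for t, row in enumerate(features)
    ]


# Toy sequence: 3 frames, 2 joints; only joint 1 moves, and only at frame 1.
seq = [
    [(0.0, 0.0), (0.0, 0.0)],
    [(0.0, 0.0), (3.0, 4.0)],
    [(0.0, 0.0), (3.0, 4.0)],
]
joint_w, frame_w = motion_weights(seq)
features = [[1.0, 1.0] for _ in seq]
refined = refine(features, joint_w, frame_w)
```

Because the weights are computed from each input sequence rather than from learned parameters, they change from sample to sample, which is the property the abstract contrasts with fixed attention modules.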
Reidentifying an occluded person across nonoverlapping cameras remains a challenging task. In this work, we propose a novel pose-guided part-based adaptive pyramid neural network for occluded person reidentification. First, to alleviate the impact of occlusion, we use pose landmarks to generate pose-guided attention maps, which help the model focus on the nonoccluded regions. Second, we use pyramid pooling to extract multiscale features in order to address the scale variation problem. The generated pyramid features are then multiplied by the attention maps to obtain pose-guided adaptive pyramid features. Third, we propose a pose-guided body part partition scheme to deal with the alignment problem. Accordingly, the adaptive pyramid features are divided into partitions and fed into individual fully connected layers. Finally, all part-based matching scores are fused with a weighted sum rule for person reidentification. The effectiveness of our method is validated by experimental results on two popular datasets, one occluded and one holistic, i.e., Occluded-DukeMTMC and Market-1501.
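The pipeline described above (pyramid pooling, multiplication by attention maps, weighted-sum score fusion) can be illustrated with a small sketch. This is a toy illustration under stated assumptions, not the authors' network: it uses average pooling on a square single-channel map, pools the attention map to the same grid before multiplying, and normalizes the fusion weights to sum to 1. All function names are hypothetical.

```python
def avg_pool(fmap, bins):
    """Average-pool a square 2D map (list of rows) into a bins x bins grid.

    Assumption: the map side length is divisible by bins.
    """
    n = len(fmap)
    step = n // bins
    out = []
    for bi in range(bins):
        row = []
        for bj in range(bins):
            cells = [
                fmap[i][j]
                for i in range(bi * step, (bi + 1) * step)
                for j in range(bj * step, (bj + 1) * step)
            ]
            row.append(sum(cells) / len(cells))
        out.append(row)
    return out


def pyramid_features(fmap, attention, levels=(1, 2)):
    """Pool features and attention at each level, then multiply elementwise."""
    pyramid = []
    for bins in levels:
        f = avg_pool(fmap, bins)
        a = avg_pool(attention, bins)
        pyramid.append(
            [[fv * av for fv, av in zip(fr, ar)] for fr, ar in zip(f, a)]
        )
    return pyramid


def fuse_scores(part_scores, part_weights):
    """Weighted sum of part-based matching scores (weights normalized)."""
    total = sum(part_weights)
    return sum(s * w for s, w in zip(part_scores, part_weights)) / total


# Toy example: the lower half of the image is occluded (attention 0).
fmap = [[1.0, 2.0], [3.0, 4.0]]
attention = [[1.0, 1.0], [0.0, 0.0]]
pyramid = pyramid_features(fmap, attention)

# An occluded part with weight 0 contributes nothing to the fused score.
fused = fuse_scores([0.9, 0.1], [1.0, 0.0])
```

In this toy case the occluded cells are zeroed out at the finest level, and the fused score depends only on the visible part, which mirrors the stated aim of focusing on nonoccluded regions.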