Skeleton-based action recognition is attracting increasing attention owing to the general representation ability of skeleton data. Graph Convolutional Network (GCN) methods, extended from Convolutional Neural Networks (CNNs), have been proposed to extract spatial-temporal information directly from skeleton graphs. Previous GCNs usually aggregate skeleton information locally in the vertex domain. However, this focus on local information limits their representation ability for actions whose overall dynamics span both the spatial and temporal dimensions, which lowers the overall accuracy of the model. Therefore, this paper proposes a more comprehensive two-stream GCN architecture that combines vertex-domain graph convolution with spectral graph convolution based on the Graph Fourier Transform (GFT). One stream uses an efficient vertex-domain graph convolution to obtain spatial-temporal information via Graph Shift Blocks (GSB), while the other supplies global spectral information from our improved Residual Spectral Blocks (RSB). Analysis of the experimental results shows that action misalignment is reduced for certain actions. Moreover, when added to other GCN methods that focus only on spatial-temporal information, our RSB strategy helps improve their performance. The proposed model, DD-GCN, is evaluated on three large skeleton-based datasets: NTU-RGBD 60, NTU-RGBD 120, and Kinetics-Skeleton. The experimental results demonstrate performance comparable to the state of the art.
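To make the spectral stream more concrete, the following is a minimal sketch of a Graph Fourier Transform on a skeleton-like graph. It is not the DD-GCN implementation: the 5-joint chain, the function names, and the low-pass response exp(-lambda) are all assumptions chosen for illustration, and the combinatorial Laplacian L = D - A is only one common choice of graph operator.

```python
import numpy as np

def graph_laplacian(adj):
    """Combinatorial Laplacian L = D - A of an undirected graph."""
    return np.diag(adj.sum(axis=1)) - adj

def gft_basis(adj):
    """Eigenvalues (graph frequencies, ascending) and eigenvectors of L."""
    return np.linalg.eigh(graph_laplacian(adj))

def gft(x, eigvecs):
    """Forward GFT: project joint features (joints x channels) onto the basis."""
    return eigvecs.T @ x

def igft(x_hat, eigvecs):
    """Inverse GFT: map spectral coefficients back to the vertex domain."""
    return eigvecs @ x_hat

def spectral_filter(x, eigvals, eigvecs, response):
    """Scale each graph frequency by response(eigenvalue), then invert."""
    return igft(response(eigvals)[:, None] * gft(x, eigvecs), eigvecs)

# Hypothetical toy skeleton: 5 joints connected in a chain.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
adj = np.zeros((5, 5))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

eigvals, eigvecs = gft_basis(adj)

# Random 3-D coordinates for each joint (illustrative data only).
rng = np.random.default_rng(0)
x = rng.standard_normal((5, 3))

x_rec = igft(gft(x, eigvecs), eigvecs)  # round trip through the spectrum
smooth = spectral_filter(x, eigvals, eigvecs, lambda lam: np.exp(-lam))
```

Because every spectral coefficient depends on all joints at once, filtering in this domain mixes information globally, in contrast to the local neighbor aggregation of vertex-domain convolution.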
With the progress of face manipulation techniques, synthesized faces are spreading across the Internet, raising concerns about potential threats. To prevent abuse of these techniques, various detection algorithms have been proposed. In this paper, we exploit the image’s frequency information and propose an adaptive filtering algorithm, the spatial and adaptive filtering (SAF) network. SAF is a dual-stream network that operates in both the spatial and frequency domains. In the frequency domain, a wavelet transform divides the image into different frequency bands, and an adaptive filter then assigns different weights to different frequencies in order to capture more decisive information. To fuse spatial and frequency features, we propose spatial pyramid pooling fusion (SPPF), which resolves the mismatch between feature maps and models the relationship between different patches with an attention mechanism. Experimental results show that SAF outperforms the algorithms it is compared against.
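The frequency-domain step can be sketched as follows. This is an illustration under stated assumptions, not the SAF implementation: the abstract does not name the wavelet or the weighting scheme, so a one-level Haar transform and softmax weights over hypothetical per-band logits are used here. In a trained network the logits would be learned parameters; below they are fixed by hand.

```python
import numpy as np

def haar_dwt2(img):
    """One-level orthonormal 2-D Haar transform of an image with even sides.

    Returns an array of shape (4, H/2, W/2): the low-frequency
    approximation followed by three detail (high-frequency) bands.
    """
    a = img[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    approx = (a + b + c + d) / 2.0
    detail_rows = (a + b - c - d) / 2.0  # top row minus bottom row
    detail_cols = (a - b + c - d) / 2.0  # left column minus right column
    detail_diag = (a - b - c + d) / 2.0  # diagonal difference
    return np.stack([approx, detail_rows, detail_cols, detail_diag])

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - z.max())
    return e / e.sum()

def adaptive_band_weighting(bands, logits):
    """Scale each frequency band by a softmax weight derived from logits."""
    weights = softmax(logits)
    return weights[:, None, None] * bands, weights

# Illustrative input: a random 8x8 grayscale image.
rng = np.random.default_rng(0)
img = rng.standard_normal((8, 8))

bands = haar_dwt2(img)

# Hypothetical logits that favor the diagonal detail band.
logits = np.array([0.0, 1.0, 1.0, 2.0])
weighted, weights = adaptive_band_weighting(bands, logits)
```

Weighting whole bands with one scalar each is the simplest possible form of adaptive filtering; a finer-grained filter could assign weights per channel or per spatial location.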