Human action recognition plays a key role in human-computer interaction in complex environments. However, similar actions yield feature sequences that are hard to tell apart, which reduces recognition accuracy. This paper proposes Action-Fusion: Multi-label Subspace Learning (MLSL), a method for human action recognition from multi-modal features that combines skeleton data with a depth-map representation called Depth Sequential Information Entropy Maps (DSIEM). DSIEM describes the spatial information of human motion with information entropy and the temporal information through stitching. It reduces the redundancy of depth sequences and effectively captures spatial motion states. MLSL learns the relationship between different modalities and the inherent connection between different labels. The method is evaluated on three public datasets: the Microsoft Action 3D dataset (MSR Action3D), the University of Texas at Dallas Multimodal Human Action Dataset (UTD-MHAD), and UTD-MHAD Kinect Version 2 (UTD-MHAD-Kinect V2). Experimental results show that the proposed MLSL model obtains new state-of-the-art results, with average recognition rates of 93.55% on MSR Action3D, 88.37% on UTD-MHAD, and 90.66% on UTD-MHAD-Kinect V2.
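The abstract does not specify how the entropy maps are computed, so the following is only an illustrative sketch of the general idea: describe each depth frame by local Shannon entropy, then stitch per-segment maps side by side to retain temporal order. The function names `block_entropy_map` and `stitched_entropy_maps`, the tile size, the histogram binning, the depth range, and the segment averaging are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def block_entropy_map(frame, block=8, bins=16, max_depth=4096):
    """Shannon entropy (in bits) of the depth histogram in each block x block tile.

    Hypothetical helper: tile size, bin count, and depth range are assumptions.
    """
    h, w = frame.shape
    rows, cols = h // block, w // block
    out = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            tile = frame[r * block:(r + 1) * block, c * block:(c + 1) * block]
            counts, _ = np.histogram(tile, bins=bins, range=(0, max_depth))
            total = counts.sum()
            if total == 0:
                continue  # no depth values fell inside the range
            p = counts[counts > 0] / total
            out[r, c] = -np.sum(p * np.log2(p))
    return out


def stitched_entropy_maps(seq, segments=3, **kwargs):
    """Average the entropy maps within each temporal segment and stitch them
    horizontally, so that the left-to-right order follows time.

    Hypothetical helper: averaging per segment is an assumption.
    """
    if len(seq) < segments:
        raise ValueError("need at least one frame per segment")
    maps = [block_entropy_map(f, **kwargs) for f in seq]
    chunks = np.array_split(np.arange(len(maps)), segments)
    tiles = [np.mean([maps[i] for i in idx], axis=0) for idx in chunks]
    return np.hstack(tiles)
```

With six 16 x 16 frames, the default 8 x 8 tiles, and three segments, each frame gives a 2 x 2 entropy map and the stitched result has shape (2, 6). A sequence of any length therefore reduces to a single fixed-size image, which is one way such a representation could reduce redundancy in a depth sequence.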