Video-based micro-gesture recognition using data captured by holoscopic 3D (H3D) sensors has recently attracted growing attention, mainly because these sensors use a single-aperture camera to embed 3D information in 2D images. However, the special imaging principles of H3D sensors make it difficult to exploit the embedded 3D information efficiently. In this paper, we propose an efficient Pseudo View Points (PVP) based method that brings the 3D information embedded in H3D images into a new micro-gesture recognition framework. Specifically, we obtain several pseudo view point frames by composing the pixels located at the same position in every elemental image (EI) of the original H3D frames. This step is efficient and robust, and it mimics real view points so as to represent the 3D information in the frames. We then propose a new recognition framework based on 3D DenseNet and Bi-GRU networks to learn the dynamic patterns of different micro-gestures from the pseudo view point representation. Finally, we perform a thorough comparison on the related benchmark, which demonstrates the effectiveness of our method and sets a new state-of-the-art performance.
CCS CONCEPTS • Computing methodologies → Activity recognition and understanding; 3D imaging; Supervised learning by classification.
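To make the pseudo view point construction described in the abstract concrete, the following is a minimal NumPy sketch, not the implementation used in this work. It assumes that the elemental images form a regular, axis-aligned grid and that each EI spans `ei_h` × `ei_w` pixels; the function name `extract_pvp_frames` and both parameters are hypothetical and introduced only for illustration. Under these assumptions, the pseudo view point frame at offset `(u, v)` collects the pixel at position `(u, v)` from every EI.

```python
import numpy as np


def extract_pvp_frames(frame, ei_h, ei_w):
    """Rearrange one H3D frame into pseudo view point (PVP) frames.

    frame : array of shape (H, W) or (H, W, C), one raw H3D frame.
    ei_h, ei_w : assumed height and width of one elemental image, in pixels.

    Returns an array of shape (ei_h, ei_w, n_rows, n_cols[, C]), where
    result[u, v] is the PVP frame built from pixel (u, v) of every EI.
    """
    h, w = frame.shape[:2]
    n_rows, n_cols = h // ei_h, w // ei_w

    # Drop border pixels that do not fill a complete elemental image.
    cropped = frame[: n_rows * ei_h, : n_cols * ei_w]

    # Split both spatial axes into (EI index, offset inside the EI).
    grid = cropped.reshape(n_rows, ei_h, n_cols, ei_w, *frame.shape[2:])

    # Bring the in-EI offsets to the front: (ei_h, ei_w, n_rows, n_cols[, C]).
    return np.moveaxis(grid, (1, 3), (0, 1))


if __name__ == "__main__":
    # Synthetic frame: 3 x 2 elemental images, each 2 x 4 pixels, 3 channels.
    demo = np.arange(6 * 8 * 3).reshape(6, 8, 3)
    views = extract_pvp_frames(demo, ei_h=2, ei_w=4)
    print(views.shape)
```

Each `views[u, v]` is equivalent to the strided slice `frame[u::ei_h, v::ei_w]` on the cropped frame, so the rearrangement costs a single reshape and axis permutation per frame. How many of these pseudo view points are retained, and how they are stacked before being passed to the recognition network, is left open here.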