Introduction and objective: the purpose of this work is to design and implement an innovative tool that recognizes 16 different human gestural actions and uses them to predict 7 different emotional states. The solution proposed in this paper is based on the RGB and depth information of 2D/3D images acquired with Kinect, a commercial RGB-D sensor. Materials: the dataset is a collection of human actions performed by different actors. Each actor performs each action three times in each video. Twenty actors perform 16 different actions, both seated and standing, totalling 40 videos per actor. Methods: human gestural actions are recognized by extracting features, such as angles and distances between the joints of the human skeleton, from RGB and depth images. Emotions are selected according to the state of the art. Experimental results: although some actions are very similar to one another, the overall accuracy reached is approximately 80%. Conclusions and future works: the proposed approach appears to be independent of background and execution speed, and in the future it will be used as part of multimodal emotion recognition software that also analyzes facial expressions and speech.
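The angle and distance features mentioned under Methods can be illustrated with a minimal sketch. It assumes each skeleton joint is already available as a 3D point (x, y, z), as a depth sensor's skeleton tracker would provide. The joint names and the two features chosen here are hypothetical examples, not the paper's actual feature set, which the abstract does not specify.

```python
import math
from typing import Dict, List, Tuple

Point3D = Tuple[float, float, float]


def joint_distance(a: Point3D, b: Point3D) -> float:
    """Euclidean distance between two 3D joint positions."""
    return math.dist(a, b)


def joint_angle(a: Point3D, b: Point3D, c: Point3D) -> float:
    """Angle in degrees at joint b, between the segments b->a and b->c."""
    ba = [a[i] - b[i] for i in range(3)]
    bc = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(ba, bc))
    norm = math.hypot(*ba) * math.hypot(*bc)
    if norm == 0:
        raise ValueError("coincident joints: angle is undefined")
    # Clamp to [-1, 1] so floating-point drift cannot push acos out of its domain.
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


def frame_features(joints: Dict[str, Point3D]) -> List[float]:
    """Per-frame feature vector built from hypothetical joint names."""
    return [
        # Right-elbow flexion angle (shoulder-elbow-wrist).
        joint_angle(joints["shoulder_right"], joints["elbow_right"], joints["wrist_right"]),
        # Distance between the right hand and the head.
        joint_distance(joints["hand_right"], joints["head"]),
    ]


# Usage on a made-up skeleton frame (coordinates in metres).
frame = {
    "shoulder_right": (0.0, 0.0, 2.0),
    "elbow_right": (0.3, 0.0, 2.0),
    "wrist_right": (0.3, 0.3, 2.0),
    "hand_right": (0.3, 0.4, 2.0),
    "head": (0.0, 0.4, 2.0),
}
features = frame_features(frame)
print(features)
```

Angles are invariant to where the actor stands relative to the sensor, and per-frame features like these can be stacked over time and passed to any sequence classifier. The choice of classifier here is left open because the abstract does not name one.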