In this paper, we propose a novel and efficient framework for 3D action recognition based on a deep learning architecture. First, we develop a 3D normalized pose space that consists only of 3D normalized poses, obtained by discarding translation and orientation information. From these poses, we extract joint features and feed them into a Deep Neural Network (DNN) to learn the action model. The DNN consists of two hidden layers with the sigmoid activation function and an output layer with the softmax function. Furthermore, we propose a keyframe extraction methodology that efficiently extracts, from a motion sequence of 3D frames, the keyframes that contribute substantially to the performance of the action. In this way, we eliminate redundant frames and reduce the length of the motion; in effect, we summarize the motion sequence while preserving the original motion semantics. Only the remaining essential, informative frames are considered in the action recognition process, which makes the proposed pipeline fast and robust. Finally, we evaluate our framework extensively on publicly available benchmark Motion Capture (MoCap) datasets, namely HDM05 and CMU. Our experiments reveal that the proposed scheme significantly outperforms other state-of-the-art approaches.

inertial sensor- and accelerometer-based systems [7,8]; and hybrid systems [9]. In another line of work, 3D motions are reconstructed from different sources, e.g., from video or image data [10–14] and from accelerometer data [15,16]. In short, motions captured or generated by different kinds of sources are abundant and contain a great deal of hidden knowledge and information that can be exploited further in different types of applications, such as those mentioned above.

There exists a variety of action classification methods based on different input data; the input may be simple RGB videos, spatiotemporal joint trajectories acquired by a sensor system (e.g., mechanical, magnetic, optical, inertial, or other non-optical wearable sensors, or RGB-D sensors such as Kinect), a 3D skeleton estimated from image data, or a hybrid system. Although a great deal of research has been done in the domain of action recognition, numerous challenges remain, such as viewpoint variations, differences in human body size and appearance, and illumination factors, all of which may affect the efficiency and performance of existing algorithms [17]. Moreover, each performing actor has his or her own way and style of executing the same action, and actions may also vary considerably in speed and length. In addition to these intra-class variations, strong inter-class similarities make the task more difficult. For example, it is not easy to differentiate between jogging and running, walkForward and walkBackward, sitDownChair and sitDownFloor, standUpSitChair and standUpSitFloor, rotateArmBackward and rota...
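
To make the notion of the 3D normalized pose space concrete, the following is a minimal sketch of one way to discard global translation and orientation from a raw pose, assuming joint positions are given as a J x 3 array and assuming hypothetical joint indices for the root and hips (the actual skeleton layout depends on the HDM05/CMU data and is not specified here).

```python
import numpy as np

def normalize_pose(joints, root=0, left_hip=1, right_hip=6):
    """Map a raw 3D pose (J x 3 joint positions) into a normalized pose space.

    Joint indices are illustrative assumptions; they depend on the skeleton
    layout of the MoCap data.
    """
    # Discard translation: place the root joint at the origin.
    pose = joints - joints[root]

    # Discard orientation: rotate about the vertical (y) axis so that the
    # hip line is aligned with the x-axis.
    hip_dir = pose[right_hip] - pose[left_hip]
    angle = np.arctan2(hip_dir[2], hip_dir[0])
    c, s = np.cos(-angle), np.sin(-angle)
    rot_y = np.array([[c, 0.0, s],
                      [0.0, 1.0, 0.0],
                      [-s, 0.0, c]])
    return pose @ rot_y.T
```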
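
The classifier itself is described above as two sigmoid hidden layers followed by a softmax output. A minimal PyTorch sketch of such a network is given below; the hidden-layer sizes are illustrative assumptions, not values taken from the paper.

```python
import torch.nn as nn

def build_action_model(num_features, num_actions, hidden1=256, hidden2=128):
    """Two sigmoid hidden layers and a softmax output layer.

    hidden1 and hidden2 are assumed sizes for illustration only.
    """
    return nn.Sequential(
        nn.Linear(num_features, hidden1),
        nn.Sigmoid(),
        nn.Linear(hidden1, hidden2),
        nn.Sigmoid(),
        nn.Linear(hidden2, num_actions),
        nn.Softmax(dim=1),  # class probabilities over the action labels
    )
```

For training, one would typically drop the final Softmax and use a cross-entropy loss on the raw logits; it is kept here to mirror the architecture as stated.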
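
The keyframe extraction step is only summarized above; as one simple illustration of frame reduction (not the paper's actual method), a greedy distance-threshold selection over normalized poses could look as follows, where `threshold` is an assumed tuning parameter in the units of the normalized pose coordinates.

```python
import numpy as np

def extract_keyframes(frames, threshold=0.5):
    """Greedy keyframe selection: keep a frame only if its normalized pose
    differs sufficiently from the last retained keyframe.

    This is a hedged sketch of redundant-frame removal, not the proposed
    keyframe extraction methodology itself.
    """
    keyframes = [0]
    for i in range(1, len(frames)):
        if np.linalg.norm(frames[i] - frames[keyframes[-1]]) > threshold:
            keyframes.append(i)
    return keyframes
```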