Human action recognition (HAR) is an important field of research that intersects with areas such as image processing, computer vision, and the design of fast algorithms. HAR has several important applications, including healthcare monitoring, security and surveillance, assisted living, smart homes, and video search and indexing. Despite recent developments in the field, major challenges remain. In particular, HAR is computationally expensive: tasks such as video preprocessing, feature extraction, feature quantization, and feature classification require millions of arithmetic operations for a video sequence lasting only a few seconds. To address this problem, we propose a heterogeneous approach based on an extensive algorithmic and experimental analysis of the histogram of gradients (HOG3D) application. We divide the application into four stages and evaluate each stage on CPU, GPU, and FPGA platforms. Our heterogeneous design combines the strengths of the FPGA and GPU platforms, achieving a 1.3X speedup compared with a state-of-the-art GPU while being 1.5X more energy efficient than homogeneous solutions, including FPGA-based designs. Moreover, our heterogeneous HAR design, which uses fixed-point arithmetic, achieves accuracy comparable to that of HAR algorithms using single-precision floating-point arithmetic.