Human activity recognition has a wide range of applications in fields such as video surveillance, virtual reality, and intelligent human-computer interaction, and has emerged as a significant research area in computer vision. Key algorithms include P-LSTM (part-aware LSTM), ST-GCN (spatial-temporal graph convolutional networks), and 2s-AGCN (two-stream adaptive graph convolutional networks). Despite the remarkable achievements of these algorithms, several challenges remain, including unsatisfactory recognition accuracy, convergence difficulties, and limited generalization ability. To tackle these issues, this paper proposes HAR-ViT, a human activity recognition method based on the Vision Transformer architecture. An enhanced AGCN (eAGCN) graph filter assigns weights to human skeleton data, highlighting key joints and promoting model convergence. The position encoder module captures precise temporal information, while the transformer encoder efficiently compresses sequence features to speed up computation. Human activities are finally classified by a multi-layer perceptron (MLP). Experimental results demonstrate that the proposed method achieves accuracies of 91.06% (cross-subject) and 96.73% (cross-view) on the NTU60 dataset, and 87.61% and 89.02% on the corresponding NTU120 benchmarks, improving accuracy by approximately 1% over state-of-the-art algorithms while reducing the total parameter count by 57.24%.
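To make the pipeline concrete, the following PyTorch sketch wires together the stages named above: a learnable joint-weighting graph filter, positional encoding, a transformer encoder, and an MLP head. It is a minimal illustration under stated assumptions, not the paper's exact implementation; the module names `EAGCNFilter` and `HARViT`, all dimensions, and the softmax-normalized adjacency weighting are assumptions for illustration.

```python
# Minimal sketch of the HAR-ViT pipeline described above (assumed design,
# not the paper's exact architecture). All hyperparameters are illustrative.
import torch
import torch.nn as nn


class EAGCNFilter(nn.Module):
    """Hypothetical eAGCN-style graph filter: a learnable joint-to-joint
    adjacency re-weights skeleton joints so key nodes are emphasized."""

    def __init__(self, num_joints: int, in_channels: int, out_channels: int):
        super().__init__()
        # Learnable adjacency initialized near the identity.
        self.adjacency = nn.Parameter(
            torch.eye(num_joints) + 0.01 * torch.randn(num_joints, num_joints))
        self.proj = nn.Linear(in_channels, out_channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, joints, channels)
        weights = torch.softmax(self.adjacency, dim=-1)  # normalize joint weights
        x = torch.einsum("vw,btwc->btvc", weights, x)    # aggregate over joints
        return self.proj(x)


class HARViT(nn.Module):
    """Sketch: eAGCN filter -> positional encoding -> transformer encoder
    -> temporal pooling -> MLP classifier."""

    def __init__(self, num_joints=25, in_channels=3, embed_dim=128,
                 num_frames=64, num_classes=60, depth=4, heads=8):
        super().__init__()
        self.graph_filter = EAGCNFilter(num_joints, in_channels, embed_dim)
        # One token per frame; joints are averaged into the frame embedding.
        self.pos_embed = nn.Parameter(torch.zeros(1, num_frames, embed_dim))
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Sequential(nn.LayerNorm(embed_dim),
                                  nn.Linear(embed_dim, num_classes))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, joints, channels), e.g. 3D joint coordinates
        x = self.graph_filter(x)         # emphasize key joints
        x = x.mean(dim=2)                # (batch, frames, embed_dim)
        x = x + self.pos_embed           # inject temporal position information
        x = self.encoder(x)              # compress sequence features
        return self.head(x.mean(dim=1))  # classify the pooled representation


# Example: a batch of 2 skeleton clips, 64 frames, 25 joints, (x, y, z) each.
model = HARViT()
logits = model(torch.randn(2, 64, 25, 3))
print(logits.shape)  # torch.Size([2, 60])
```

The frame-level tokenization (averaging joints into one token per frame) is one plausible way to realize the sequence compression the abstract mentions; per-joint tokens or patch-style groupings would be equally valid design choices.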