In this paper, we propose a human action recognition method using HOIRM (histogram of oriented interest region motion) feature fusion and a BOW (bag of words) model based on AP (affinity propagation) clustering. First, a HOIRM feature extraction method based on spatiotemporal interest points ROI is proposed. HOIRM can be regarded as a middle-level feature between local and global features. Then, HOIRM is fused with 3D HOG and 3D HOF local features using a cumulative histogram. The method further improves the robustness of local features to camera view angle and distance variations in complex scenes, which in turn improves the correct rate of action recognition. Finally, a BOW model based on AP clustering is proposed and applied to action classification. It obtains the appropriate visual dictionary capacity and achieves better clustering effect for the joint description of a variety of features. The experimental results demonstrate that by using the fused features with the proposed BOW model, the average recognition rate is 95.75% in the KTH database, and 88.25% in the UCF database, which are both higher than those by using only 3D HOG+3D HOF or HOIRM features. Moreover, the average recognition rate achieved by the proposed method in the two databases is higher than that obtained by other methods.