Although current gymnastics action detection algorithms achieve good detection and recognition results, they cannot effectively identify a variety of consecutive gymnastic actions, and for many gymnastic actions the false detection rate is high. In this paper we therefore improve the CRF model and the bag-of-visual-words semantic model and combine the advantages of both to build a hierarchical model for behavior recognition. We first create a hierarchical CRF model with semantic labels, which is divided into an upper and a lower layer, together with a gymnastics image filter based on the bag-of-visual-words semantic model. Identifying erroneous action images by their semantics is not only consistent with the cognitive process of machine vision but also compensates for the high false positive rate of existing algorithms. Experiments show that combining the two models detects erroneous gymnastics images effectively, improves the recognition rate compared with other algorithms, and reduces the false detection rate.
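
To make the image filter idea more concrete, the following is a minimal sketch of a bag-of-visual-words filter, not the implementation used in this paper. It assumes that local descriptors have already been extracted from a frame and that a codebook of visual words is available (for example from k-means clustering). The function names, the histogram-intersection similarity, and the threshold value are all hypothetical choices made for illustration.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def bovw_histogram(descriptors, codebook):
    """Quantize local descriptors against a codebook and return a
    normalized visual-word histogram (hypothetical helper)."""
    counts = [0] * len(codebook)
    for d in descriptors:
        # Assign the descriptor to its nearest visual word.
        nearest = min(range(len(codebook)), key=lambda k: dist(d, codebook[k]))
        counts[nearest] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(codebook)
    return [c / total for c in counts]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] between two normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def is_suspect_frame(descriptors, codebook, reference_hist, threshold=0.5):
    """Flag a frame whose visual-word histogram differs too much from a
    reference histogram of the expected action.
    The threshold is an assumed value, not one taken from the paper."""
    hist = bovw_histogram(descriptors, codebook)
    return histogram_intersection(hist, reference_hist) < threshold
```

In a pipeline of the kind described above, frames flagged by such a filter could be passed on as candidate erroneous action images, while the CRF layers handle the labeling of the action sequence itself.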