Processing basketball videos with complex content faces several challenges involving global motion features, group motion features, and individual pose features. Existing research does not adequately handle the diverse spatiotemporal features of actions, exploit the correspondence between spatiotemporal features, or cope with growing data volume and network complexity. To address these problems, this paper studies the visual image recognition of basketball turning and dribbling based on feature extraction. Specifically, optical flow images are introduced to relate the velocity field of the turning and dribbling motion to the grayscale of the image frames, so as to effectively depict how pixels vary over time. In addition, a convolutional neural network based on multi-feature learning is established to process the sports video image frames and to extract more spatiotemporal features of basketball turning and dribbling. To improve feature utilization in the action recognition model, the extraction of dynamic and static features is strengthened for recognizing a player's turning and dribbling within the same scene, and the existing convolutional neural network is improved. Furthermore, multi-feature learning of motion excitation and temporal aggregation of actions is implemented. Experiments demonstrate that the proposed model is effective.
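The link between the velocity field and the frame grayscale mentioned above can be illustrated with a minimal sketch. The abstract does not state the exact formulation used, so the sketch assumes the classical linearized brightness-constancy constraint, I_x u + I_y v + I_t = 0, where I_x, I_y, and I_t are the spatial and temporal grayscale derivatives and (u, v) is the velocity. It estimates a single global velocity by least squares on a synthetic frame pair. The names `synthetic_frame` and `estimate_global_flow` are hypothetical and are not taken from the paper, and the sketch does not represent the proposed network.

```python
import numpy as np


def synthetic_frame(shift_x=0.0, shift_y=0.0, size=64):
    """Smooth grayscale test pattern, translated by (shift_x, shift_y) pixels.

    Hypothetical helper for illustration only; not data from the paper.
    """
    y, x = np.mgrid[0:size, 0:size].astype(np.float64)
    xs, ys = x - shift_x, y - shift_y
    return np.sin(0.2 * xs) + np.cos(0.15 * ys) + np.sin(0.1 * (xs + ys))


def estimate_global_flow(frame_a, frame_b):
    """Least-squares (u, v) from the constraint I_x*u + I_y*v + I_t = 0.

    Assumes one global velocity for the whole frame and a small displacement,
    so that the first-order linearization holds.
    """
    frame_a = frame_a.astype(np.float64)
    frame_b = frame_b.astype(np.float64)

    # np.gradient returns derivatives along axis 0 (rows, y) then axis 1 (columns, x).
    grad_y, grad_x = np.gradient(frame_a)
    grad_t = frame_b - frame_a  # temporal grayscale change between the two frames

    # Drop the border, where np.gradient falls back to one-sided differences.
    inner = (slice(1, -1), slice(1, -1))
    A = np.stack([grad_x[inner].ravel(), grad_y[inner].ravel()], axis=1)
    b = -grad_t[inner].ravel()

    # Solve A @ [u, v] = b in the least-squares sense.
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return float(u), float(v)


frame_a = synthetic_frame()
frame_b = synthetic_frame(shift_x=0.3, shift_y=-0.2)
u, v = estimate_global_flow(frame_a, frame_b)
print(f"estimated velocity: u={u:.3f}, v={v:.3f}")
```

The estimate should land close to the true shift of (0.3, -0.2) pixels per frame. A dense optical flow image, as used for network input, would instead solve for a separate (u, v) at every pixel, which requires additional smoothness or local-window assumptions not shown here.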