With the rapid improvement in the integration density of electronic hardware, digital images have become an indispensable information carrier, which has driven the development of image recognition, detection, and tracking technologies. At present, little work addresses image recognition for assessing whether three-dimensional standing yoga movements are performed to standard, which leaves considerable room for research. Traditional image recognition techniques suffer from missed and false detections, so the action cannot be identified and judged correctly. In this paper, the ability to capture the action is improved by optimizing the network that localizes the three-dimensional standing yoga action target. An optimized linear discriminant analysis (LDA) is then used to reduce the dimensionality of the captured image data, which helps raise the image recognition rate and lower the image loss rate. In an analysis of three-dimensional standing yoga movements, the model based on the improved algorithm is compared with the traditional model, and its image recognition accuracy is higher than that of the network model before the improvement. The image recognition error rate converges steadily toward 5%, and the loss rate stays below 2%. With the optimized convolutional neural network, the proposed method captures the image position accurately and substantially improves the recognition rate, and it can serve as a reference for future research in other fields.
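To make the LDA dimensionality-reduction step concrete, the following is a minimal sketch of classical two-class Fisher LDA, not the optimized variant used in this paper. It assumes hypothetical two-dimensional feature vectors for "correct" and "incorrect" poses and projects them onto the single direction w proportional to S_w^{-1}(mu_a - mu_b), where S_w is the within-class scatter matrix. All function names and data values are illustrative assumptions.

```python
# Two-class Fisher LDA in plain Python: project 2-D features onto 1-D.
# Data and function names are hypothetical, not taken from the paper.

def mean(points):
    """Component-wise mean of a list of 2-D points."""
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]

def scatter(points, mu):
    """2x2 scatter matrix of one class around its mean mu."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - mu[0], p[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_direction(class_a, class_b):
    """Unit vector along Sw^{-1} (mu_a - mu_b)."""
    mu_a, mu_b = mean(class_a), mean(class_b)
    sa, sb = scatter(class_a, mu_a), scatter(class_b, mu_b)
    # Within-class scatter is the sum of the per-class scatter matrices.
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if abs(det) < 1e-12:
        raise ValueError("within-class scatter matrix is singular")
    # Closed-form inverse of a 2x2 matrix.
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mu_a[0] - mu_b[0], mu_a[1] - mu_b[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    norm = (w[0] ** 2 + w[1] ** 2) ** 0.5
    return [w[0] / norm, w[1] / norm]

def project(points, w):
    """Reduce each 2-D point to a scalar by projecting onto w."""
    return [p[0] * w[0] + p[1] * w[1] for p in points]

# Hypothetical pose features (e.g. two joint-angle descriptors per image).
correct_pose = [(2.0, 2.1), (2.5, 2.4), (3.0, 3.2), (3.5, 3.4)]
incorrect_pose = [(2.0, 0.9), (2.5, 1.6), (3.0, 1.8), (3.5, 2.6)]

w = fisher_direction(correct_pose, incorrect_pose)
z_correct = project(correct_pose, w)
z_incorrect = project(incorrect_pose, w)
```

In this toy data the two classes overlap along each original axis but become separable after projection onto w, which is the property that makes LDA useful as a preprocessing step before classification.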