To address target loss and insufficient accuracy in existing target perception and action recognition algorithms, this paper proposes a target perception and action recognition algorithm based on saliency and feature extraction. The algorithm uses saliency detection to obtain the salient regions of images or video frames, which focuses attention on the target. Feature extraction is then applied to the target information to obtain key nodes and inter-frame correlations. Experimental results on measured data show that the algorithm outperforms traditional detection methods in detecting target behaviour. It also resolves the motion misalignment and jumping problems that arise in pedestrian detection. Although the node localisation of the algorithm needs further improvement, the algorithm shows good application prospects in smart cities and intelligent surveillance. Future work will focus on improving the localisation accuracy of key nodes, so that the algorithm adapts better to different environments and scenarios and provides stronger support for smart city and related applications.
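The two stages summarised above, saliency detection followed by inter-frame correlation, can be illustrated with a toy sketch. It is not the paper's implementation: the global-contrast saliency measure, the threshold `ratio`, and the helper names `saliency_map`, `salient_box`, `box_shift` and `block_frame` are all assumptions chosen for illustration, and the shift of the salient box between frames is only a simple stand-in for the inter-frame correlation the algorithm extracts.

```python
from typing import List, Tuple

Frame = List[List[float]]  # greyscale frame: rows of pixel intensities
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def saliency_map(frame: Frame) -> Frame:
    """Assumed saliency measure: absolute difference from the frame mean."""
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    return [[abs(p - mean) for p in row] for row in frame]


def salient_box(sal: Frame, ratio: float = 0.75) -> Box:
    """Bounding box of all pixels whose saliency is at least ratio * peak."""
    peak = max(max(row) for row in sal)
    if peak == 0:
        raise ValueError("frame has no contrast, so no salient region")
    hits = [
        (r, c)
        for r, row in enumerate(sal)
        for c, value in enumerate(row)
        if value >= ratio * peak
    ]
    rows = [r for r, _ in hits]
    cols = [c for _, c in hits]
    return min(rows), min(cols), max(rows), max(cols)


def box_shift(a: Box, b: Box) -> Tuple[float, float]:
    """(row, column) displacement of the box centre from frame a to frame b."""
    centre_a = ((a[0] + a[2]) / 2, (a[1] + a[3]) / 2)
    centre_b = ((b[0] + b[2]) / 2, (b[1] + b[3]) / 2)
    return centre_b[0] - centre_a[0], centre_b[1] - centre_a[1]


def block_frame(left: int, size: int = 4) -> Frame:
    """Hypothetical test frame: dark background with a bright 2x2 target."""
    frame = [[0.0] * size for _ in range(size)]
    for r in (1, 2):
        for c in (left, left + 1):
            frame[r][c] = 10.0
    return frame


# Two consecutive frames in which the target moves one pixel to the right.
box_prev = salient_box(saliency_map(block_frame(left=1)))
box_next = salient_box(saliency_map(block_frame(left=2)))
shift = box_shift(box_prev, box_next)
```

In this toy setting the salient box follows the bright target from one frame to the next, and `shift` records how far it moved. A real system would use a stronger saliency model and track key nodes rather than a box centre.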