In real traffic scenarios, precise recognition of traffic participants such as vehicles and pedestrians is crucial for intelligent transportation. This study proposes an improved algorithm built on Mask R-CNN to enhance the ability of autonomous driving systems to recognize traffic participants. The algorithm incorporates a long short-term memory (LSTM) network and a fused attention module (GSAM, GCT, and Spatial Attention Module) to strengthen its processing of both global and local information. Additionally, to improve the network's stability early in operation, the original activation function was replaced with the Gaussian error linear unit (GELU). Experiments were conducted on the publicly available Cityscapes dataset. The test results show that the revised algorithm outperformed the original on AP50, AP75, and other metrics, improving AP50 and AP75 by 8.7% and 9.6%, respectively, for target detection and by 12.5% and 13.3%, respectively, for segmentation.
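As a brief illustration of the activation change mentioned above, the following minimal sketch compares GELU with ReLU on a few scalar inputs. It assumes the original activation is ReLU, as in standard Mask R-CNN backbones; the function names `gelu` and `relu` are illustrative and are not taken from the authors' implementation.

```python
import math


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def relu(x: float) -> float:
    """ReLU, assumed here to be the activation being replaced."""
    return max(0.0, x)


if __name__ == "__main__":
    # GELU is smooth around zero and lets small negative values through,
    # whereas ReLU clamps every negative input to exactly zero.
    for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(f"x={x:+.1f}  relu={relu(x):.4f}  gelu={gelu(x):.4f}")
```

Unlike ReLU, GELU has a nonzero gradient for small negative inputs, which is one commonly cited reason it can behave more smoothly at the start of training.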