In light of advancing socio-economic development and urban infrastructure, urban traffic congestion and accidents have become pressing issues. High-resolution remote sensing images are crucial for supporting urban geographic information systems (GIS), road planning, and vehicle navigation. Additionally, the emergence of robotics presents new possibilities for traffic management and road safety. This study introduces an innovative approach that combines attention mechanisms and robotic multimodal information fusion for retrieving traffic scenes from remote sensing images. Attention mechanisms focus on specific road and traffic features, reducing computation and enhancing detail capture. Graph neural algorithms improve scene retrieval accuracy. To achieve efficient traffic scene retrieval, a robot equipped with advanced sensing technology autonomously navigates urban environments, capturing high-accuracy, wide-coverage images. This facilitates comprehensive traffic databases and real-time traffic information retrieval for precise traffic management. Extensive experiments on large-scale remote sensing datasets demonstrate the feasibility and effectiveness of this approach. The integration of attention mechanisms, graph neural algorithms, and robotic multimodal information fusion enhances traffic scene retrieval, promising improved information extraction accuracy for more effective traffic management, road safety, and intelligent transportation systems. In conclusion, this interdisciplinary approach, combining attention mechanisms, graph neural algorithms, and robotic technology, represents significant progress in traffic scene retrieval from remote sensing images, with potential applications in traffic management, road safety, and urban planning.