Landslide susceptibility prediction requires a comprehensive analysis of terrain and other factors, many of which exhibit spatial patterns. Conventional prediction methods, however, often use only the information at individual pixels and ignore the spatial correlation and mutual influence between neighboring pixels. To address this issue, the present study proposes a neighboring-pixel collaboration strategy based on the Unified Perceptual Parsing Network (UPerNet), the Vision Transformer (ViT), and Vision Graph Neural Networks (ViG). The strategy draws on the strengths of deep learning in feature extraction, sequence modeling, and graph data processing. By incorporating information from neighboring pixels, it identifies susceptible areas more accurately and reduces misidentifications and omissions. The experimental results suggest that the proposed strategy produces more accurate landslide susceptibility zoning: the resulting maps identify flat areas such as rivers and distinguish between areas of high and very high landslide susceptibility. Such refined zoning is valuable for landslide prevention and mitigation and can help decision-makers formulate targeted response measures.
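To make the contrast between single-pixel and neighboring-pixel inputs concrete, the following minimal sketch extracts, for one target pixel, both its own factor vector and a square window of surrounding pixels from a stack of factor rasters. It is an illustration only, not the study's implementation: the function name `extract_patch`, the `radius` parameter, the `(channels, height, width)` array layout, and the edge-replication padding at raster borders are all assumptions introduced here.

```python
import numpy as np


def extract_patch(factors: np.ndarray, row: int, col: int, radius: int) -> np.ndarray:
    """Return the (C, 2r+1, 2r+1) window centered on pixel (row, col).

    `factors` is a hypothetical stack of C factor rasters with shape (C, H, W).
    Borders are padded by replicating edge values (an assumption; other
    padding schemes are equally plausible).
    """
    # Pad only the two spatial axes so windows near the border stay full-sized.
    padded = np.pad(
        factors,
        ((0, 0), (radius, radius), (radius, radius)),
        mode="edge",
    )
    # After padding, original pixel (row, col) sits at (row + radius, col + radius),
    # so the window starts at (row, col) in padded coordinates.
    size = 2 * radius + 1
    return padded[:, row:row + size, col:col + size]


# Toy example: 2 factor layers on a 4 x 5 grid.
factors = np.arange(2 * 4 * 5, dtype=float).reshape(2, 4, 5)

# Single-pixel input: only the factor values at the target pixel.
pixel_vector = factors[:, 2, 3]

# Neighboring-pixel input: the 3 x 3 window around the same pixel.
patch = extract_patch(factors, row=2, col=3, radius=1)
```

A pixel-wise model would see only `pixel_vector`, whereas a patch-based model such as those named above would receive `patch`, whose center element is that same vector surrounded by its spatial context.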