Software-Defined Networking (SDN) enhances network control, but its centralized control plane and the limited flow-table capacity of network devices make it vulnerable to Distributed Denial of Service (DDoS) attacks. To address this vulnerability, we introduce Trust-Based Proximal Policy Optimization (TBPPO), a multi-path routing algorithm for SDN. TBPPO uses a Kullback–Leibler (KL) divergence trust value and a node diversity mechanism as its security assessment criteria. It aims to mitigate network fluctuations, low robustness, and congestion, with particular emphasis on countering DDoS attacks. To avoid routing loops, TBPPO departs from the conventional ‘Next Hop’ routing decision methodology: an enhanced Depth-First Search (DFS) pre-computes a set of candidate paths, and the best path is selected from that set. To optimize routing efficiency, we introduce an improved Proximal Policy Optimization (PPO) algorithm based on deep reinforcement learning. It optimizes multi-path routing with respect to security, network delay, and delay variation across paths. In the Germany-50 evaluation, TBPPO outperforms traditional methods: it reduces average delay by 20%, cuts delay variation by 50%, and achieves a trust value 0.5 higher. These results show that TBPPO is a practical and effective way to improve both security and routing efficiency in SDN.
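The path pre-computation step can be illustrated with a minimal sketch. The abstract does not specify what the DFS enhancements are or how the best path is scored, so the following is an assumption throughout. A hop bound `max_hops` stands in for the DFS pruning rule. A per-node `trust` dictionary stands in for the KL divergence trust value. A hypothetical linear score (`weight`) replaces the PPO agent's path choice. All function names are hypothetical. The sketch shows why selecting from a set of pre-computed, loop-free paths rules out routing loops by construction.

```python
from typing import Dict, List

# node -> {neighbor: link delay in ms}; an assumed topology format
Graph = Dict[str, Dict[str, float]]


def enumerate_paths(graph: Graph, src: str, dst: str, max_hops: int) -> List[List[str]]:
    """Hypothetical helper: DFS enumeration of loop-free src->dst paths with at most max_hops links."""
    paths: List[List[str]] = []
    path = [src]
    visited = {src}

    def dfs(node: str) -> None:
        if node == dst:
            paths.append(list(path))  # record a copy of the current path
            return
        if len(path) - 1 >= max_hops:
            return  # assumed pruning rule: stop extending paths at the hop bound
        for nbr in sorted(graph.get(node, {})):
            if nbr in visited:
                continue  # never revisit a node, so every recorded path is loop-free
            visited.add(nbr)
            path.append(nbr)
            dfs(nbr)
            path.pop()  # backtrack
            visited.remove(nbr)

    dfs(src)
    return paths


def path_delay(graph: Graph, path: List[str]) -> float:
    """Sum of link delays along a path."""
    return sum(graph[u][v] for u, v in zip(path, path[1:]))


def select_path(graph: Graph, paths: List[List[str]], trust: Dict[str, float],
                weight: float = 10.0) -> List[str]:
    """Stand-in for the learned policy: minimize delay minus weight * weakest node trust."""
    def score(p: List[str]) -> float:
        return path_delay(graph, p) - weight * min(trust[n] for n in p)
    return min(paths, key=score)


# Hypothetical four-node example; delays and trust values are illustrative only.
EXAMPLE_GRAPH: Graph = {
    "A": {"B": 5, "C": 2},
    "B": {"A": 5, "D": 1},
    "C": {"A": 2, "B": 1, "D": 2},
    "D": {},
}
TRUST = {"A": 1.0, "B": 0.4, "C": 0.9, "D": 1.0}

if __name__ == "__main__":
    candidates = enumerate_paths(EXAMPLE_GRAPH, "A", "D", max_hops=3)
    print(candidates)  # [['A', 'B', 'D'], ['A', 'C', 'B', 'D'], ['A', 'C', 'D']]
    print(select_path(EXAMPLE_GRAPH, candidates, TRUST))  # ['A', 'C', 'D']
```

In the example, paths A-C-B-D and A-C-D have equal delay (4 ms). The trust term breaks the tie toward A-C-D because it avoids the low-trust node B. In TBPPO itself, this choice is made by the trained PPO policy rather than a fixed formula.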