Submarine cables are crucial infrastructure for international communications, and their cost and survivability are two key factors that must be considered in the design phase. In this paper, we propose a machine-learning-assisted submarine cable route planning algorithm that minimizes the accumulated cost and risk of the route. The cost and risk distributions, together with the direction from the route's starting point to its endpoint, are used as prior data to initialize the state-action values of reinforcement learning (RL). We also propose a multi-agent cross reinforcement learning (MA-XRL) framework composed of Q-learning and SARSA to improve the global optimization capability of RL for multi-objective optimization. The results show that, compared with ant colony optimization (ACO), MA-XRL reduces the accumulated cost by 26.87% at the same accumulated risk. The maximum accumulated cost of the Pareto solutions obtained by MA-XRL is lower than the minimum accumulated cost of those obtained by ACO. Meanwhile, the running time of MA-XRL is only 1.3‰ of that of ACO. Without initialization from the prior cost and risk data, the accumulated cost and risk of the best submarine cable route obtained by MA-XRL are 1.84 and 7.08 times, respectively, those obtained with cost and risk distribution initialization. Direction initialization helps the agent reach the endpoint of the submarine cable route faster and doubles the search stability of MA-XRL. Compared with using Q-learning or SARSA alone, MA-XRL reduces the accumulated risk by 71.81% and 39.51%, respectively, at the same accumulated cost, and reduces the accumulated cost by 16.65% and 11.99%, respectively, at the same accumulated risk.
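The abstract names two ingredients: prior initialization of the RL state-action values from the cost/risk distribution and the route direction, and the use of Q-learning and SARSA agents. The following is a minimal, hypothetical sketch of those two ingredients on a toy grid; it is not the authors' implementation. The grid, the weights `w_cost`, `w_risk`, `w_dir`, the rewards `BUMP_PENALTY` and `GOAL_REWARD`, the use of the direction from each cell toward the endpoint, and all function names are illustrative assumptions. The cross-learning mechanism that couples the agents in MA-XRL is not described here and is not reproduced, so the two agents are simply trained independently.

```python
import numpy as np

# Hypothetical grid model: a route moves between 4-connected cells, and each
# cell carries a cost value and a risk value (illustrative, not the paper's maps).
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
BUMP_PENALTY = -100.0  # assumed penalty for trying to leave the grid
GOAL_REWARD = 100.0    # assumed bonus for reaching the endpoint


def init_q(cost, risk, goal, w_cost=1.0, w_risk=1.0, w_dir=1.0):
    """Assumed prior: penalize costly/risky next cells, favor heading toward the goal."""
    rows, cols = cost.shape
    q = np.zeros((rows, cols, len(ACTIONS)))
    for r in range(rows):
        for c in range(cols):
            to_goal = np.array([goal[0] - r, goal[1] - c], dtype=float)
            norm = np.linalg.norm(to_goal)
            for a, (dr, dc) in enumerate(ACTIONS):
                nr, nc = r + dr, c + dc
                if not (0 <= nr < rows and 0 <= nc < cols):
                    q[r, c, a] = BUMP_PENALTY
                    continue
                # Cost/risk prior: cheaper and safer next cells start higher.
                q[r, c, a] = -(w_cost * cost[nr, nc] + w_risk * risk[nr, nc])
                if norm > 0:
                    # Direction prior: cosine between the move and the cell-to-goal vector.
                    q[r, c, a] += w_dir * (dr * to_goal[0] + dc * to_goal[1]) / norm
    return q


def step(s, a, cost, risk, goal, w_cost=1.0, w_risk=1.0):
    """Deterministic transition; reward is the negative weighted cost+risk of the entered cell."""
    rows, cols = cost.shape
    nr, nc = s[0] + ACTIONS[a][0], s[1] + ACTIONS[a][1]
    if not (0 <= nr < rows and 0 <= nc < cols):
        return s, BUMP_PENALTY, False  # stay in place
    reward = -(w_cost * cost[nr, nc] + w_risk * risk[nr, nc])
    if (nr, nc) == goal:
        return (nr, nc), reward + GOAL_REWARD, True
    return (nr, nc), reward, False


def epsilon_greedy(q, s, eps, rng):
    if rng.random() < eps:
        return int(rng.integers(len(ACTIONS)))
    return int(np.argmax(q[s[0], s[1]]))


def train(q, cost, risk, start, goal, rule, episodes=300, alpha=0.1,
          gamma=0.95, eps=0.1, max_steps=200, seed=0):
    """Tabular TD control; rule is 'q_learning' (off-policy) or 'sarsa' (on-policy)."""
    if rule not in ("q_learning", "sarsa"):
        raise ValueError(f"unknown rule: {rule}")
    rng = np.random.default_rng(seed)
    for _ in range(episodes):
        s = start
        a = epsilon_greedy(q, s, eps, rng)
        for _ in range(max_steps):
            s2, r, done = step(s, a, cost, risk, goal)
            a2 = epsilon_greedy(q, s2, eps, rng)
            if done:
                target = r  # terminal transition: no bootstrap
            elif rule == "q_learning":
                target = r + gamma * np.max(q[s2[0], s2[1]])  # best next action value
            else:
                target = r + gamma * q[s2[0], s2[1], a2]  # value of the action taken next
            q[s[0], s[1], a] += alpha * (target - q[s[0], s[1], a])
            if done:
                break
            s, a = s2, a2
    return q


def greedy_route(q, start, goal, max_steps=100):
    """Follow the greedy action from start; stop at the goal or if the move leaves the grid."""
    rows, cols, _ = q.shape
    route, s = [start], start
    for _ in range(max_steps):
        dr, dc = ACTIONS[int(np.argmax(q[s[0], s[1]]))]
        nr, nc = s[0] + dr, s[1] + dc
        if not (0 <= nr < rows and 0 <= nc < cols):
            break
        s = (nr, nc)
        route.append(s)
        if s == goal:
            break
    return route


if __name__ == "__main__":
    cost = np.ones((6, 6))
    risk = np.zeros((6, 6))
    risk[1:5, 2:4] = 5.0  # hypothetical high-risk zone
    start, goal = (0, 0), (5, 5)
    for rule in ("q_learning", "sarsa"):
        q = train(init_q(cost, risk, goal), cost, risk, start, goal, rule)
        route = greedy_route(q, start, goal)
        total = sum(cost[r, c] + risk[r, c] for r, c in route[1:])
        print(rule, "reached endpoint:", route[-1] == goal, "cost+risk:", total)
```

In this sketch the two rules differ only in the bootstrap target: Q-learning uses the best next-state action value, while SARSA uses the value of the action its ε-greedy policy actually selects, which tends to make SARSA more cautious near penalized cells. Even before any training, the prior Q-table alone steers the greedy route around the hypothetical high-risk zone toward the endpoint.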