This article presents a comprehensive approach that integrates formation tracking control with optimal control for a fleet of surface vehicles (SVs), accounting for both the kinematic and dynamic models of each SV agent. The proposed control framework comprises two core components: a high‐level displacement‐based formation controller and a low‐level reinforcement learning (RL)‐based optimal controller for each SV agent. The high‐level formation control law, based on a modified gradient method, guides the SVs to the desired formation. The low‐level control structure, which tracks time‐varying references, incorporates the RL algorithm by transforming the time‐varying closed‐loop agent system into an equivalent autonomous system. Lyapunov's direct method, together with the existence of the Bellman function, guarantees the stability and optimality of the proposed design. Extensive numerical simulations, covering several comparisons and scenarios, demonstrate the effectiveness of the proposed formation control strategy for multi‐SV systems and indicate its potential for real‐world applications.
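
To make the idea of displacement‐based formation control concrete, the following minimal sketch implements the standard gradient law u_i = -k Σ_j [(p_i - p_j) - (d_i - d_j)] for three agents. It is not the modified gradient method proposed in this article. The single‐integrator agent model, the complete communication graph, the gain `k`, the step size `dt`, and all function names are illustrative assumptions.

```python
# Sketch of a standard displacement-based formation law.
# Assumptions (not taken from the article): single-integrator agents,
# an undirected complete graph, gain k, and forward-Euler integration.

def formation_step(positions, offsets, neighbors, k=1.0, dt=0.05):
    """One Euler step of u_i = -k * sum_j [(p_i - p_j) - (d_i - d_j)]."""
    new_positions = []
    for i, (xi, yi) in enumerate(positions):
        ux = uy = 0.0
        for j in neighbors[i]:
            xj, yj = positions[j]
            # Mismatch between actual and desired relative displacement.
            ux -= k * ((xi - xj) - (offsets[i][0] - offsets[j][0]))
            uy -= k * ((yi - yj) - (offsets[i][1] - offsets[j][1]))
        new_positions.append((xi + dt * ux, yi + dt * uy))
    return new_positions


def formation_error(positions, offsets, neighbors):
    """Largest displacement mismatch (Euclidean norm) over all edges."""
    err = 0.0
    for i, nbrs in neighbors.items():
        for j in nbrs:
            ex = (positions[i][0] - positions[j][0]) - (offsets[i][0] - offsets[j][0])
            ey = (positions[i][1] - positions[j][1]) - (offsets[i][1] - offsets[j][1])
            err = max(err, (ex * ex + ey * ey) ** 0.5)
    return err


def simulate(steps=400):
    """Drive three agents from arbitrary starts toward a triangular formation."""
    offsets = [(0.0, 0.0), (2.0, 0.0), (1.0, 2.0)]      # desired shape (hypothetical)
    neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}       # complete graph
    positions = [(5.0, -1.0), (-3.0, 4.0), (0.5, 0.5)]  # initial positions (hypothetical)
    for _ in range(steps):
        positions = formation_step(positions, offsets, neighbors)
    return positions, formation_error(positions, offsets, neighbors)


if __name__ == "__main__":
    final_positions, error = simulate()
    print(final_positions, error)
```

Under these assumptions the relative displacements converge to the desired ones while the centroid of the fleet stays fixed, which is why a displacement‐based law is typically combined with a reference trajectory when formation tracking is required.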