This paper presents a heuristic dynamic programming (HDP) algorithm for trajectory tracking and formation control of multi-agent systems (MAS). The selected HDP method allows the optimal controller to be designed online. The multi-agent control problem is formulated as a leader-follower problem in which the followers must maintain the assigned formation while the leader tracks the specified trajectory. The developed QR-Solver and RLS µ-QR-HDP-DLQR algorithms provide a new methodology for obtaining the solution of the Hamilton-Jacobi-Bellman (HJB) equation. The proposed solutions are based on QR factorization, which decreases the computational cost and avoids the numerical stability issues of the conventional least squares (LS) method. The convergence, numerical stability, and computational metrics of the algorithms were evaluated experimentally on two control systems: aerial altitude control (one degree of freedom) and a terrestrial robot (two degrees of freedom).
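As a minimal illustration of why a QR factorization can be preferable to the conventional LS normal equations, the sketch below solves a small overdetermined system both ways with NumPy. It is not the QR-Solver or RLS µ-QR-HDP-DLQR algorithm of this paper: the function names `solve_ls_qr` and `solve_ls_normal`, the random regressor matrix, and the parameter vector are all assumptions made for the example.

```python
import numpy as np


def solve_ls_qr(A, b):
    """Solve min ||A x - b||_2 via the reduced QR factorization A = Q R."""
    Q, R = np.linalg.qr(A)  # Q is (m, n) with orthonormal columns, R is (n, n)
    # R is upper triangular; a general solver is used here for brevity.
    return np.linalg.solve(R, Q.T @ b)


def solve_ls_normal(A, b):
    """Solve the same problem through the normal equations A^T A x = A^T b."""
    return np.linalg.solve(A.T @ A, A.T @ b)


# Hypothetical data: 50 noise-free measurements of a 3-parameter linear model.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 3))
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true

x_qr = solve_ls_qr(A, b)
x_ne = solve_ls_normal(A, b)

# Forming A^T A squares the 2-norm condition number, which is the usual
# source of accuracy loss in the normal-equation approach.
cond_A = np.linalg.cond(A)
cond_AtA = np.linalg.cond(A.T @ A)

print("QR solution:              ", x_qr)
print("Normal-equation solution: ", x_ne)
print("cond(A), cond(A^T A):     ", cond_A, cond_AtA)
```

On a well-conditioned problem such as this one the two solutions agree; the difference only becomes visible when the regressor matrix is close to rank deficient, because the QR route works with the conditioning of A itself rather than that of A^T A.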