This work presents a flexible reinforcement learning (RL) optimal collision-avoidance control formulation for unmanned aerial vehicles (UAVs) in a discrete-time framework. By exploiting the approximation capability of neural networks (NNs) and the actor-critic structure of the RL technique, an adaptive RL optimal collision-free controller with a minimal learning parameter (MLP) is designed on the basis of a novel strategic utility function. The proposed approach resolves the optimal collision-avoidance control problem, which could not be addressed in the prior literature. Furthermore, the proposed MLP adaptive optimal control formulation reduces the number of adaptive laws and, consequently, the computational complexity. Additionally, a rigorous stability analysis demonstrates that the proposed adaptive RL scheme guarantees the uniform ultimate boundedness (UUB) of all signals in the closed-loop system. Finally, simulation results illustrate the effectiveness of the proposed optimal RL control approach.
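For orientation, a minimal sketch of one standard form such a design can take is given below, assuming the strategic utility function is a discounted long-term cost of the kind common in discrete-time actor-critic schemes; the symbols \(\alpha\), \(p(k)\), \(\hat{w}_c\), and \(\sigma(\cdot)\) are illustrative assumptions, not the paper's exact definitions:
\[
Q(k) \;=\; \sum_{j=1}^{\infty} \alpha^{j}\, p(k+j), \qquad 0 < \alpha < 1,
\]
where \(p(k)\) denotes an instantaneous performance cost and \(\alpha\) a discount factor. A critic NN of the form \(\hat{Q}(k) = \hat{w}_{c}^{\top}\sigma\!\big(x(k)\big)\) then approximates \(Q(k)\). Under the MLP technique, rather than adapting the full weight vector \(\hat{w}_{c}\) online, only a scalar estimate of its norm (e.g., \(\hat{\theta}_{c} = \|\hat{w}_{c}\|^{2}\)) is updated, which is how a single adaptive law can replace an entire weight-vector update and reduce the computational burden.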