Modern active distribution networks (ADNs) witness increasing complexities that require efforts in control practices, including optimal reactive power dispatch (ORPD). Deep reinforcement learning (DRL) is proposed to manage the network’s reactive power by coordinating different resources, including distributed energy resources, to enhance performance. However, there is a lack of studies examining DRL elements’ performance sensitivity. To this end, in this paper we examine the impact of various DRL reward representations and hyperparameters on the agent’s learning performance when solving the ORPD problem for ADNs. We assess the agent’s performance regarding accuracy and training time metrics, as well as critic estimate measures. Furthermore, different environmental changes are examined to study the DRL model’s scalability by including other resources. Results show that compared to other representations, the complementary reward function exhibits improved performance in terms of power loss minimization and convergence time by 10–15% and 14–18%, respectively. Also, adequate agent performance is observed to be neighboring the best-suited value of each hyperparameter for the studied problem. In addition, scalability analysis depicts that increasing the number of possible action combinations in the action space by approximately nine times results in 1.7 times increase in the training time.