Unmanned Aerial Vehicles (UAVs) are becoming increasingly attractive for meeting the ambitious expectations for 5G and beyond networks due to their many benefits. At the same time, UAV-assisted communications introduce a new range of challenges and opportunities regarding the security of these networks. In this paper, we explore the opportunities that UAVs can provide for physical layer security solutions. In particular, we analyze the secrecy performance of a ground wireless communication network assisted by two friendly UAV jammers in the presence of an eavesdropper. To assess the secrecy performance of this system, we introduce a new area-based metric, the weighted secrecy coverage (WSC), which measures the improvement in secrecy performance over a certain physical area brought about by the introduction of friendly jamming. We then address the optimal 3D positioning of the UAVs and the power allocation in order to maximize the WSC. For that purpose, we provide a reinforcement learning-based solution by modeling the positioning problem as a Multi-Armed Bandit problem over three positioning variables for the UAVs: angle, height, and orbit radius. Our results show that there is a trade-off between energy expenditure and how quickly the UAVs move to positions with better secrecy outcomes, and that the proposed algorithm efficiently converges to a stable state.