Relay-aided device-to-device (D2D) communication that combines visible light communication (VLC) with radio frequency (RF) is a promising paradigm for the Internet of Things (IoT). Static relays, however, limit the flexibility and connectivity of hybrid VLC/RF IoT systems. Using a drone as a relay station makes it possible to avoid obstacles such as buildings and to communicate over a line-of-sight (LoS) link, which naturally matches the requirements of VLC systems. Drone relay-aided D2D communication therefore addresses the constrained coverage, limited flexibility, poor reliability, and weak connectivity that hinder VLC in the IoT, and it can be deployed cost-effectively for large-scale IoT. This paper proposes a joint resource allocation and drone relay selection scheme that aims to maximize the D2D system sum rate while satisfying the quality of service (QoS) requirements of cellular users (CUs) and D2D users (DUs). First, we construct a two-phase coalitional game to tackle the resource allocation problem; the game exploits the combination of VLC and RF and incorporates a greedy strategy. Next, we propose a distributed cooperative multi-agent reinforcement learning (MARL) algorithm, based on win or learn fast policy hill-climbing (WoLF-PHC), to address the drone relay selection problem. To further reduce the computational complexity, we also propose a lightweight neighbor-agent-based WoLF-PHC algorithm that uses only the historical information of neighboring DUs. Finally, we provide a theoretical analysis of the proposed schemes in terms of complexity and signaling overhead. Simulation results show that the proposed schemes improve the sum rate and reduce the outage probability compared with existing algorithms.
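For readers unfamiliar with WoLF-PHC, the following is a minimal single-agent sketch of the generic textbook update rule, not the multi-agent scheme proposed in this paper. The class name `WoLFPHCAgent`, the hyperparameter values, and the three-relay reward vector are hypothetical and chosen only for illustration. The agent keeps a Q-table, a mixed policy, and a running average policy; it moves the policy toward the greedy action with a small step when it is "winning" (the current policy scores better than the average policy) and with a larger step when it is "losing".

```python
import random


class WoLFPHCAgent:
    """Illustrative tabular WoLF-PHC learner (requires n_actions >= 2)."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9,
                 delta_win=0.01, delta_lose=0.04, seed=0):
        self.n_actions = n_actions
        self.alpha, self.gamma = alpha, gamma
        self.delta_win, self.delta_lose = delta_win, delta_lose
        self.rng = random.Random(seed)
        uniform = 1.0 / n_actions
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.pi = [[uniform] * n_actions for _ in range(n_states)]
        self.pi_avg = [[uniform] * n_actions for _ in range(n_states)]
        self.visits = [0] * n_states

    def act(self, s):
        # Sample an action from the current mixed policy in state s.
        return self.rng.choices(range(self.n_actions), weights=self.pi[s])[0]

    def update(self, s, a, reward, s_next):
        q, pi, avg = self.q[s], self.pi[s], self.pi_avg[s]

        # 1) Standard Q-learning update.
        target = reward + self.gamma * max(self.q[s_next])
        q[a] = (1 - self.alpha) * q[a] + self.alpha * target

        # 2) Running average of the policies played in state s.
        self.visits[s] += 1
        for b in range(self.n_actions):
            avg[b] += (pi[b] - avg[b]) / self.visits[s]

        # 3) Win or learn fast: small step when winning, large when losing.
        v_pi = sum(p * x for p, x in zip(pi, q))
        v_avg = sum(p * x for p, x in zip(avg, q))
        delta = self.delta_win if v_pi > v_avg else self.delta_lose

        # 4) Hill-climb: shift probability mass toward the greedy action.
        best = max(range(self.n_actions), key=q.__getitem__)
        step = delta / (self.n_actions - 1)
        for b in range(self.n_actions):
            if b != best:
                moved = min(pi[b], step)  # never push a probability below 0
                pi[b] -= moved
                pi[best] += moved


# Hypothetical demo: one state, three candidate relays with made-up rewards.
rewards = [0.2, 1.0, 0.5]
agent = WoLFPHCAgent(n_states=1, n_actions=3, gamma=0.0)
for t in range(300):
    a = t % 3  # round-robin actions keep the demo deterministic
    agent.update(0, a, rewards[a], 0)
```

In an actual learning loop the action would be drawn with `agent.act(s)`, usually with some added exploration, and the reward would come from the environment (for example, an achieved rate). The round-robin loop above is used only so that the sketch behaves deterministically.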