We investigated a coordinated beamforming design for heterogeneous wireless networks that combine radio frequency (RF) and visible light communication (VLC) access points (APs), in which several users receive information simultaneously from both types of AP. Our goal was to maximize the energy efficiency (EE) of the entire system under both perfect and imperfect channel state information (CSI). We formulated this optimization as a fractional programming problem and, to overcome its nonconvex objective function and constraints, developed four successive convex approximation methods. Extensive numerical experiments demonstrated that these algorithms can achieve near-optimal performance. In the perfect CSI case, the proposed design improved EE by 62% over the maximum ratio transmission (MRT) scheme, and the EE of the proposed RF/VLC network was 41% higher than that of traditional RF heterogeneous wireless networks. In the imperfect CSI case, we first investigated the importance of robust beamforming when channel errors occur. The proposed algorithm and network architecture then outperformed the MRT scheme and the traditional heterogeneous wireless network by 229% and 93%, respectively. In summary, careful design of an energy-efficient beamforming scheme proved essential, and the proposed mixed RF/VLC network architecture was much more energy efficient than architectures without VLC.
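To give a feel for why EE maximization leads to a fractional program, the following minimal sketch solves a toy single-link version of the problem. It is not the coordinated beamforming algorithm summarized above: it uses Dinkelbach's method, a standard fractional programming technique, in place of successive convex approximation, and it assumes a scalar transmit power, a hypothetical channel gain `g`, and a hypothetical fixed circuit power `p_circuit`. All function and parameter names are illustrative.

```python
import math


def energy_efficiency(p, g, p_circuit):
    """EE of a toy single link: spectral efficiency divided by total power."""
    return math.log2(1.0 + g * p) / (p + p_circuit)


def dinkelbach_power(g, p_circuit, p_max, tol=1e-9, max_iter=100):
    """Maximize log2(1 + g*p) / (p + p_circuit) over 0 <= p <= p_max.

    Dinkelbach's method replaces the ratio by the parametric problem
    max_p log2(1 + g*p) - lam * (p + p_circuit), which is concave in p,
    and updates lam to the ratio achieved by the inner maximizer.
    """
    lam = 0.0  # current estimate of the optimal EE
    p = p_max
    for _ in range(max_iter):
        if lam <= 0.0:
            # With lam = 0 the inner objective is the rate, maximized at p_max.
            p = p_max
        else:
            # Stationary point of the concave inner objective, clipped to [0, p_max].
            p = min(max(1.0 / (lam * math.log(2.0)) - 1.0 / g, 0.0), p_max)
        rate = math.log2(1.0 + g * p)
        gap = rate - lam * (p + p_circuit)  # reaches 0 at the optimal lam
        lam = rate / (p + p_circuit)
        if abs(gap) < tol:
            break
    return p, lam


if __name__ == "__main__":
    p_star, ee_star = dinkelbach_power(g=10.0, p_circuit=1.0, p_max=5.0)
    print(f"power = {p_star:.4f}, EE = {ee_star:.4f}")
```

In this toy setting the EE-optimal power lies strictly below the power budget, which illustrates why rate-oriented schemes that transmit at full power can be far from energy efficient. The multiuser RF/VLC problem has nonconvex coupling between users that this sketch does not capture.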