This paper addresses the problem of Visible Light Communication (VLC)-based indoor localization and handover, where mobile users communicate with hybrid VLC/mmWave Access Points (APs). The VLC system consists of multiple Light-Emitting Diodes (LEDs) acting as VLC transmitters, multiple Photodiodes (PDs) on the user's smart device, and multiple mmWave Radio Frequency (RF) transmitters that serve as complementary APs when the VLC link is blocked. We propose a Convolutional Neural Network (CNN)-based algorithm with an offline mode and an online mode. In the offline mode, we build a dataset by dividing the environment into fixed-size elements; at each element, the received VLC signal together with the data obtained from the smart device forms one sample used to train a CNN model for indoor localization. In the online mode, users estimate their locations from the received VLC signals. We then propose a virtual soft handover process based on Coordinated Multi-Point (CoMP) transmission, in which the HandOver Margin (HOM) and Waiting Time (WT) are set dynamically according to the change in Signal-to-Noise Ratio (SNR) between consecutive time slots. We derive a closed-form expression for the average effective throughput during the handover process, which shows that the proposed scheme outperforms conventional soft and hard handovers. Simulation results show an average positioning error of 4.31 cm for the proposed localization algorithm in a 5 × 4 × 3 m³ smart environment.
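As a rough illustration of the offline grid construction and the positioning-error metric described above, the following minimal sketch tiles the 5 × 4 m floor of the environment into fixed-size elements and computes an average Euclidean error. The 0.5 m cell size, the restriction to the 2-D floor plane, and the function names `grid_centers` and `mean_positioning_error` are assumptions made for illustration; they are not taken from the paper.

```python
import itertools
import math

def grid_centers(room=(5.0, 4.0), cell=0.5):
    """Return the (x, y) centers of fixed-size square cells tiling the floor.

    room: floor dimensions in meters (5 x 4 m, as in the simulated environment).
    cell: side length of one element in meters (assumed value, not from the paper).
    """
    nx = round(room[0] / cell)
    ny = round(room[1] / cell)
    return [((i + 0.5) * cell, (j + 0.5) * cell)
            for i, j in itertools.product(range(nx), range(ny))]

def mean_positioning_error(estimates, truths):
    """Average Euclidean distance between estimated and true positions."""
    dists = [math.dist(e, t) for e, t in zip(estimates, truths)]
    return sum(dists) / len(dists)

if __name__ == "__main__":
    centers = grid_centers()
    print(len(centers), "training locations; first center:", centers[0])
```

In such a setup, each grid center would be paired with the signal measured there to form one training sample, and `mean_positioning_error` would be evaluated on held-out locations; the CNN itself is outside the scope of this sketch.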