In this paper, motivated by the dependence among the channels of different base stations (BSs) that share the same wireless environment, we propose to fuse sub-6 GHz channel information with low-overhead mmWave measurements to predict the optimal mmWave beam in heterogeneous networks (HetNets), thereby reducing the overhead of both mmWave BS selection and beam training. Deep learning is adopted to extract the complex dependence between sub-6 GHz and mmWave channels and achieve high prediction accuracy. Specifically, the low-overhead mmWave measurement consists of a few user equipment (UE)-specific, high-quality mmWave wide beams that are predicted from the sub-6 GHz channel state information (CSI). To adapt to the varying confidence of the wide beam prediction across UEs, we propose a sum-probability criterion that flexibly adjusts the number of measured wide beams. In addition, to fully fuse the diverse features extracted from the sub-6 GHz CSI and the mmWave wide beams, an attention mechanism is exploited to adaptively weight the features and improve the prediction accuracy. Simulation results show that the proposed scheme achieves higher beamforming gain with lower mmWave measurement overhead than conventional deep-learning-based schemes.
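
As a rough illustration of how a sum-probability criterion can adapt the number of measured wide beams to the prediction confidence, the sketch below keeps the highest-probability wide beams until their cumulative predicted probability reaches a threshold. This is one plausible reading of the criterion, not the exact rule from the paper: the function name `select_wide_beams`, the threshold value 0.85, the optional cap `max_beams`, and the example probability vectors are all assumptions made for illustration.

```python
def select_wide_beams(probs, threshold=0.85, max_beams=None):
    """Pick the fewest top-ranked wide beams whose probabilities sum to >= threshold.

    probs     : predicted probability of each candidate wide beam (hypothetical
                output of a sub-6 GHz based classifier, assumed to sum to ~1)
    threshold : assumed sum-probability target in (0, 1]
    max_beams : assumed optional cap on the number of measured beams
    """
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must lie in (0, 1]")

    # Rank beam indices from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    limit = len(order) if max_beams is None else min(max_beams, len(order))

    selected, total = [], 0.0
    for idx in order[:limit]:
        selected.append(idx)
        total += probs[idx]
        if total >= threshold:
            break  # enough probability mass is covered; stop measuring more beams
    return selected


# A UE with a confident prediction needs a single wide-beam measurement,
# while a UE with an uncertain prediction is assigned more beams.
confident_ue = [0.02, 0.93, 0.03, 0.02]
uncertain_ue = [0.30, 0.35, 0.25, 0.10]

print(select_wide_beams(confident_ue))
print(select_wide_beams(uncertain_ue))
print(select_wide_beams(uncertain_ue, max_beams=2))
```

Under this reading, the measurement overhead scales with the uncertainty of each UE's prediction rather than being fixed for all UEs, and the cap bounds the worst-case overhead.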