The unique features of millimeter waves (mmWaves) motivate its leveraging to future, beyond-fifth-generation/sixth-generation (B5G/6G)-based device-to-device (D2D) communications. However, the neighborhood discovery and selection (NDS) problem still needs intelligent solutions due to the trade-off of investigating adjacent devices for the optimum device choice against the crucial beamform training (BT) overhead. In this paper, by making use of multiband (μW/mmWave) standard devices, the mmWave NDS problem is addressed using machine-learning-based contextual multi-armed bandit (CMAB) algorithms. This is done by leveraging the context information of Wi-Fi signal characteristics, i.e., received signal strength (RSS), mean, and variance, to further improve the NDS method. In this setup, the transmitting device acts as the player, the arms are the candidate mmWave D2D links between that device and its neighbors, while the reward is the average throughput. We examine the NDS’s primary trade-off and the impacts of the contextual information on the total performance. Furthermore, modified energy-aware linear upper confidence bound (EA-LinUCB) and contextual Thomson sampling (EA-CTS) algorithms are proposed to handle the problem through reflecting the nearby devices’ withstanding battery levels, which simulate real scenarios. Simulation results ensure the superior efficiency of the proposed algorithms over the single band (mmWave) energy-aware noncontextual MAB algorithms (EA-UCB and EA-TS) and traditional schemes regarding energy efficiency and average throughput with a reasonable convergence rate.