Device-to-device (D2D) communication is a promising paradigm for fifth-generation (5G) and beyond-5G (B5G) networks. Although D2D communication provides several benefits, including limited interference, improved energy efficiency, and reduced delay and network overhead, it faces many technical challenges, such as network architecture design and neighbor discovery. The complexity of configuring D2D links and managing their interference, especially when using millimeter-wave (mmWave) bands, has inspired researchers to leverage various machine-learning (ML) techniques to address these problems and boost the performance of D2D networks. This paper presents a comprehensive survey of recent research on D2D networks, with emphasis on the use of mmWave and ML methods. After reviewing existing D2D research directions and their conventional solutions, we show how different ML techniques can be applied to improve D2D network performance beyond what conventional approaches achieve. We then examine open research directions in applying ML to D2D networks, along with their essential requirements. Finally, we present a case study that applies the multi-armed bandit (MAB), an efficient online ML tool, to enhance neighbor discovery and selection (NDS) in mmWave D2D networks. The case study highlights the potential of ML solutions to improve the average throughput of mmWave NDS relative to conventional non-ML methods.