Direct detection (DD) optical performance monitoring (OPM), modulation format identification (MFI), and baud rate identification (BRI) are envisioned as crucial components of future-generation optical networks. They bring to optical nodes and receivers a form of adaptability and intelligent control that is not available in legacy networks. Both capabilities are critical to managing the increasing data demands and data diversity in modern and future communication networks (e.g., 5G and 6G), for which optical networks are the backbone. Machine learning (ML) has been playing a growing role in enabling this adaptability and intelligent control, and thus many OPM, MFI, and BRI solutions are being developed with ML algorithms at their core. This paper presents a comprehensive survey of the available ML-based solutions for OPM, MFI, and BRI in non-coherent optical networks. The survey is conducted from a machine learning perspective with an eye on the following aspects: (i) what machine learning paradigms have been followed; (ii) what learning algorithms are used to develop DD solutions; and (iii) what types of DD monitoring tasks have been commonly defined and addressed. The paper surveys the most widely used features and ML-based solutions that have been considered in DD optical communication systems. The survey yields several observations: it highlights issues regarding the ML development procedure, the dataset construction and training process, and the datasets used to benchmark solutions. Based on those observations, the paper shares a few insights and lessons that could help guide future research.