In this survey paper, we systematically summarize existing literature on bearing fault diagnostics with machine learning (ML) and data mining techniques. While conventional ML methods, including artificial neural network (ANN), principal component analysis (PCA), support vector machines (SVM), etc., have been successfully applied to the detection and categorization of bearing faults for decades, recent developments in deep learning (DL) algorithms in the last five years have sparked renewed interest in both industry and academia for intelligent machine health monitoring. In this paper, we first provide a brief review of conventional ML methods, before taking a deep dive into the state-of-the-art DL algorithms for bearing fault applications. Specifically, the superiority of DL based methods over conventional ML methods are analyzed in terms of fault feature extraction and classification performances; many new functionalities enabled by DL techniques are also summarized. In addition, to obtain a more intuitive insight, a comparative study is conducted on the classification accuracy of different algorithms utilizing the open source Case Western Reserve University (CWRU) bearing dataset. Finally, to facilitate the transition on applying various DL algorithms to bearing fault diagnostics, detailed recommendations and suggestions are provided for specific application conditions such as the setup environment, the data size, and the number of sensors and sensor types. Future research directions to further enhance the performance of DL algorithms on health monitoring are also discussed.