Almost all rotating machinery in the industry has bearings as their key building block and most of these machines run 24x7. This makes bearing health prediction an active research area for predictive maintenance solutions. Many state of the art Deep Neural Network (DNN) models have been proposed to solve this. However, most of these high performance models are computationally expensive and have high memory requirements. This limits their use to very specific industrial applications with powerful hardwares deployed close the the machinery. In order to bring DNN-based solutions to a potential use in the industry, we need to deploy these models on Microcontroller Units (MCUs) which are cost effective and energy efficient. However, this step is typically neglected in literature as it poses new challenges. The primary concern when inferencing the DNN models on MCUs is the on chip memory of the MCU that has to fit the model, the data and additional code to run the system. Almost all the state of the art models fail this litmus test since they feature too many parameters. In this paper, we show the challenges related to the deployment, review possible solutions and evaluate one of them showing how the deployment can be realized and what steps are needed. The focus is on the steps required for the actual deployment rather than finding the optimal solution. This paper is among the first to show the deployment on MCUs for a predictive maintenance use case. We first analyze the gap between State Of The Art benchmark DNN models for bearing defect classification and the memory constraint of two MCU variants. Additionally, we review options to reduce the model size such as pruning and quantization. Afterwards, we evaluate a solution to deploy the DNN models by pruning them in order to fit them into microcontrollers. Our results show that most models under test can be reduced to fit MCU memory for a maximum loss of 3% in average accuracy of the pruned models in comparison to the original models. Based on the results, we also discuss which methods are promising and which combination of model and feature work best for the given classification problem.