An accurate estimate of rainfall levels is fundamental in numerous application scenarios: weather forecasting, climate modeling, the design of hydraulic structures, precision agriculture, and others. It becomes essential when the goal is to warn of an imminent calamitous event and reduce the risk to human life. Unfortunately, traditional techniques for estimating rainfall levels still present numerous critical issues. This paper proposes a new approach to rainfall estimation that uses a Convolutional Neural Network (CNN) to classify the different acoustic timbres that rain produces at different intensities. The algorithm applies the CNN directly to the audio signal, using 3 s sliding windows with an offset of only 100 ms. The results obtained on seven classes ranging from “No rain” to “Cloudburst” indicate an average accuracy of 75%, which rises to 93% if misclassifications between adjacent classes are disregarded. Using low-cost, low-power hardware, the proposed algorithm therefore allows alerting mechanisms for critical high-rainfall events to be implemented with short response times and low estimation errors. Application contexts include smart cities, for which the integration of an audio sensor inside the luminaire of a street lamp is foreseen, precision agriculture, and highway safety, where the aim is to minimize the risk of aquaplaning.
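The windowing scheme and the two accuracy figures can be illustrated with a minimal Python sketch. It is not the implementation used in this work, and it rests on several assumptions: the function names `sliding_windows` and `accuracy` are hypothetical, the sample rate is a free parameter, the seven classes are taken to be encoded as integers 0 to 6 in order of increasing intensity, and "adjacent misclassifications disregarded" is interpreted as counting a prediction one class away from the label as correct.

```python
from typing import List, Sequence, Tuple

WINDOW_S = 3.0  # window length given in the text (3 s)
HOP_S = 0.1     # offset between consecutive windows given in the text (100 ms)


def sliding_windows(n_samples: int, sample_rate: int,
                    window_s: float = WINDOW_S,
                    hop_s: float = HOP_S) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of every full window in a signal.

    Hypothetical helper: the sample rate is an assumption, not a value
    taken from the text. Trailing samples that do not fill a window are
    dropped.
    """
    win = round(window_s * sample_rate)
    hop = round(hop_s * sample_rate)
    if win <= 0 or hop <= 0:
        raise ValueError("window and hop must span at least one sample")
    return [(start, start + win)
            for start in range(0, n_samples - win + 1, hop)]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int],
             tolerance: int = 0) -> float:
    """Fraction of predictions within `tolerance` class steps of the label.

    Assumes classes are integers ordered by rain intensity.
    tolerance=0 is plain accuracy; tolerance=1 also accepts a prediction
    that falls in a class adjacent to the true one.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and equal length")
    hits = sum(abs(t - p) <= tolerance for t, p in zip(y_true, y_pred))
    return hits / len(y_true)
```

Under these assumptions, a 10 s recording sampled at 16 kHz yields 71 overlapping windows, each of which would be passed to the classifier, so a new class estimate becomes available every 100 ms once the first 3 s of audio have been collected.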