The automatic detection of smoke in video streams acquired by traditional surveillance cameras is attracting growing interest in the scientific community, owing to the need to detect fires at their earliest stages. A smart visual sensor, namely a computer vision algorithm running in real time, makes it possible to overcome the limitations of standard physical sensors. The problem is nevertheless very challenging, because smoke closely resembles other environmental elements such as clouds, fog and dust. In addition, the data available for training deep neural networks are limited and not fully representative of real environments. In this paper we propose a new smoke detection method that combines motion and appearance analysis with a modern convolutional neural network (CNN). We also present a new dataset, the MIVIA Smoke Detection Dataset (MIVIA-SDD), which is publicly available for research purposes and consists of 129 videos covering about 28 h of recordings. The proposed hybrid method, trained and evaluated on this dataset, proved very effective: it achieved a 94% smoke recognition rate and, at the same time, a substantially lower false positive rate than fully deep learning-based approaches (14% vs. 100%). The combination of motion and appearance analysis with CNNs therefore deserves further investigation as a way to improve the precision of fire detection approaches.
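To make the idea of a hybrid pipeline concrete, the following is a minimal Python sketch of how motion and appearance cues could gate a CNN classifier. It is not the method proposed in the paper: the frame-differencing motion cue, the grayish-pixel appearance rule, all thresholds, and the `cnn_score` callable are hypothetical choices made only for illustration.

```python
import numpy as np


def motion_mask(prev_gray, curr_gray, diff_threshold=15.0):
    """Mark pixels whose intensity changed by more than diff_threshold.

    Assumption: plain frame differencing stands in for motion analysis.
    """
    diff = np.abs(curr_gray.astype(np.float32) - prev_gray.astype(np.float32))
    return diff > diff_threshold


def appearance_mask(frame_rgb, max_channel_spread=20.0, min_brightness=80.0):
    """Mark grayish, fairly bright pixels.

    Assumption: a small spread between the R, G and B channels is used
    as a crude stand-in for appearance analysis.
    """
    rgb = frame_rgb.astype(np.float32)
    spread = rgb.max(axis=2) - rgb.min(axis=2)
    brightness = rgb.mean(axis=2)
    return (spread < max_channel_spread) & (brightness > min_brightness)


def detect_smoke(prev_rgb, curr_rgb, cnn_score,
                 min_candidate_fraction=0.01, score_threshold=0.5):
    """Run the (hypothetical) CNN only when both cues yield candidates.

    cnn_score is any callable taking (frame, candidate_mask) and
    returning a smoke probability in [0, 1].
    """
    prev_gray = prev_rgb.astype(np.float32).mean(axis=2)
    curr_gray = curr_rgb.astype(np.float32).mean(axis=2)
    candidates = motion_mask(prev_gray, curr_gray) & appearance_mask(curr_rgb)

    # Too few moving, smoke-colored pixels: reject without calling the CNN.
    if candidates.mean() < min_candidate_fraction:
        return False
    return cnn_score(curr_rgb, candidates) >= score_threshold
```

In this sketch, a static scene or a moving object with saturated color is rejected before the classifier runs, which illustrates one way that motion and appearance filtering could reduce false positives relative to applying a CNN to every frame.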