In video-based fire detection, flame and smoke usually exhibit large variations in color, texture, shape, and other features because of the complex environment, which makes it difficult to develop a robust detection method based on single or multiple fire features. Because convolutional neural networks (CNNs) have reported state-of-the-art performance in a wide range of fields, this study presents a method based on SLIC-DBSCAN and a CNN to recognize flame and smoke modes connected to fire stages. First, simple linear iterative clustering (SLIC) serves as a pre-processing step that over-segments images into super-pixels. Then, density-based spatial clustering of applications with noise (DBSCAN) groups similar super-pixels into several clusters, which in turn improves the smoke detection accuracy of the CNN. Comparison studies are performed on smoke images from publicly available data and self-collected data. The experimental results demonstrate the improved smoke detection capability of the proposed method.
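To illustrate the grouping step, the following is a minimal sketch of DBSCAN applied to super-pixel features. It is not the implementation used in this study: it assumes each super-pixel has already been summarized by a mean-color vector, and the feature values, the Euclidean distance, and the `eps` and `min_samples` settings are hypothetical choices made only for the example.

```python
import math

NOISE = -1  # label for super-pixels that join no cluster


def region_query(points, i, eps):
    """Indices of all points within distance eps of point i (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Plain DBSCAN; returns one cluster label per point, NOISE for outliers."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = region_query(points, j, eps)
            if len(reachable) >= min_samples:
                queue.extend(reachable)  # core point: keep growing the cluster
    return labels


# Hypothetical mean-color features, one tuple per super-pixel.
features = [
    (0.60, 0.60, 0.62), (0.62, 0.61, 0.63), (0.58, 0.59, 0.60),  # grayish
    (0.10, 0.50, 0.12), (0.12, 0.52, 0.10), (0.11, 0.49, 0.13),  # greenish
    (0.95, 0.10, 0.10),                                          # isolated
]
labels = dbscan(features, eps=0.05, min_samples=2)
print(labels)
```

In this toy input the grayish and greenish super-pixels form two clusters and the isolated one is marked as noise. In a full pipeline of the kind described above, each cluster would then define an image region to pass to the CNN.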