Insufficient recognition accuracy, poor real‐time performance, and a lack of consideration of actual working conditions in the intelligent construction of coal mines have kept coal‐gangue recognition technology at the research stage, without application in practical engineering. The purpose of this paper is to establish an accurate, real‐time recognition model that uses the different physical properties of coal and gangue particles to quickly distinguish their vibration acceleration signals under the influence of external factors such as impact position, velocity, and direction. To this end, the accuracy and real‐time performance of coal‐gangue recognition models built with different convolutional neural network (CNN) structures and different sensor‐position inputs are studied. First, to meet the real‐time requirement, an original CNN recognition model composed of a single convolution layer and a single pooling layer is established, and the data collected by seven sensors are input in the form of a two‐dimensional matrix. However, the training and test results of this model are not sufficiently stable. To solve this problem, a once‐improved CNN (OI‐CNN) recognition model with multiple convolution layers and multiple pooling layers is built by deepening the network. The experimental results show that stability and accuracy are improved, but real‐time performance is poor. Furthermore, through parameter adjustment, the OI‐CNN is changed into the twice‐improved CNN (TI‐CNN), and the sensor data at different positions are input in the form of one‐dimensional vectors. The results show that the accuracy and real‐time performance of the TI‐CNN coal‐gangue recognition model are further improved. Finally, according to the research purpose of this paper, weights are assigned to the CNN performance indexes, and a multi‐index comprehensive evaluation system (MICES) is established. 
With the original CNN recognition model as the control, the OI‐CNN recognition model and the TI‐CNN recognition models at different sensor positions are quantitatively compared to obtain a comprehensive evaluation score for each model. The results show that the coal‐gangue recognition model based on the TI‐CNN structure with data input from a single position sensor achieves the highest MICES score, while the sensor position has little effect on the recognition results.
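As an illustration of how a weighted multi‐index score of this kind can be computed, the following is a minimal sketch only. It assumes that each index is expressed as a ratio to the control model and that the weighted ratios are summed; the function name, the index names, the weights, and the numerical values are hypothetical placeholders and are not taken from this paper.

```python
def comprehensive_score(indexes, weights, control, lower_is_better=()):
    """Weighted sum of index ratios relative to a control model.

    Assumption: every index is divided by the control model's value,
    and indexes where smaller is better (e.g. a run time) are inverted
    so that a higher score always means a better model.
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    score = 0.0
    for name, weight in weights.items():
        ratio = indexes[name] / control[name]
        if name in lower_is_better:
            ratio = 1.0 / ratio
        score += weight * ratio
    return score


# Hypothetical placeholder values, for illustration only.
weights = {"accuracy": 0.5, "time": 0.5}
control = {"accuracy": 0.8, "time": 2.0}
candidate = {"accuracy": 0.8, "time": 1.0}

control_score = comprehensive_score(control, weights, control, ("time",))
candidate_score = comprehensive_score(candidate, weights, control, ("time",))
```

Under this assumed normalization, the control model always scores 1, so a candidate scoring above 1 is better overall than the control for the chosen weights.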