The microseismic signals of coal and rock fractures collected by underground sensors are mixed with a large number of blasting vibration signals generated by coal mine blasting, and the waveforms of the two signal types are highly similar. To identify true microseismic signals quickly and accurately within a microseismic monitoring system, this paper proposes a lightweight network model, named CCViT, that combines a convolutional neural network (CNN) with a transformer. In this model, the CNN extracts shallow local features, and the transformer extracts deep global features. In addition, a modified channel attention module emphasizes informative channels and suppresses uninformative ones. Experimental results on the dataset used in this paper show that the proposed CCViT model has significant advantages in floating point operations (FLOPs), parameter count, and accuracy compared with many advanced network models.
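
To illustrate how a channel attention module can emphasize some feature channels and suppress others, the following minimal sketch implements a generic squeeze-and-excitation style gate in plain Python. It is an assumption made for illustration only: the abstract does not specify how the channel attention in CCViT is modified, and the function name `channel_attention` and the weight arguments `w_reduce` and `w_expand` are hypothetical.

```python
import math


def channel_attention(features, w_reduce, w_expand):
    """Generic squeeze-and-excitation style channel gate (illustrative only).

    features: list of C channels, each a list of floats (a 1-D feature map).
    w_reduce: R x C weight matrix of the bottleneck layer (hypothetical).
    w_expand: C x R weight matrix of the expansion layer (hypothetical).
    Returns the rescaled features and the per-channel gates.
    """
    # Squeeze: summarize each channel by its global average.
    squeezed = [sum(channel) / len(channel) for channel in features]

    # Excitation, step 1: bottleneck projection followed by ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(row, squeezed)))
        for row in w_reduce
    ]

    # Excitation, step 2: expand back to C values and apply a sigmoid,
    # giving one gate in (0, 1) per channel.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w_expand
    ]

    # Scale: multiply every sample of a channel by that channel's gate.
    scaled = [[g * x for x in channel] for g, channel in zip(gates, features)]
    return scaled, gates


# Two channels of length 4; the toy weights favor the first channel.
features = [[1.0, 1.0, 1.0, 1.0], [0.5, 0.5, 0.5, 0.5]]
w_reduce = [[1.0, 0.0]]       # R = 1, C = 2
w_expand = [[2.0], [-2.0]]    # C = 2, R = 1
scaled, gates = channel_attention(features, w_reduce, w_expand)
```

With these toy weights the first channel receives a gate above 0.5 and the second a gate below 0.5, so the first is largely preserved while the second is attenuated. In a trained network the weights would be learned rather than set by hand.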