Deep learning has recently attracted extensive attention in rolling-bearing fault diagnosis owing to its powerful feature representation capability. It can extract the deep features hidden in the data, significantly improving the accuracy and efficiency of fault diagnosis. Despite this progress, deep-learning-based methods still face two outstanding problems. (1) Each layer extracts features with the same fixed convolution kernel, so the network cannot adaptively select kernels according to the features of the input image; this limits its adaptability to different inputs and weakens feature extraction. (2) The models have a large number of parameters and require long training times. To address these problems, this paper proposes an integrated deep neural network, named SIR-CNN, that combines an improved selective kernel network (SKNet) with an enhanced Inception-ResNet-v2. First, a new three-branch SKNet is designed on the basis of the original SKNet. Second, the new SKNet is embedded into a depthwise separable convolution network so that the model can adaptively select convolution kernels of different sizes during training. Third, the convolution structure in the Inception-ResNet-v2 network is replaced by the improved depthwise separable convolution network to achieve effective feature extraction. Finally, time-frequency maps of the raw vibration signals are obtained through the short-time Fourier transform (STFT) and fed to the proposed SIR-CNN for experiments. The experimental results show that the proposed SIR-CNN outperforms the other methods compared.
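The two ideas the abstract relies on, a cheaper convolution and adaptive selection among kernel branches, can be illustrated with a minimal sketch. It is not the SIR-CNN implementation. The function names, the channel counts, and the per-branch scores are hypothetical values chosen for illustration, and bias terms are omitted. The first two functions compare weight counts for a standard convolution and a depthwise separable one. The last two show the softmax-weighted fusion that SKNet-style selection applies across branches, here with three branches.

```python
import math


def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out):
    """Weights in a depthwise k x k convolution followed by a
    1 x 1 pointwise convolution (bias omitted)."""
    return k * k * c_in + c_in * c_out


def branch_weights(scores):
    """Softmax over per-branch scores, so the weights sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(branch_outputs, scores):
    """Weighted sum of the branch outputs at one channel position."""
    weights = branch_weights(scores)
    return sum(w * x for w, x in zip(weights, branch_outputs))


# Hypothetical layer: 3 x 3 kernel, 64 input channels, 128 output channels.
print(standard_conv_params(3, 64, 128))   # 73728
print(separable_conv_params(3, 64, 128))  # 8768

# Hypothetical scores for three branches; in a real network they are learned.
print(branch_weights([0.2, 1.5, -0.3]))
```

For this hypothetical layer the separable form needs roughly 12% of the weights of the standard form, since the ratio is 1/c_out + 1/k². This is the kind of saving that motivates replacing standard convolutions when parameter count and training time are the concern.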