Vibration signals collected from rolling bearings in industrial systems are highly complex and contaminated by intense environmental noise, which degrades the performance of traditional fault diagnosis methods. Moreover, deploying a model in engineering practice, especially in the Industrial Internet of Things context, imposes strict limits on its storage and computational costs. To address these challenges, this article proposes an enhanced lightweight multiscale convolutional neural network (CNN) for rolling bearing fault diagnosis. Our contributions fall into three aspects. First, the proposed model is modular and easy to extend. It combines multiscale learning with an attention mechanism and residual learning, enabling the network to extract richer and more discriminative fault features directly from the raw vibration signal and thereby improving diagnostic performance. Second, the interpretability of the multiscale learning mechanism is explored by visualizing the extraction process of multiscale features. Finally, for the first time, we introduce depthwise separable convolution into a multiscale CNN to reduce the storage and computational costs of the model, which makes the model lightweight and improves its applicability in the Industrial Internet of Things context. Experimental results on a rolling bearing dataset demonstrate that, compared with state-of-the-art multiscale CNN models, the proposed model extracts more discriminative fault features, is more robust to noise, and is better suited to practical industrial systems.
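As a rough illustration of why depthwise separable convolution lowers storage cost, the sketch below counts the weights of a standard 1-D convolution and of its depthwise separable counterpart. The layer sizes (64 input channels, 128 output channels, kernel length 9) and the function names are hypothetical and are not taken from the article; biases are ignored.

```python
def conv1d_params(in_channels, out_channels, kernel_size):
    """Weight count of a standard 1-D convolution (bias omitted)."""
    return kernel_size * in_channels * out_channels


def separable_conv1d_params(in_channels, out_channels, kernel_size):
    """Weight count of a depthwise separable 1-D convolution (bias omitted)."""
    depthwise = kernel_size * in_channels   # one k-tap filter per input channel
    pointwise = in_channels * out_channels  # 1x1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
standard = conv1d_params(64, 128, 9)
separable = separable_conv1d_params(64, 128, 9)

# The ratio simplifies to 1/out_channels + 1/kernel_size.
ratio = separable / standard
print(standard, separable, round(ratio, 4))
```

For these assumed sizes the separable layer needs roughly 12% of the weights of the standard layer. Because the ratio is 1/out_channels + 1/kernel_size, the saving grows with wider layers and longer kernels, which suggests the substitution is especially effective in multiscale branches that use large kernels.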