Self-driving cars are a hot research topic in the field of intelligent transportation systems, as they can greatly alleviate traffic jams and improve travel efficiency. Scene classification is one of the key technologies of self-driving cars and provides a basis for their decision-making. In recent years, deep learning-based solutions have achieved good results on the scene classification problem. However, some problems in scene classification methods need further study, such as how to deal with the similarities between different categories and the differences within the same category. To deal with these problems, this paper proposes an improved deep network-based scene classification method. In the proposed method, an improved Faster RCNN (Regions with CNN features) network extracts the features of representative objects in the scene to obtain local features; a new residual attention block is added to the Faster RCNN network to highlight local semantics related to driving scenarios. In addition, an improved Inception module extracts global features, where a mixed Leaky ReLU and ELU activation function is introduced to reduce the possible redundancy of the convolution kernels and enhance robustness. The local features and the global features are then fused to perform scene classification. Finally, a private data set is built from public data sets for the specialized application of scene classification in the self-driving field, and the proposed method is tested on this data set. The experimental results show that the accuracy of the proposed method reaches 94.76%, which is higher than that of state-of-the-art methods.
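To make the two ideas above more concrete, the following is a minimal Python sketch of a mixed Leaky ReLU and ELU activation and of local/global feature fusion. The abstract does not give the exact formulas, so everything here is an assumption for illustration: the mix is taken to be a weighted sum of the two activations, the fusion is taken to be simple concatenation, and the names `mixed_activation`, `fuse_features`, `weight`, `slope`, and `alpha` are hypothetical rather than taken from the paper.

```python
import math


def leaky_relu(x, slope=0.01):
    """Standard Leaky ReLU: identity for x >= 0, small linear slope otherwise."""
    return x if x >= 0 else slope * x


def elu(x, alpha=1.0):
    """Standard ELU: identity for x >= 0, alpha * (exp(x) - 1) otherwise."""
    return x if x >= 0 else alpha * (math.exp(x) - 1.0)


def mixed_activation(x, weight=0.5, slope=0.01, alpha=1.0):
    """Assumed form of the mix: a weighted sum of Leaky ReLU and ELU.

    For x >= 0 both terms equal x, so the output is x; for x < 0 the
    output blends the linear and the exponential negative branches.
    """
    return weight * leaky_relu(x, slope) + (1.0 - weight) * elu(x, alpha)


def fuse_features(local_features, global_features):
    """Assumed fusion: concatenate local and global feature vectors."""
    return list(local_features) + list(global_features)


if __name__ == "__main__":
    # Hypothetical feature vectors, used only to show the call pattern.
    local_vec = [mixed_activation(v) for v in (-1.0, 0.5)]
    global_vec = [mixed_activation(v) for v in (2.0,)]
    print(fuse_features(local_vec, global_vec))
```

In a real network the fused vector would feed a classifier head, and the activation would be applied element-wise to tensors. This scalar version only shows the shape of the computation under the stated assumptions.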