Classifying local soil conditions is important for interpreting structural seismic damage and plays a vital role in site-specific seismic hazard analyses. In this study, we propose treating site classification as an image recognition task, using a deep convolutional neural network (DCNN)-based technique. We design the input image as a combination of the topographic slope and the mean horizontal-to-vertical spectral ratio (HVSR) of earthquake recordings. A DCNN model with five convolutional layers is trained using 1649 sites in Japan. For Japanese sites, our DCNN classifier achieves recall rates of 82%, 70%, and 60% for site classes C, D, and E, respectively. Compared with existing site classification schemes that rely on predefined standard HVSR curves, our proposed method achieves the highest total accuracy (between 73% and 75%). The generality and applicability of our trained classifier are further validated on sites in Europe, where it achieves a total accuracy between 64% and 66%. The proposed data-driven approach could be extended to other types of site amplification functions in the future.
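The two metrics quoted above measure different things: the recall rate of a class is the fraction of sites truly in that class that the classifier labels correctly, while the total accuracy is the fraction of all sites labelled correctly. The following is a minimal sketch of how both could be computed from paired labels. The function name `recall_and_accuracy` and the ten example labels are hypothetical and chosen only for illustration; they are not the study's code or data.

```python
from collections import Counter


def recall_and_accuracy(true_labels, predicted_labels):
    """Return (per-class recall as a dict, total accuracy as a float)."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    if not true_labels:
        raise ValueError("label sequences must not be empty")

    # Number of sites that truly belong to each class.
    totals = Counter(true_labels)
    # Number of sites per true class that were predicted correctly.
    hits = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)

    recall = {cls: hits[cls] / totals[cls] for cls in totals}
    accuracy = sum(hits.values()) / len(true_labels)
    return recall, accuracy


# Made-up labels for ten hypothetical sites (not data from the study).
true_labels = ["C", "C", "C", "C", "D", "D", "D", "E", "E", "E"]
predicted = ["C", "C", "C", "D", "D", "D", "C", "E", "E", "D"]

recall, accuracy = recall_and_accuracy(true_labels, predicted)
for cls in sorted(recall):
    print(f"class {cls}: recall = {recall[cls]:.2f}")
print(f"total accuracy = {accuracy:.2f}")
```

Because total accuracy weights every site equally, it is dominated by the most populous classes, which is why per-class recall is reported alongside it.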