Crowdsourced social media data has become popular for assessing cultural ecosystem services (CES), and advances in deep learning show great potential for timely CES assessment at large scales. Here, we describe a procedure for automating the assessment of image elements pertaining to CES from social media. We focus on a binary (natural, human) and a multiclass (posing, species, nature, landscape, human activities, human structures) classification of those elements using two Convolutional Neural Networks (CNNs; VGG16 and ResNet152) initialised with weights from two large datasets (Places365 and ImageNet) and from our own dataset. We train these CNNs on Flickr and Wikiloc images from the Peneda-Geres region (Portugal) and evaluate their transferability to wider areas, using Sierra Nevada (Spain) as a test case. CNNs trained for Peneda-Geres performed well, with F1-scores for the binary classification (> 80%) exceeding those for the multiclass classification (> 60%). CNNs pre-trained with Places365 and ImageNet data performed significantly better than those pre-trained with our own data. Model performance decreased when the models were transferred to Sierra Nevada, but remained satisfactory (> 60%). The combination of manual annotations, freely available CNNs, and pre-trained local datasets thereby shows great relevance for supporting automated CES assessments from social media.
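The results above are reported as F1-scores for the binary and the multiclass tasks. As an illustration only, the following minimal sketch shows one way such scores could be computed from manual annotations and CNN predictions. The function names `per_class_f1` and `macro_f1`, the toy labels, and the choice of macro-averaging across classes are assumptions made for this example; the abstract does not state which averaging scheme was used.

```python
from collections import Counter

# Class names taken from the abstract; the toy data below are invented.
BINARY_CLASSES = ["natural", "human"]
MULTI_CLASSES = ["posing", "species", "nature", "landscape",
                 "human activities", "human structures"]


def per_class_f1(y_true, y_pred, labels):
    """Return a dict mapping each label to its F1-score (0.0 if undefined)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # annotated class gains a false negative
    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = per_class_f1(y_true, y_pred, labels)
    return sum(scores.values()) / len(labels)


# Toy binary example: four annotated images and four hypothetical predictions.
annotated = ["natural", "natural", "human", "human"]
predicted = ["natural", "human", "human", "human"]
binary_scores = per_class_f1(annotated, predicted, BINARY_CLASSES)
binary_macro = macro_f1(annotated, predicted, BINARY_CLASSES)
```

The same functions apply unchanged to the six-class task by passing `MULTI_CLASSES` as `labels`, which makes it straightforward to compare binary and multiclass performance on the same footing.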