When deep learning is applied to raw aerial photographs to classify forest vegetation cover, the results are often hindered by the interlaced composition of complex backgrounds and vegetation types, as well as by differences among deep learning computation processes, which leads to unpredictable training and test results. The purpose of this research is to evaluate how (1) data preprocessing, (2) the number of classification targets, and (3) the choice of convolutional neural network (CNN) architecture affect the deep learning identification of forest and vegetation types in high-resolution aerial photographs. Data preprocessing consisted mainly of principal component analysis (PCA) and content simplification (noise elimination). Two classification schemes were tested: 14 forest vegetation types, which are more complex and harder to distinguish, and 7 simpler forest vegetation types. We compared three CNN architectures: VGG19, ResNet50, and SegNet. The models achieved the best execution efficiency and classification accuracy after data preprocessing with PCA. Increasing the number of classification targets, however, significantly reduced classification accuracy. Among the architectures, VGG19 achieved the highest classification accuracy, whereas SegNet showed the best overall performance and the most stable convergence. These results indicate that data preprocessing helps identify forest and plant categories in aerial photographs with complex backgrounds. Combined with an appropriate CNN architecture, this approach has great potential to replace costly on-site forestland surveys. Finally, we propose a user-friendly classification system for practical application, which performed well in testing.
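The abstract names PCA as the main preprocessing step but does not say how it was applied to the photographs. The sketch below is a minimal illustration under one stated assumption: each pixel is treated as a single sample whose features are its color band values (e.g., R, G, B), and the image is projected onto the principal axes of the band covariance matrix. The helper name `pca_transform_image` and the synthetic test tile are hypothetical and not taken from the study; the content simplification (noise elimination) step is not covered.

```python
import numpy as np


def pca_transform_image(image: np.ndarray, n_components: int = 3):
    """Hypothetical helper: PCA over the color bands of an H x W x C image.

    Assumption (not stated in the study): each pixel is one sample with C
    features. Returns the projected H x W x n_components array and the
    fraction of total variance explained by each kept component.
    """
    h, w, c = image.shape
    if not 1 <= n_components <= c:
        raise ValueError("n_components must be between 1 and the number of bands")
    # Flatten to (H*W, C) and center each band at zero mean.
    pixels = image.reshape(-1, c).astype(np.float64)
    centered = pixels - pixels.mean(axis=0)
    # Band-by-band covariance matrix (C x C).
    cov = np.cov(centered, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]
    # Project every pixel onto the leading principal axes.
    projected = centered @ components
    explained = eigvals[order] / eigvals.sum()
    return projected.reshape(h, w, n_components), explained


# Usage with a synthetic 64 x 64 RGB tile (random data, not aerial imagery).
rng = np.random.default_rng(0)
tile = rng.integers(0, 256, size=(64, 64, 3)).astype(np.uint8)
pcs, ratio = pca_transform_image(tile, n_components=3)
print(pcs.shape)  # (64, 64, 3)
print(ratio)
```

The projected components are mutually uncorrelated and ordered by explained variance. Before feeding them to a CNN such as VGG19, ResNet50, or SegNet, they would typically need rescaling to the network's expected input range; that step is likewise an assumption and is omitted here.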