Advances in deep neural networks have demonstrated the superiority of deep learning techniques for applications such as object recognition and image classification. Nevertheless, deep learning-based methods usually require a large amount of training data, which mainly comes from manual annotation and is labor-intensive to produce. To reduce the manual effort needed to generate sufficient training data, we propose to leverage existing labeled data to generate image annotations automatically. Specifically, pixel labels are first transferred from one image modality to another via a geometric transformation to create initial image annotations; additional information (e.g., height measurements) is then incorporated through Bayesian inference to update the labeling beliefs. Finally, the updated label assignments are optimized with a fully connected conditional random field (CRF), yielding refined labels for all pixels in the image. The proposed approach is tested in two scenarios: (1) label propagation from annotated aerial imagery to unmanned aerial vehicle (UAV) imagery, and (2) label propagation from a map database to aerial imagery. In each scenario, the refined image labels are used as pseudo-ground truth for training a convolutional neural network (CNN). Results demonstrate that our model produces accurate label assignments even around complex object boundaries; moreover, the generated image labels can be leveraged effectively for training CNNs and achieve classification accuracy comparable to that of manual image annotations: the per-class classification accuracies of networks trained on manual annotations and on the generated labels differ by no more than ±5%.
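
To make the first two steps concrete, the following Python sketch illustrates the geometric label transfer and the Bayesian belief update. It is a minimal illustration under stated assumptions, not the authors' implementation: the homography H, the prior confidence value, and the per-class Gaussian height likelihood are hypothetical placeholders, since the abstract does not specify the exact transformation or likelihood model used.

    import numpy as np
    import cv2

    def transfer_labels(src_labels, H, dst_hw):
        # Warp a uint8 label map from the source modality into the target
        # image frame via a 3x3 homography H (the geometric transformation).
        # Nearest-neighbour interpolation keeps class ids discrete.
        h, w = dst_hw
        return cv2.warpPerspective(src_labels, H, (w, h),
                                   flags=cv2.INTER_NEAREST)

    def bayesian_update(labels, height, height_models, prior_conf=0.8):
        # Update labeling beliefs with height measurements via Bayes' rule:
        #   posterior(c) ∝ p(height | c) * prior(c)
        # The prior encodes the transferred labels with confidence
        # prior_conf (remaining mass spread over the other classes), and
        # the likelihood is a per-class Gaussian over height -- both are
        # modeling assumptions made for this sketch.
        n_classes = len(height_models)       # assumes n_classes > 1
        h, w = labels.shape
        prior = np.full((n_classes, h, w),
                        (1.0 - prior_conf) / (n_classes - 1))
        for c in range(n_classes):
            prior[c][labels == c] = prior_conf
        # height: (h, w) array; height_models: list of (mean, std) per class
        lik = np.stack([np.exp(-0.5 * ((height - mu) / sd) ** 2) / sd
                        for mu, sd in height_models])
        post = prior * lik
        return post / post.sum(axis=0, keepdims=True)   # (n_classes, h, w)

Here height_models is a hypothetical list of per-class (mean, std) height statistics; in practice such statistics would be estimated from the height data itself or from the source annotations.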
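
The updated beliefs are then refined with a fully connected CRF. The sketch below uses the pydensecrf package, a common implementation of the Krähenbühl-Koltun dense CRF; the pairwise kernel widths and compatibility values shown are illustrative defaults, not parameters from the paper.

    import numpy as np
    import pydensecrf.densecrf as dcrf
    from pydensecrf.utils import unary_from_softmax

    def crf_refine(image, probs, n_iters=5):
        # image: HxWx3 uint8 RGB image of the target modality.
        # probs: (n_classes, H, W) posterior from the Bayesian update,
        #        summing to one per pixel.
        n_classes, h, w = probs.shape
        d = dcrf.DenseCRF2D(w, h, n_classes)
        # Unary energy = negative log of the updated labeling beliefs.
        d.setUnaryEnergy(unary_from_softmax(probs.astype(np.float32)))
        # Smoothness kernel (location only) and appearance kernel
        # (location + color); parameter values here are illustrative.
        d.addPairwiseGaussian(sxy=3, compat=3)
        d.addPairwiseBilateral(sxy=60, srgb=10,
                               rgbim=np.ascontiguousarray(image), compat=10)
        Q = np.array(d.inference(n_iters))        # (n_classes, h*w)
        return Q.argmax(axis=0).reshape(h, w)     # refined label map

The resulting per-pixel label map is what serves as pseudo-ground truth for CNN training in both scenarios.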