When the spatial distribution of winter wheat is extracted from high-resolution remote sensing imagery using convolutional neural networks (CNNs), field edges are usually delineated coarsely, which lowers overall accuracy. This study proposes a new per-pixel classification model that combines a CNN with a Bayesian model (the CNN-Bayesian model) to improve extraction accuracy. In this model, a feature extractor generates a feature vector for each pixel, an encoder transforms the feature vector of each pixel into a category-code vector, and a two-level classifier uses the difference between elements of category-probability vectors as the confidence value to perform per-pixel classification. The first level determines the category of high-confidence pixels, and the second level is an improved Bayesian model that determines the category of low-confidence pixels. The CNN-Bayesian model was trained and tested on Gaofen 2 satellite images. Compared with existing models, our approach improved overall accuracy: the overall accuracies of SegNet, DeepLab, VGG-Ex, and CNN-Bayesian were 0.791, 0.852, 0.892, and 0.946, respectively. Thus, this approach can produce superior results when winter wheat spatial distribution is extracted from satellite imagery.

over the past few decades at regional or global scales [6][7][8]. As extraction of crop spatial distribution mainly relies on pixel-based image classification, correctly determining pixel features for accurate classification is the basis for this approach [9][10][11][12].

The spectral characteristics of low- and middle-resolution remote sensing images are usually stable. Vegetation indices are generally used as pixel features in studies using data from sources including the Moderate Resolution Imaging Spectroradiometer (MODIS) [6,13,14,15,16], Enhanced Thematic Mapper/Thematic Mapper [13,17], and Systeme Probatoire d'Observation de la Terre [7,10].
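To make concrete how a vegetation index serves as a per-pixel feature, the sketch below computes the normalized difference vegetation index, (NIR - Red) / (NIR + Red), from two bands. It is a minimal illustration only: the function names `ndvi` and `ndvi_image`, the toy reflectance values, and the choice to return 0.0 where both bands are zero are assumptions, not details taken from the cited studies.

```python
def ndvi(nir, red):
    """NDVI for one pixel: (NIR - Red) / (NIR + Red)."""
    denom = nir + red
    # The index is undefined when both bands are zero;
    # returning 0.0 in that case is an assumption of this sketch.
    return 0.0 if denom == 0 else (nir - red) / denom


def ndvi_image(nir_band, red_band):
    """Apply ndvi() pixel by pixel to two equally sized 2-D bands."""
    return [
        [ndvi(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]


# Toy 2x2 reflectance patches (hypothetical values).
nir_band = [[0.50, 0.40], [0.30, 0.00]]
red_band = [[0.10, 0.20], [0.30, 0.00]]

# One NDVI value per pixel, usable as a classification feature.
features = ndvi_image(nir_band, red_band)
```

Dense vegetation reflects strongly in the near-infrared and absorbs red light, so higher values in `features` suggest greener pixels; a classifier such as a decision tree could then threshold or combine these values with other features.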
These indices include the normalized difference vegetation index (NDVI) [5,6,13,14,15], relationship analysis of NDVI [8], and the enhanced vegetation index (EVI) [3,18], all of which are derived from band values. Common classification methods include decision trees [5,11,13], linear regression [6], statistics [7], filtration [13], time-series analysis [14,15], the iterative self-organizing data analysis technique (ISODATA) [16], and the Mahalanobis distance [17]. Texture features can better describe the spatial structure of pixels; the gray-level co-occurrence matrix is a commonly used texture feature [19], and Gabor [20] and wavelet transforms [19,21] are often used to extract texture features. Moreover, object-based image analysis is also widely used in per-pixel classification [22,23]. Such methods can successfully extract the spatial distribution of winter wheat and other crops, but limitations in spatial resolution restrict the applicability of the results.

The spatial resolution and precision of crop extraction can be significantly improved by using high-resolution imagery [8,24,25]. However,...