Classification with only a few training samples has been a longstanding problem in polarimetric synthetic aperture radar (PolSAR) image analysis and processing. To address the small number of training samples available for PolSAR image classification, a novel Self‐Supervised Ensemble Learning Framework (SSELF) is designed. The SSELF automatically extracts features conducive to PolSAR image classification from a small number of training samples, and it significantly decreases the dependence of neural network algorithms on large labelled training sets. First, to fully exploit the spatial–polarimetric features of PolSAR data, EfficientNet‐B0 is used as the main section of the Deep Learning (DL) model to extract DL features from PolSAR data. Then, using an optimisation function that constrains the cross‐correlation matrix of different distortions of each sample to the identity matrix, the DL model learns, in a self‐supervised manner, effective features in which homogeneous samples gather together and heterogeneous samples separate from each other. Moreover, following the great success of curriculum learning in machine learning, a novel Deep Curriculum Learning (DCL) model is proposed to train the DL network in our self‐supervised model. The proposed DCL uses the entropy‐alpha target decomposition to estimate the complexity of each PolSAR image patch before feeding it to EfficientNet‐B0. In addition, an accumulative mini‐batch pacing function gradually introduces more difficult patches to EfficientNet‐B0 during training of the self‐supervised model. Furthermore, two Ensemble Learning (EL) models, a feature‐level ensemble and a view‐level ensemble, are proposed to improve the feature extraction capability and the classification result for PolSAR data by jointly using spatial features at different scales and polarimetric information at different bands.
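As a rough illustration of the cross‐correlation objective described above, the following minimal sketch computes a loss that pushes the cross‐correlation matrix of two distorted views towards the identity matrix. It is an assumption‐laden sketch, not the authors' implementation: the function name `cross_correlation_loss`, the weight `off_diag_weight`, and the use of real‐valued embeddings (rather than the complex‐valued network of the paper) are all hypothetical choices made for illustration.

```python
import numpy as np

def cross_correlation_loss(z_a, z_b, off_diag_weight=5e-3, eps=1e-12):
    """Loss pulling the cross-correlation of two views towards identity.

    z_a, z_b: (batch, dim) real-valued embeddings of two distortions of
    the same samples (hypothetical stand-ins for the network outputs).
    """
    n = z_a.shape[0]
    # Standardise every feature dimension over the batch.
    z_a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + eps)
    z_b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + eps)
    # (dim, dim) cross-correlation matrix between the two views.
    c = z_a.T @ z_b / n
    # Diagonal terms should be 1: the two views agree feature by feature.
    on_diag = np.sum((1.0 - np.diag(c)) ** 2)
    # Off-diagonal terms should be 0: features are decorrelated.
    off_diag = np.sum(c ** 2) - np.sum(np.diag(c) ** 2)
    return on_diag + off_diag_weight * off_diag, c

rng = np.random.default_rng(0)
z = rng.normal(size=(256, 8))
# Identical views give a near-zero loss; unrelated views do not.
loss_same, c_same = cross_correlation_loss(z, z)
loss_diff, _ = cross_correlation_loss(z, rng.normal(size=(256, 8)))
```

In this toy setting, identical views yield a diagonal of ones and a small loss, while unrelated embeddings yield a much larger loss; in an actual training loop the two inputs would be embeddings of two distortions of the same PolSAR patches.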
In the proposed feature‐level ensemble strategy, the features extracted at the different scales are used as the input of the designed feature fusion algorithm, which maps them to a new space of lower dimension to improve the classification result. In the proposed view‐level ensemble strategy, the Mutual Information (MI) between each feature and the other features is first calculated, and then, based on the calculated MI, different groups of features are selected as the various views. Next, the final classification result is obtained by a majority vote over the results of the individual views. It should be noted that, unlike existing traditional methods, both the amplitude and the phase information of SAR images are used in the training process of the proposed model. In addition, all parameters of the proposed network are extended to the complex domain, and a gradient‐based complex backpropagation algorithm is used to train the network. Finally, experimental results on three well‐known PolSAR data sets show that the designed SSELF extracts more discriminative features using the designed DL model and achieves better classification results than six other novel DL models, both with small training samples and with adequate labelled samples.
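The final step of the view‐level ensemble, the majority vote over per‐view results, can be sketched as follows. This is only an illustrative sketch under stated assumptions: the function name `majority_vote`, the integer label encoding, and the tie‐breaking rule (lowest class index wins) are hypothetical and are not specified in the text above.

```python
import numpy as np

def majority_vote(view_predictions, n_classes):
    """Fuse per-view class maps by majority vote.

    view_predictions: (n_views, n_pixels) integer labels, one row per
    view (hypothetical layout chosen for this sketch).
    """
    preds = np.asarray(view_predictions)
    # For every pixel, count how many views voted for each class.
    votes = np.stack([(preds == k).sum(axis=0) for k in range(n_classes)])
    # argmax resolves ties in favour of the lowest class index.
    return votes.argmax(axis=0)

# Three views classifying four pixels into three classes.
views = np.array([[0, 1, 2, 2],
                  [0, 1, 1, 2],
                  [1, 1, 2, 0]])
fused = majority_vote(views, n_classes=3)
print(fused)  # [0 1 2 2]
```

Each fused label is the class chosen by most views for that pixel, so a single view's error (for example the third view on the first and last pixels) is outvoted by the others.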