In remote sensing, active learning (AL) is regarded as an effective way to achieve sufficient classification accuracy with a limited number of training samples. Although AL has been studied extensively, most existing work follows the pixel-based paradigm, and AL for object-based image analysis (OBIA) has received comparatively little attention. This paper proposes a new AL method for selecting object-based samples. The method addresses the problem of identifying the most informative segment-samples so that classification performance can be optimized, and its advantage is that informativeness can be estimated from various object-based features. The approach has three key steps. First, a series of one-against-one binary random forest (RF) classifiers is initialized with a small initial training set; this strategy allows the classification uncertainty to be estimated in great detail. Second, each tested sample is processed by the binary RFs to derive a classification uncertainty value that reflects its informativeness. Third, the samples with high uncertainty values are selected, labeled by a supervisor, and added to the training set, on which the binary RFs are re-trained for the next iteration. The procedure is repeated until a stopping criterion is met. To validate the proposed method, three pairs of multi-spectral remote sensing images with different landscape patterns were used in the experiments. The results indicate that the proposed method can outperform other state-of-the-art AL methods: the highest overall accuracies for all three datasets were obtained with the proposed method, at 88.32%, 85.77%, and 93.12% for “T1,” “T2,” and “T3,” respectively. Furthermore, because object-based features strongly affect the performance of AL, eight combinations of four feature types were investigated. The results show that the best feature combination differs among the three datasets because of variation in feature separability.
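
The three-step procedure described above can be illustrated with a minimal Python sketch based on scikit-learn. It is not the implementation used in this paper, and several details are assumptions made only for illustration: the uncertainty measure (the mean ambiguity `1 - |p_a - p_b|` over all binary RFs), the fixed iteration count used as the stopping criterion, the batch size, the synthetic feature vectors standing in for object-based features, and the helper names `train_binary_rfs`, `uncertainty`, and `active_learning`. The known labels of the pool play the role of the supervisor.

```python
from itertools import combinations

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def train_binary_rfs(X, y, classes, seed=0):
    """Step 1: train one binary random forest per pair of classes."""
    models = {}
    for a, b in combinations(classes, 2):
        mask = (y == a) | (y == b)  # keep only samples of the two classes
        rf = RandomForestClassifier(n_estimators=50, random_state=seed)
        rf.fit(X[mask], y[mask])
        models[(a, b)] = rf
    return models


def uncertainty(models, X):
    """Step 2 (assumed measure): mean pairwise ambiguity in [0, 1].

    A value near 1 means the binary RFs are, on average, undecided.
    """
    scores = np.zeros(len(X))
    for rf in models.values():
        proba = rf.predict_proba(X)  # two columns, one per class of the pair
        scores += 1.0 - np.abs(proba[:, 0] - proba[:, 1])
    return scores / len(models)


def active_learning(X_pool, y_pool, init_idx, n_iter=5, batch=10):
    """Step 3: select high-uncertainty samples, label them, and re-train."""
    labeled = list(init_idx)
    classes = np.unique(y_pool)
    for _ in range(n_iter):  # assumed stopping criterion: fixed iterations
        models = train_binary_rfs(X_pool[labeled], y_pool[labeled], classes)
        candidates = np.setdiff1d(np.arange(len(X_pool)), labeled)
        u = uncertainty(models, X_pool[candidates])
        picked = candidates[np.argsort(u)[-batch:]]  # most uncertain samples
        labeled.extend(picked.tolist())  # y_pool acts as the supervisor
    models = train_binary_rfs(X_pool[labeled], y_pool[labeled], classes)
    return labeled, models


# Synthetic stand-in for segment feature vectors (hypothetical data).
X_pool, y_pool = make_classification(
    n_samples=300, n_features=8, n_informative=5, n_classes=3, random_state=0
)
# Small initial training set: five samples per class, so every class pair
# is represented in every binary RF.
init_idx = np.concatenate(
    [np.where(y_pool == c)[0][:5] for c in np.unique(y_pool)]
)
labeled, models = active_learning(X_pool, y_pool, init_idx)
```

In this sketch the initial set is drawn per class so that each one-against-one RF sees both of its classes from the first iteration. A different uncertainty measure or stopping rule can be substituted by changing `uncertainty` or the loop condition alone.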