High-resolution urban image clustering remains a challenging task, mainly because clustering performance depends strongly on the discriminative power of the features. Recently, several studies have used autoencoders for unsupervised learning to extract more efficient features for clustering. This paper proposes a Boosted Convolutional AutoEncoder (BCAE), a feature-learning method for efficient urban image clustering. The proposed method was applied to multi-sensor remote-sensing images through a multistep workflow. First, the optical data were preprocessed with a Minimum Noise Fraction (MNF) transformation. Then, these MNF features, together with the normalized Digital Surface Model (nDSM) and vegetation indices such as the Normalized Difference Vegetation Index (NDVI) and Excess Green (ExG(2)), were used as inputs to the BCAE model. Next, the proposed convolutional autoencoder was trained to encode upgraded features automatically, boosting the hand-crafted features into more clustering-friendly ones. We then employed the Mini Batch K-Means algorithm to cluster the resulting deep features. Finally, to demonstrate the effectiveness of the proposed method in extracting compelling features, comparative feature sets were manually designed in three modes. Experiments on three datasets show the effectiveness of BCAE for feature learning. The experimental results indicate that the proposed method yields final features that are better suited to clustering, while also taking into account the spatial correlation among pixels during feature learning.
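The hand-crafted inputs and the final clustering stage of the workflow described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the MNF transform and the BCAE encoder are omitted entirely, so NDVI, ExG and nDSM are clustered directly instead of deep features; ExG is assumed to take the common chromatic-coordinate form 2g - r - b; mini-batch k-means is written out by hand (scikit-learn offers `sklearn.cluster.MiniBatchKMeans` for real use); and the helper names and the synthetic two-class scene are hypothetical.

```python
import numpy as np


def ndvi(nir, red, eps=1e-8):
    """NDVI = (NIR - Red) / (NIR + Red); eps guards against division by zero."""
    return (nir - red) / (nir + red + eps)


def exg(red, green, blue, eps=1e-8):
    """Excess Green on chromatic coordinates: ExG = 2g - r - b (assumed variant)."""
    total = red + green + blue + eps
    r, g, b = red / total, green / total, blue / total
    return 2.0 * g - r - b


def stack_features(rasters):
    """Flatten (H, W) rasters into an (H*W, n_features) matrix, z-scored per feature."""
    X = np.stack([a.ravel() for a in rasters], axis=1).astype(float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing constant features by zero
    return (X - X.mean(axis=0)) / std


def _sq_dists(X, centers):
    """Squared Euclidean distance from every row of X to every center."""
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)


def kmeans_pp_init(X, k, rng):
    """k-means++ seeding: sample each new center with probability proportional to D^2."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = _sq_dists(X, np.array(centers)).min(axis=1)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers, dtype=float)


def mini_batch_kmeans(X, k, batch_size=256, n_iter=100, seed=0):
    """Mini-batch k-means with a per-center learning rate of 1 / (samples seen)."""
    rng = np.random.default_rng(seed)
    centers = kmeans_pp_init(X, k, rng)
    counts = np.zeros(k)
    for _ in range(n_iter):
        idx = rng.choice(len(X), size=min(batch_size, len(X)), replace=False)
        batch = X[idx]
        nearest = _sq_dists(batch, centers).argmin(axis=1)  # assign the whole batch first
        for x, c in zip(batch, nearest):
            counts[c] += 1
            eta = 1.0 / counts[c]
            centers[c] = (1.0 - eta) * centers[c] + eta * x  # move center toward x
    labels = _sq_dists(X, centers).argmin(axis=1)  # final assignment of every pixel
    return labels, centers


# Hypothetical synthetic scene: left half vegetation, right half elevated built-up area.
rng = np.random.default_rng(42)
H, W = 20, 20
veg_mask = np.zeros((H, W), dtype=bool)
veg_mask[:, : W // 2] = True


def synthetic_band(veg_value, other_value, noise):
    """Piecewise-constant band plus Gaussian noise (illustrative values only)."""
    base = np.where(veg_mask, veg_value, other_value)
    return base + rng.normal(0.0, noise, size=(H, W))


red = synthetic_band(0.05, 0.30, 0.01)
green = synthetic_band(0.30, 0.30, 0.01)
blue = synthetic_band(0.05, 0.30, 0.01)
nir = synthetic_band(0.60, 0.35, 0.01)
ndsm = synthetic_band(0.5, 8.0, 0.1)  # heights above ground, in metres

X = stack_features([ndvi(nir, red), exg(red, green, blue), ndsm])
labels, centers = mini_batch_kmeans(X, k=2)
label_map = labels.reshape(H, W)
print(np.unique(label_map[:, : W // 2]), np.unique(label_map[:, W // 2 :]))
```

Each feature is z-scored before clustering because nDSM is measured in metres while NDVI and ExG are dimensionless; without rescaling, the height channel would dominate the Euclidean distances used by k-means. In the paper's pipeline, the matrix passed to the clustering step would hold the BCAE-encoded features rather than these raw indices.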