Feature selection, which reduces redundancy for efficient classification, is necessary but usually time consuming and challenging. This paper presents a comprehensive analysis to identify the optimum feature set and the most efficient classifier for accurate urban area mapping. To this end, 136 multiscale textural features alongside a panchromatic band were initially extracted from WorldView-2, GeoEye-3, and QuickBird satellite images. Wrapper-based and filter-based feature selection methods were implemented to select the best ten percent of the primary features from the initial feature set. Then, machine learning algorithms, namely artificial neural network (ANN), support vector machine (SVM), and random forest (RF) classifiers, were used to evaluate the efficiency of the selected features and to identify the most efficient classifier. The resulting optimum feature set was validated using two other images from WorldView-3 and Pleiades. The experiments revealed that RF was the most efficient classifier, while particle swarm optimization (PSO) and neighborhood component analysis (NCA) were the most efficient wrapper-based and filter-based methods, respectively. Whereas the processing time of ANN and SVM depended on the number of input features, that of RF was largely insensitive to it. Among the textural features used in this study, dissimilarity, contrast, and correlation contributed most to classification performance. These trials showed that the number of features could be reduced from 137 to 14; these optimally selected features, together with the RF classifier, can produce an F1-measure of about 0.90 for different images from five very high resolution satellite sensors covering various urban geographical landscapes. These results achieve our goal of assisting users by eliminating the task of selecting optimal features and a classifier, thereby increasing the efficiency of urban land use/cover classification from very high resolution images. 
This optimal feature selection can also significantly reduce the high computational load of the feature-engineering phase in machine learning and deep learning approaches.
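
To make the "best ten percent" reduction concrete, the following is a minimal sketch of a filter-style selection step, not the procedure used in this study. It assumes a two-class problem and ranks features with a simple Fisher score as a stand-in for NCA; the synthetic data, the `informative` indices, and the helper names `fisher_score` and `select_top_fraction` are all hypothetical. It only illustrates how keeping ten percent of 137 features, rounded up, yields 14.

```python
import math
import random
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Two-class Fisher score: squared gap between class means
    divided by the sum of the per-class variances."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    spread = pvariance(pos) + pvariance(neg)
    return (mean(pos) - mean(neg)) ** 2 / spread if spread > 0 else 0.0


def select_top_fraction(columns, labels, fraction=0.10):
    """Rank features by score and keep the best `fraction`, rounded up."""
    keep = math.ceil(fraction * len(columns))
    ranked = sorted(range(len(columns)),
                    key=lambda j: fisher_score(columns[j], labels),
                    reverse=True)
    return sorted(ranked[:keep])


# Synthetic stand-in for the 137-feature set (136 textural + 1 panchromatic).
rng = random.Random(0)
n_samples, n_features = 200, 137
labels = [i % 2 for i in range(n_samples)]
informative = {3, 40, 99}  # hypothetical features that separate the classes
columns = [
    [rng.gauss(2.0 * y if j in informative else 0.0, 1.0) for y in labels]
    for j in range(n_features)
]

selected = select_top_fraction(columns, labels)
print(len(selected), "of", n_features, "features kept")  # 14 of 137 features kept
```

A wrapper-based method such as PSO would instead score candidate subsets by training and evaluating a classifier on each, which is costlier but accounts for interactions between features.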