Deep learning-based classification methods have brought considerable advances in image recognition performance. This study introduces the Multiscale Feature Convolutional Neural Network (MSFCNN) for extracting complex urban land cover data, with a specific emphasis on buildings and roads. MSFCNN is used to extract multiscale features from three image types, all collected within the Fengshan District of Kaohsiung, Taiwan: Unmanned Aerial Vehicle (UAV) images, high-resolution (HR) satellite images, and low-resolution (LR) satellite images. The model classified these two land cover categories with high accuracy, and its success lies in extracting multiscale features from images of different resolutions. For UAV images, MSFCNN achieved an accuracy of 91.67%, with a Producer’s Accuracy (PA) of 93.33% and a User’s Accuracy (UA) of 90.0%. The model performed similarly on HR images, with accuracy, PA, and UA values of 92.5%, 93.33%, and 91.67%, respectively. These results closely align with those for LR imagery, for which accuracy, PA, and UA were 93.33%, 95.0%, and 91.67%, respectively. Overall, MSFCNN classifies both UAV and satellite images well, showing versatility and robustness across data sources, and is well suited to updating cartographic data on urban buildings and roads.
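For readers less familiar with the three reported measures, the following minimal sketch shows how accuracy, PA, and UA are conventionally derived from a confusion matrix. The function name `accuracy_metrics`, the row/column convention (rows are reference classes, columns are predicted classes), and the sample counts are assumptions made for illustration; the counts are hypothetical and are not the data of this study.

```python
def accuracy_metrics(confusion, class_index):
    """Return overall accuracy, producer's accuracy, and user's accuracy.

    Assumed convention: confusion[i][j] is the number of samples whose
    reference class is i and whose predicted class is j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Correctly classified samples lie on the diagonal.
    correct = sum(confusion[k][k] for k in range(n_classes))

    true_positives = confusion[class_index][class_index]
    # Row sum: all reference samples of the class (basis of PA).
    reference_total = sum(confusion[class_index])
    # Column sum: all samples predicted as the class (basis of UA).
    predicted_total = sum(row[class_index] for row in confusion)

    return {
        "overall_accuracy": correct / total,
        "producers_accuracy": true_positives / reference_total,
        "users_accuracy": true_positives / predicted_total,
    }


# Hypothetical two-class matrix (index 0 = building, index 1 = road).
example_matrix = [
    [45, 5],   # reference buildings: 45 predicted building, 5 predicted road
    [3, 47],   # reference roads: 3 predicted building, 47 predicted road
]

building_metrics = accuracy_metrics(example_matrix, class_index=0)
for name, value in building_metrics.items():
    print(f"{name}: {value:.2%}")
```

In this illustration PA falls when reference buildings are missed (omission error), while UA falls when roads are wrongly labelled as buildings (commission error), which is why the two measures can differ for the same class.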