Multi-label scene classification on remote sensing imagery (RSI) involves assigning each image to multiple categories or labels, because a single image can belong to more than one class or scene. This is a common task in remote sensing (RS) and computer vision, particularly for applications such as urban planning, land cover classification, and environmental monitoring. Deep learning (DL) models can extract high-level features from the imagery, enabling efficient and accurate scene classification, which is indispensable for applications including environmental analysis, land use monitoring, and disaster management. This study introduces a new technique, Multi-Label Scene Classification on Remote Sensing Imagery using Modified Dingo Optimizer with Deep Learning (MSCRSI-MDODL), which aims to identify and classify multiple target classes in RSI. In the MSCRSI-MDODL technique, a DenseNet model augmented with Squeeze-and-Excitation (SE) attention, termed the improved DenseNet model, is applied for feature extraction. In addition, the modified dingo optimizer (MDO) algorithm is employed for optimal hyperparameter tuning of the improved DenseNet model. For the scene classification process, the MSCRSI-MDODL technique uses a stacked dilated convolutional autoencoder (SDCAE) model. The MSCRSI-MDODL model is evaluated through simulations on benchmark RSI datasets. The comprehensive result analysis shows that the MSCRSI-MDODL technique achieves higher performance than other existing RSI classification techniques.
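The SE attention added to DenseNet recalibrates feature channels in three steps. First, squeeze: global average pooling produces one value per channel. Second, excitation: a two-layer bottleneck ending in a sigmoid turns those values into channel gates. Third, scale: each channel is multiplied by its gate. The plain-Python sketch below illustrates these steps on a tiny feature map. The function name `se_block`, the weights `w1` and `w2`, the reduction ratio r = 2, and the omission of biases are illustrative assumptions, not the trained parameters or code of MSCRSI-MDODL.

```python
import math
from typing import List

Matrix = List[List[float]]


def se_block(fmap: List[Matrix], w1: Matrix, w2: Matrix) -> List[Matrix]:
    """Squeeze-and-Excitation recalibration of a C x H x W feature map.

    Hypothetical sketch: w1 has shape (C/r, C), w2 has shape (C, C/r); biases omitted.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]
    # Excitation, layer 1: bottleneck projection followed by ReLU.
    hidden = [max(0.0, sum(w * zc for w, zc in zip(row, z))) for row in w1]
    # Excitation, layer 2: expand back to C gates, squashed to (0, 1) by a sigmoid.
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden)))) for row in w2]
    # Scale: reweight every spatial position of channel c by its gate s_c.
    return [[[v * s for v in row] for row in ch] for ch, s in zip(fmap, gates)]


# Toy example: C = 2 channels of size 2 x 2, reduction ratio r = 2 (C/r = 1).
fmap = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
w1 = [[0.5, 0.5]]      # maps C = 2 descriptors to C/r = 1 hidden unit
w2 = [[2.0], [-2.0]]   # maps 1 hidden unit back to C = 2 gates
out = se_block(fmap, w1, w2)
print(round(out[0][0][0], 3), round(out[1][0][0], 3))  # 0.953 0.095
```

The channel averages here are 1.0 and 2.0, the hidden unit is 1.5, and the gates are sigmoid(3) ≈ 0.953 and sigmoid(-3) ≈ 0.047. The block therefore amplifies the first channel relative to the second. In a trained network these weights are learned, which lets the model emphasize informative channels.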
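The SDCAE classifier is built from dilated convolutions. These space the kernel taps d samples apart, so a kernel of size k covers a receptive field of (k - 1)·d + 1 inputs without adding weights. A minimal 1-D sketch in plain Python follows. It assumes "valid" padding, no bias, and the cross-correlation convention used by DL frameworks. The function name `dilated_conv1d` is hypothetical, and the SDCAE itself operates on 2-D feature maps with learned kernels.

```python
from typing import List


def dilated_conv1d(x: List[float], kernel: List[float], dilation: int) -> List[float]:
    """'Valid' 1-D dilated convolution (cross-correlation), no bias."""
    k = len(kernel)
    span = (k - 1) * dilation + 1  # receptive field of one output sample
    if dilation < 1 or span > len(x):
        raise ValueError("dilation must be >= 1 and the dilated kernel must fit in x")
    # Output i combines inputs x[i], x[i + d], ..., x[i + (k - 1) * d].
    return [
        sum(kernel[j] * x[i + j * dilation] for j in range(k))
        for i in range(len(x) - span + 1)
    ]


signal = [float(v) for v in range(8)]  # 0.0, 1.0, ..., 7.0
print(dilated_conv1d(signal, [1.0, 1.0, 1.0], dilation=1))  # [3.0, 6.0, 9.0, 12.0, 15.0, 18.0]
print(dilated_conv1d(signal, [1.0, 1.0, 1.0], dilation=2))  # [6.0, 9.0, 12.0, 15.0]
```

With the same three weights, raising the dilation from 1 to 2 widens each output's receptive field from 3 to 5 inputs. Stacking such layers lets an autoencoder capture wide spatial context in scenes cheaply.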
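In multi-label classification, each class is scored independently, commonly with a per-class sigmoid rather than a softmax. Every class whose probability passes a threshold is kept, so one scene can receive several labels at once. The abstract does not state how MSCRSI-MDODL makes this final decision. The sketch below assumes a sigmoid output per class and a fixed 0.5 threshold. The function name `predict_labels`, the class names, and the scores are hypothetical.

```python
import math
from typing import Dict, List


def predict_labels(logits: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return every class whose independent sigmoid probability is >= threshold."""
    probs = {name: 1.0 / (1.0 + math.exp(-z)) for name, z in logits.items()}
    # Classes are not mutually exclusive, so zero, one, or several may be kept.
    return [name for name, p in probs.items() if p >= threshold]


# Hypothetical raw scores (logits) for one aerial scene.
scene_logits = {"buildings": 2.1, "cars": 0.4, "water": -1.7, "trees": 0.9}
print(predict_labels(scene_logits))  # ['buildings', 'cars', 'trees']
```

A single global threshold is the simplest choice. Per-class thresholds tuned on validation data are a common alternative when some labels are rare.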