The rapid growth of multimedia-on-demand traffic in the form of audio, video, and images has shifted the vision of the Internet of Things (IoT) from scalar data to the Internet of Multimedia Things (IoMT). Because Unmanned Aerial Vehicles (UAVs) generate massive quantities of multimedia data, they have become part of the IoMT and are commonly employed in diverse application areas, especially for capturing remote sensing (RS) images. At the same time, interpreting the captured RS images is a crucial issue, which can be addressed by multi-label classification and Computational Linguistics based image captioning techniques. To this end, this paper presents an efficient low-complexity encoding technique with multi-label classification and image captioning for UAV-based RS images. The presented model first applies a low-complexity encoder, called LCE-BWT, which combines the Neighborhood Correlation Sequence (NCS) with the Burrows-Wheeler transform (BWT) to encode the RS images captured by the UAV. The application of NCS greatly reduces the computational complexity and requires fewer resources for image transmission. Second, a deep learning (DL) based shallow convolutional neural network for RS image classification (SCNN-RSIC) is presented to determine the multiple class labels of an RS image, which constitutes the novelty of the work. Finally, the Computational Linguistics based Bidirectional Encoder Representations from Transformers (BERT) technique is applied for image captioning to provide a proficient textual description of the RS image. The performance of the presented technique is tested on the UCM dataset. The simulation outcomes indicate that the presented model achieves effective compression performance, reconstructed image quality, classification results, and image captioning outcomes.
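For readers unfamiliar with the Burrows-Wheeler transform named above, the following is a minimal, illustrative sketch of the textbook BWT and its inverse on a byte sequence. It is not the LCE-BWT encoder of this paper: the NCS stage is not shown, the function names `bwt_encode` and `bwt_decode` are hypothetical, and the naive rotation sort is chosen for clarity rather than efficiency.

```python
from typing import Tuple


def bwt_encode(data: bytes) -> Tuple[bytes, int]:
    """Textbook BWT: sort all rotations and keep the last column.

    Returns the last column and the row index of the original sequence.
    Assumption: a naive O(n^2 log n) rotation sort, for illustration only.
    """
    n = len(data)
    if n == 0:
        return b"", 0
    # Sort rotation start positions by the rotation each one denotes.
    order = sorted(range(n), key=lambda i: data[i:] + data[:i])
    # The last symbol of the rotation starting at i is data[i - 1].
    last = bytes(data[(i - 1) % n] for i in order)
    return last, order.index(0)


def bwt_decode(last: bytes, primary: int) -> bytes:
    """Invert the BWT from the last column and the primary row index."""
    n = len(last)
    if n == 0:
        return b""
    # A stable sort of positions by symbol links each sorted row to the
    # row holding the rotation shifted left by one symbol.
    nxt = sorted(range(n), key=lambda i: last[i])
    out = bytearray()
    row = primary
    for _ in range(n):
        row = nxt[row]
        out.append(last[row])
    return bytes(out)


if __name__ == "__main__":
    encoded, primary = bwt_encode(b"banana")
    print(encoded, primary)
    print(bwt_decode(encoded, primary))
```

The transform tends to group identical symbols into runs, which is why it is commonly placed ahead of a run-length or entropy coder in a compression pipeline.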