Remote sensing image captioning is a challenging task owing to limited global information, single-feature extraction, and the lack of detailed image captions. To address these issues, this research proposes a deep attention-based DenseNet with a visual-switch bidirectional long short-term memory (DADN-BiLSTM) network for captioning. Initially, images and captions are collected from captioning datasets and preprocessed to smooth away small structures. A double attention mechanism is then applied to the DenseNet to capture weak features and to improve the correspondence between image features and captioning information. In addition, clustering-based segmentation divides each image into smaller parts, making them easier to access. Moreover, a BiLSTM decoder is used to better exploit captioning context information. The proposed system is implemented in Python, and its performance is evaluated against existing methods using relevant metrics such as recall-oriented understudy for gisting evaluation (ROUGE), accuracy, and bilingual evaluation understudy (BLEU). The experimental results achieve higher scores in all evaluation indicators: BLEU-1 of 0.8925, BLEU-2 of 0.8514, BLEU-3 of 0.8252, BLEU-4 of 0.8312, and a ROUGE score of 0.8611 on UCM-Captions; BLEU-1 of 0.8532, BLEU-2 of 0.7912, BLEU-3 of 0.8351, BLEU-4 of 0.7215, and a ROUGE score of 0.8139 on Sydney-Captions; and BLEU-1 of 0.8125, BLEU-2 of 0.7501, BLEU-3 of 0.6812, BLEU-4 of 0.7254, and a ROUGE score of 0.8245 on RSICD.
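To make the encoder-decoder pipeline described above concrete, the following minimal sketch pairs a DenseNet feature extractor with an attention-weighted BiLSTM decoder in PyTorch. All layer sizes, module names, and the simple soft-attention form are assumptions made only for illustration; the paper's double attention mechanism, visual switch, and clustering-based segmentation are not reproduced here.

```python
# Illustrative sketch, not the authors' implementation: a generic DenseNet encoder
# plus a BiLSTM decoder with soft attention over image regions. Dimensions and
# module names are assumptions for this example.
import torch
import torch.nn as nn
import torchvision.models as models


class DenseNetEncoder(nn.Module):
    """Extracts spatial region features from a DenseNet backbone."""
    def __init__(self):
        super().__init__()
        densenet = models.densenet121(weights=None)  # load pretrained weights in practice
        self.features = densenet.features            # output: (B, 1024, h, w)

    def forward(self, images):
        fmap = self.features(images)                           # (B, 1024, h, w)
        B, C, h, w = fmap.shape
        return fmap.view(B, C, h * w).permute(0, 2, 1)         # (B, h*w, 1024) region features


class AttentionBiLSTMDecoder(nn.Module):
    """BiLSTM over word embeddings, with soft attention over image regions
    applied before word prediction (a stand-in for the paper's double
    attention and visual switch, which are not detailed in the abstract)."""
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512, feat_dim=1024):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden_dim + feat_dim, 1)
        self.fc = nn.Linear(2 * hidden_dim + feat_dim, vocab_size)

    def forward(self, regions, captions):
        h, _ = self.bilstm(self.embed(captions))               # (B, T, 2*hidden_dim)
        B, T, _ = h.shape
        R = regions.size(1)
        # Score every (time step, region) pair, then softmax over regions.
        h_exp = h.unsqueeze(2).expand(-1, -1, R, -1)           # (B, T, R, 2*hidden_dim)
        r_exp = regions.unsqueeze(1).expand(-1, T, -1, -1)     # (B, T, R, feat_dim)
        scores = self.attn(torch.cat([h_exp, r_exp], dim=-1)).squeeze(-1)   # (B, T, R)
        context = (scores.softmax(dim=-1).unsqueeze(-1) * r_exp).sum(dim=2) # (B, T, feat_dim)
        return self.fc(torch.cat([h, context], dim=-1))        # (B, T, vocab_size)
```

In a sketch like this, the encoder's region features would be computed once per image and the decoder trained with teacher forcing on ground-truth captions, after which BLEU-1 through BLEU-4 and ROUGE scores can be computed on the generated sentences.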