In recent years, cross-modal remote sensing image retrieval (CMRSIR) has attracted considerable attention in remote sensing (RS) information processing. Notably, land cover scenes, whether unimodal or cross-modal, remain the primary subject of remote sensing image retrieval (RSIR); studies on vessel images captured by RS satellites are scarce, and cross-modal retrieval of such images is almost unexplored. Compared with land cover images, vessel images have smaller scale, lower resolution, and less detailed information, which makes accurate retrieval difficult. In this paper, a hashing method called deep adversarial cascaded hashing (DACH) is proposed to address these problems. To accurately extract the subtle yet discriminative features contained in RS vessel images, we build a deep cascaded network that fuses multilevel features boosted in both depth and width, and a self-attention mechanism further enhances the fused features. Combined with hash learning, we design a weighted quintuplet loss that supervises the transfer of discrimination and similarity across different metric spaces while reducing the cross-modal discrepancy. In addition, we apply a deep adversarial constraint to both feature learning and hash learning to bridge the modality gap and make cross-modal retrieval as precise as unimodal retrieval. Comprehensive experiments on two public bimodal vessel image datasets, with comparisons against several strong cross-modal retrieval methods, show that DACH is effective and competitive on cross-modal vessel image retrieval tasks, outperforming state-of-the-art methods.
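
To make the weighted quintuplet idea concrete, the following is a minimal, hedged sketch in PyTorch. The abstract does not specify how the quintuplet is constructed or how the terms are weighted, so this sketch assumes a quintuplet of (anchor, intra-modal positive, cross-modal positive, intra-modal negative, cross-modal negative) and hypothetical weights `w_intra` and `w_cross` over two margin-based ranking terms; it is an illustration of the general loss family, not the paper's exact formulation.

```python
# Hedged sketch only: the quintuplet construction, margin, and weights below are
# assumptions for illustration, not the formulation defined in the paper.
import torch
import torch.nn.functional as F

def weighted_quintuplet_loss(anchor, pos_intra, pos_cross, neg_intra, neg_cross,
                             margin=0.5, w_intra=1.0, w_cross=1.0):
    """Weighted sum of an intra-modal and a cross-modal triplet-style ranking term,
    computed on continuous (pre-binarization) hash-like embeddings."""
    dist = lambda a, b: F.pairwise_distance(a, b)  # Euclidean distance in the embedding space
    # Pull same-modality positives closer than same-modality negatives by a margin.
    intra_term = F.relu(dist(anchor, pos_intra) - dist(anchor, neg_intra) + margin)
    # Pull cross-modality positives closer than cross-modality negatives by a margin,
    # which is where the cross-modal discrepancy is reduced.
    cross_term = F.relu(dist(anchor, pos_cross) - dist(anchor, neg_cross) + margin)
    return (w_intra * intra_term + w_cross * cross_term).mean()

# Toy usage with batches of 64-bit continuous codes (tanh keeps them in [-1, 1]).
codes = [torch.tanh(torch.randn(8, 64)) for _ in range(5)]
loss = weighted_quintuplet_loss(*codes)
```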