The effectiveness of the SAR object detection technique based on Convolutional Neural Networks (CNNs) has been widely proven, and it is increasingly used in the recognition of ship targets. Recently, efforts have been made to integrate transformer structures into SAR detectors to achieve improved target localization. However, existing methods rarely design the transformer itself as a detector, failing to fully leverage the long-range modeling advantages of self-attention. Furthermore, there has been limited research into multi-class SAR target detection. To address these limitations, this study proposes a SAR detector named CCDN-DETR, which builds upon the framework of the detection transformer (DETR). To adapt to the multiscale characteristics of SAR data, cross-scale encoders were introduced to facilitate comprehensive information modeling and fusion across different scales. Simultaneously, we optimized the query selection scheme for the input decoder layers, employing IOU loss to assist in initializing object queries more effectively. Additionally, we introduced constrained contrastive denoising training at the decoder layers to enhance the model’s convergence speed and improve the detection of different categories of SAR targets. In the benchmark evaluation on a joint dataset composed of SSDD, HRSID, and SAR-AIRcraft datasets, CCDN-DETR achieves a mean Average Precision (mAP) of 91.9%. Furthermore, it demonstrates significant competitiveness with 83.7% mAP on the multi-class MSAR dataset compared to CNN-based models.