Object detection has made significant progress in many real-world scenes. Despite this remarkable progress, the common use case of detection in remote sensing images remains challenging even for leading object detectors, due to the complex background, objects with arbitrary orientation, and large difference in scale of objects. In this paper, we propose a novel rotation detector for remote sensing images, mainly inspired by Mask R-CNN, namely RADet. RADet can obtain the rotation bounding box of objects with shape mask predicted by the mask branch, which is a novel, simple and effective way to get the rotation bounding box of objects. Specifically, a refine feature pyramid network is devised with an improved building block constructing top-down feature maps, to solve the problem of large difference in scales. Meanwhile, the position attention network and the channel attention network are jointly explored by modeling the spatial position dependence between global pixels and highlighting the object feature, for detecting small object surrounded by complex background. Extensive experiments on two remote sensing public datasets, DOTA and NWPUVHR -10, show our method to outperform existing leading object detectors in remote sensing field.Remote Sens. 2020, 12, 389 2 of 20 for each region of interest (RoI). Then, the features are used to achieve category-specific classification and regression for the corresponding proposals. Finally, the final detection result is obtained through post-processing, such as non-maximum suppression. Faster R-CNN is a classical two-stage object detector, which is composed of Region Proposal Networks (RPN) and a detection network consists of classifiers and regressors, and can detect objects quickly and accurately in an end-to-end manner. Based on Faster R-CNN, more improved two-stage object detectors such as Region-based Fully Convolutional Networks (R-FCN) [7] and Mask R-CNN [8] were proposed. To further improve the efficiency of the object detector, Joseph Redmon et al. proposed a single stage target detector based on regression, called YOLO [9]. For the simple structure, You Only Look Once (YOLO) is extremely fast, but its accuracy is lower than that of the two-stage detector. Based on YOLO, YOLO v3 [10] and YOLO 9000 [11] were proposed successively. To trade off the detection speed and accuracy, Single Shot MultiBox Detector (SSD) [12] was proposed, whose speed and accuracy were between YOLO series algorithm and R-CNN series algorithm.