The ability to detect small targets and the speed of the detector are both critical for object detection in remote sensing images. In this paper, we propose an effective and efficient method, named CISPNet, that combines high detection accuracy with a compact architecture. In particular, according to the characteristics of the data, we apply a context information scene perception (CISP) module to obtain contextual information for targets of different scales, and we use k-means clustering to set the aspect ratios and sizes of the default boxes. The proposed method inherits the network structure of the Single Shot MultiBox Detector (SSD) and introduces the CISP module into it. We create a dataset in the Pascal Visual Object Classes (VOC) format, annotated with three types of detection targets: aircraft, ship, and oiltanker. Experimental results on our remote sensing image dataset and on the Northwestern Polytechnical University very-high-resolution (NWPU VHR-10) dataset demonstrate that the proposed CISPNet performs much better than the original SSD and other detectors, especially for small objects. Specifically, our network achieves 80.34% mean average precision (mAP) at 50.7 frames per second (FPS) with an input size of 300 × 300 pixels on our remote sensing image dataset. In extended experiments, CISPNet also performs better than SSD at detecting fuzzy targets in remote sensing images.
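As an illustration of how k-means clustering can yield default-box sizes and aspect ratios, the following is a minimal sketch, not the implementation used in CISPNet. It assumes plain Euclidean k-means on normalized (width, height) pairs; the distance metric and the number of clusters are not stated in this abstract. The function names `kmeans_boxes` and `default_box_params` and the sample boxes are hypothetical.

```python
import numpy as np


def kmeans_boxes(wh, k, iters=100, seed=0):
    """Cluster (width, height) pairs with plain Euclidean k-means.

    Assumption: Euclidean distance; other metrics (e.g. IoU-based) are possible.
    """
    wh = np.asarray(wh, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize the centers with k distinct boxes drawn from the data.
    centers = wh[rng.choice(len(wh), size=k, replace=False)]
    for _ in range(iters):
        # Distance from every box to every center, shape (n, k).
        dists = np.linalg.norm(wh[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center; keep the old one if its cluster is empty.
        new_centers = np.array([
            wh[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers


def default_box_params(centers):
    """Convert cluster centers to (size, aspect ratio) per default box."""
    sizes = np.sqrt(centers[:, 0] * centers[:, 1])  # geometric mean of w and h
    ratios = centers[:, 0] / centers[:, 1]          # width / height
    return sizes, ratios


# Hypothetical ground-truth box shapes, normalized by the image size.
boxes_wh = np.array([
    [0.10, 0.10], [0.12, 0.11], [0.09, 0.10],   # small, roughly square targets
    [0.40, 0.20], [0.42, 0.21], [0.38, 0.19],   # larger, elongated targets
])
centers = kmeans_boxes(boxes_wh, k=2)
sizes, ratios = default_box_params(centers)
```

In practice, the (width, height) pairs would be read from the VOC-format annotations of the training set, and the resulting sizes and ratios would replace the hand-chosen default-box settings of SSD.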