Object detection in remote sensing images is widely used in civil and military applications. The task is to detect objects such as ships, planes, airports, and harbours and to obtain the category and position of each one; it therefore comprises both localization and classification. Observing densely arranged, directional targets, such as cars in parking lots and ships in harbours, is of particular significance. Because of the long shooting distance and wide coverage, remote sensing images contain a large number of small objects and dense scenes. Small objects occupy few pixels in the image and are easily missed. In dense scenes, neighbouring objects overlap heavily, so the same object is easily detected more than once. Traditional small object detection methods deliver low accuracy and are slow. Object detection in remote sensing images is therefore very challenging. We put forward a novel deep learning-based object detection model built on the single shot multibox detector (SSD). First, we propose an improved inception network that optimizes SSD by strengthening the feature extraction ability (FEA) for small objects in the shallow network. Second, the feature pyramid network is modified to enhance feature fusion. Third, a deep feature fusion module is designed to improve the FEA of the deep network. Finally, the extracted image features are matched with candidate boxes of different aspect ratios to detect and locate objects at different scales. Experiments on the DOTA dataset show that the proposed algorithm adapts to remote sensing object detection against different backgrounds and effectively improves detection performance for remote sensing objects in complex scenes.
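
The final step, matching features with candidate boxes of different aspect ratios, can be illustrated with the default-box scheme of the original SSD formulation. The following is a minimal sketch under stated assumptions, not the model proposed here: the function names `layer_scales` and `default_boxes`, the scale range 0.2 to 0.9, and the aspect ratios in the usage note are illustrative choices and are not taken from this work.

```python
import math


def layer_scales(num_layers, s_min=0.2, s_max=0.9):
    """Linearly spaced box scales, one per prediction layer.

    The range [s_min, s_max] is an assumed default, not a value from this paper.
    """
    if num_layers == 1:
        return [s_min]
    step = (s_max - s_min) / (num_layers - 1)
    return [s_min + step * k for k in range(num_layers)]


def default_boxes(fmap_size, scale, aspect_ratios):
    """Candidate boxes (cx, cy, w, h), normalized to [0, 1], for a square feature map.

    Each cell gets one box per aspect ratio. Width and height are chosen so that
    every box at a given scale has the same area (scale ** 2).
    """
    boxes = []
    for row in range(fmap_size):
        for col in range(fmap_size):
            # Box centre sits at the centre of the feature-map cell.
            cx = (col + 0.5) / fmap_size
            cy = (row + 0.5) / fmap_size
            for ar in aspect_ratios:
                w = scale * math.sqrt(ar)
                h = scale / math.sqrt(ar)
                boxes.append((cx, cy, w, h))
    return boxes
```

For example, `default_boxes(38, layer_scales(6)[0], [1.0, 2.0, 0.5])` produces three boxes per cell on a hypothetical 38 x 38 shallow feature map; the small scale assigned to the shallow layer is what lets it cover small objects, while deeper, coarser maps receive larger scales.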