The objective of object detection in remote sensing images is to determine the location and category of every target in an image. Anchor-based methods are the most prevalent deep-learning-based approaches, but they still have several problems that need to be addressed. First, the existing metric (i.e., intersection over union (IoU)) cannot measure the distance between two bounding boxes when they do not overlap. Second, the existing bounding box regression loss cannot directly optimize the metric during training. Third, existing methods that adopt a hierarchical deep network choose only a single-level feature layer for the feature extraction of region proposals, meaning they do not take full advantage of multi-level features. To resolve these problems, this paper proposes a novel object detection method for remote sensing images based on improved bounding box regression and multi-level feature fusion. First, a new metric named generalized IoU is applied, which can quantify the distance between two bounding boxes regardless of whether they overlap. Second, a novel bounding box regression loss is proposed, which not only optimizes the new metric (i.e., generalized IoU) directly but also overcomes the problem that the existing bounding box regression loss based on the new metric cannot adaptively change the gradient according to the metric value. Finally, a multi-level feature fusion module is proposed and incorporated into the existing hierarchical deep network, which makes full use of the multi-level features for each region proposal. Quantitative comparisons between the proposed method and the baseline method on the large-scale dataset DIOR demonstrate that incorporating the proposed bounding box regression loss, the multi-level feature fusion module, and the combination of both into the baseline method yields absolute gains of approximately 0.7%, 1.4%, and 2.2% in mAP, respectively.
Comparison with state-of-the-art methods demonstrates that the proposed method achieves state-of-the-art performance. The curves of average precision at different thresholds show that the advantage of the proposed method is more evident when the threshold of generalized IoU (or IoU) is relatively high, which means that the proposed method improves the precision of object localization. Similar conclusions are obtained on the NWPU VHR-10 dataset.
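To illustrate why generalized IoU can rank non-overlapping boxes while IoU cannot, the following is a minimal sketch of the standard generalized IoU definition, GIoU = IoU - (|C| - |A ∪ B|) / |C|, where C is the smallest axis-aligned box enclosing both A and B. The function names and the (x1, y1, x2, y2) box format are assumptions made for this example; it does not reproduce the regression loss or the fusion module proposed in the paper.

```python
def box_area(box):
    """Area of an axis-aligned box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def iou_and_giou(a, b):
    """Return (IoU, GIoU) for two axis-aligned boxes (hypothetical helper)."""
    # Intersection rectangle; width or height clamps to 0 when boxes are disjoint.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = box_area(a) + box_area(b) - inter

    # Smallest box C that encloses both a and b.
    cx1, cy1 = min(a[0], b[0]), min(a[1], b[1])
    cx2, cy2 = max(a[2], b[2]), max(a[3], b[3])
    enclose = (cx2 - cx1) * (cy2 - cy1)

    if union <= 0 or enclose <= 0:
        raise ValueError("boxes must have positive area")

    iou = inter / union
    # Penalize the part of C not covered by the union of the two boxes.
    giou = iou - (enclose - union) / enclose
    return iou, giou


if __name__ == "__main__":
    target = (0, 0, 2, 2)
    near = (3, 0, 5, 2)   # disjoint from target, 1 unit away
    far = (8, 0, 10, 2)   # disjoint from target, 6 units away
    print(iou_and_giou(target, near))  # IoU is 0, GIoU is about -0.2
    print(iou_and_giou(target, far))   # IoU is 0, GIoU is about -0.6
```

Both disjoint boxes receive an IoU of 0, so IoU gives no signal about which one is closer to the target, whereas generalized IoU decreases as the gap grows and ranges over (-1, 1]. A common way to turn the metric into a loss is 1 - GIoU; this is mentioned only for context and is not the loss proposed in the paper.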