Although Faster R-CNN has undergone many improvements, a significant performance gap remains between the detection of small and large objects, mainly because low-level network layers lack semantic information and small objects appear in only a few images. To mitigate these issues, we propose an object detection model based on a Multi-Scale Feature fusion Cross Stage Partial Network (MSF-CSPNet). MSF-CSPNet fuses concrete and abstract features across multiple scales by learning shallow features at the shallow levels and deep features at the deep levels. Data augmentation is performed with random horizontal flipping. On this basis, we build an improved Faster R-CNN model that combines Automatic Mixed Precision, a Group Batch Sampler, and MSF-CSPNet. The proposed algorithm is evaluated on Microsoft Common Objects in Context (MS COCO) 2017 and achieves leading performance, with improvements of 5.4% in AP_coco, 5.9% in AP_50, 6.9% in AP_75, 5.8% in AP_S, 6.1% in AP_M, and 5.8% in AP_L compared to Faster R-CNN with a ResNet-50 and Feature Pyramid Network (FPN) backbone. It also outperforms previously reported state-of-the-art Faster R-CNN variants that use other backbone networks, especially for small object detection. These results show that combining a backbone with stronger learning ability and FPN helps improve the feature representation of objects for detection. Faster R-CNN based on MSF-CSPNet is efficient and achieves a better balance between accuracy and speed.
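For illustration, random horizontal flipping in a detection setting must mirror the bounding boxes together with the image. The following is a minimal sketch under stated assumptions, not the implementation used in this work: the function names, the `[x_min, y_min, x_max, y_max]` box format, the list-of-rows image layout, and the flip probability `p=0.5` are all hypothetical choices made for the example.

```python
import random


def hflip_image(image):
    """Mirror an image stored as a list of pixel rows (assumed layout)."""
    return [row[::-1] for row in image]


def hflip_boxes(boxes, image_width):
    """Mirror [x_min, y_min, x_max, y_max] boxes about the vertical center line.

    The old right edge becomes the new left edge, so x_min and x_max swap roles.
    """
    return [
        [image_width - x_max, y_min, image_width - x_min, y_max]
        for x_min, y_min, x_max, y_max in boxes
    ]


def random_horizontal_flip(image, boxes, p=0.5, rng=random):
    """Flip image and boxes together with probability p (p=0.5 is an assumption)."""
    if rng.random() < p:
        width = len(image[0])
        return hflip_image(image), hflip_boxes(boxes, width)
    return image, boxes
```

Flipping the boxes with the image keeps the annotations aligned, which matters most for small objects, where an offset of a few pixels can already miss the target.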