Keypoint-based object detection is currently among the fastest and most efficient detection approaches, yet its accuracy is often lower than that of anchor-based methods. Because keypoint-based methods use no prior settings, the huge search space of the keypoints yields high recall but low precision. In this paper, a wide dual-path backbone network is introduced as a feature extractor to extract richer original information; it has fewer parameters and better classification performance. An attention fusion module is then designed to fuse the two paths effectively, taking into account the respective advantages of the residual path and the densely connected path. To provide more accurate pixel-level information for keypoint prediction, an upsample dual-attention module is proposed to recover the spatial size of the feature map; it integrates multi-scale channel-wise and spatial attention. Compared with other state-of-the-art detectors, the proposed method achieves an accuracy-efficiency tradeoff with fewer parameters, lower FLOPs, and a smaller model size. Experimental results show that the proposed wide dual-path backbone network achieves a top-1 error of 4.98% on the CIFAR-10 classification dataset. On the PASCAL VOC object detection dataset, the model achieves 78.3% mAP at a speed of 41 FPS.

This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
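
To make the idea of combining upsampling with channel-wise and spatial attention concrete, the following is a minimal, parameter-free sketch in plain Python. It is not the module proposed in the paper: the function names (`upsample_nearest`, `channel_attention`, `spatial_attention`, `upsample_dual_attention`), the use of nearest-neighbour upsampling, the sigmoid-of-mean weighting, and the order of the two attention steps are all assumptions chosen for illustration, and the sketch omits the learned layers and the multi-scale aspect mentioned above.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def upsample_nearest(feat, factor=2):
    """Nearest-neighbour upsampling of a C x H x W nested list (assumed scheme)."""
    return [
        [[v for v in row for _ in range(factor)]
         for row in channel for _ in range(factor)]
        for channel in feat
    ]


def channel_attention(feat):
    """Scale each channel by the sigmoid of its global average (toy weighting)."""
    out = []
    for channel in feat:
        values = [v for row in channel for v in row]
        weight = sigmoid(sum(values) / len(values))
        out.append([[v * weight for v in row] for row in channel])
    return out


def spatial_attention(feat):
    """Scale each pixel by the sigmoid of its mean across channels (toy weighting)."""
    n_ch = len(feat)
    height, width = len(feat[0]), len(feat[0][0])
    weights = [
        [sigmoid(sum(feat[c][y][x] for c in range(n_ch)) / n_ch)
         for x in range(width)]
        for y in range(height)
    ]
    return [
        [[feat[c][y][x] * weights[y][x] for x in range(width)]
         for y in range(height)]
        for c in range(n_ch)
    ]


def upsample_dual_attention(feat, factor=2):
    """Hypothetical composition: upsample, then channel, then spatial attention."""
    up = upsample_nearest(feat, factor)
    return spatial_attention(channel_attention(up))


# A 2-channel, 2 x 2 feature map; the second channel is all zeros.
feat = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]]
out = upsample_dual_attention(feat)
print(len(out), len(out[0]), len(out[0][0]))  # 2 4 4
```

In a trained network the attention weights would come from learned layers rather than fixed averages; the sketch only shows how the spatial size is recovered and how the two attention maps rescale the features.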