To address LiDAR's low accuracy in detecting similar objects and distant small targets, we design a complementary camera-LiDAR 3D object detection network, the Multi-scale Dynamic Feature Voxel to Point network (MDVP-RCNN). MDVP-RCNN is a two-stage 3D object detection network that treats point clouds as nodes, fusing point cloud features and image information onto the points. In the first stage, the raw point cloud is downsampled to a fixed number of key points via Farthest Point Sampling (FPS), and sparse convolutions combined with deformable convolutions serve as the backbone for voxel feature extraction. A dual-channel attention mechanism is introduced in the bird's-eye view (BEV), which sequentially learns the essential characteristics of the pseudo-2D image and compensates for features lost when the point cloud is projected to 2D. In the second stage, a feature aggregation module combines image color information with point cloud information in a weighted manner. Experimental results show that the network performs well on small targets, achieving Average Precision (AP) of 61.76%, 67.66%, and 82.36% on the pedestrian, cyclist, and car classes, respectively. Code is available at https://github.com/3623687277/MDVP-RCNN
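The first-stage downsampling step, Farthest Point Sampling (FPS), can be illustrated with a minimal NumPy sketch. This is a generic FPS implementation for intuition only, not the paper's code; the function name, the greedy loop, and the toy point cloud are all illustrative assumptions.

```python
import numpy as np

def farthest_point_sampling(points, k):
    """Greedily pick k key points from an (N, 3) cloud, each new point
    being the one farthest from all points selected so far.
    Illustrative sketch; not the MDVP-RCNN implementation."""
    n = points.shape[0]
    selected = np.zeros(k, dtype=np.int64)
    # distance from every point to its nearest already-selected point
    dist = np.full(n, np.inf)
    selected[0] = 0  # start from an arbitrary seed point
    for i in range(1, k):
        d = np.linalg.norm(points - points[selected[i - 1]], axis=1)
        dist = np.minimum(dist, d)  # update nearest-selected distances
        selected[i] = np.argmax(dist)  # farthest remaining point
    return points[selected]

# toy example: reduce 1000 random points to 16 well-spread key points
cloud = np.random.rand(1000, 3)
keys = farthest_point_sampling(cloud, 16)
print(keys.shape)  # (16, 3)
```

FPS is preferred over random sampling here because it preserves spatial coverage of the scene, which matters for sparse distant objects.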