This paper addresses the target localization problem in wireless visual sensor networks. Each sensor node is equipped with a low-resolution camera and extracts multiple feature points to represent the target at the node level. We present a statistical method that merges the position information from different sensor nodes at the base station and selects the most correlated feature point pair. This method reduces the dependence of localization accuracy in the universal coordinate system on the accuracy of target extraction. Simulations show that, compared with other related approaches, the proposed method achieves higher target localization accuracy and a better trade-off between camera node usage and localization accuracy.