Matching the projection of a 3D model to 2D image data is a key technique in model-based localization, recognition and tracking. First, we propose a fitness function that evaluates the degree of matching using image gradient information in the neighborhood of the model projection. Weighting adjustment and normalization over the visible model projection are incorporated, which improve the correctness and robustness of the fitness function. The fitness function is applied to vehicle localization, where the 3D pose is reduced to location and orientation. Then, we present a direct search optimization method with a 3×3 search kernel for location estimation. "Disturbed particles" are used to avoid becoming trapped in local optima, and a coarse-to-fine optimization strategy is adopted to greatly reduce the computational cost. Finally, we propose a 3D pose estimator that finds the location and orientation by optimizing the fitness function within an orientation range. Experiments on real traffic surveillance videos show that the proposed optimization algorithm is effective, and that both the fitness function and the 3D pose estimator are correct and robust against clutter and occlusion.
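The location search described above can be illustrated with a minimal sketch. The abstract does not define the search rules precisely, so everything here is an assumption. `toy_fitness` is a hypothetical stand-in for the gradient-based fitness function. The step schedule `(8.0, 4.0, 2.0, 1.0, 0.5)` is invented. The "disturbed particles" step is interpreted here as restarting the coarse-to-fine search from a few randomly perturbed locations around the current best one. The function names `kernel_search`, `coarse_to_fine` and `search_with_disturbance` are illustrative and are not the authors' API.

```python
import math
import random
from typing import Callable, Tuple

Point = Tuple[float, float]
Fitness = Callable[[float, float], float]


def kernel_search(fitness: Fitness, start: Point, step: float,
                  max_iter: int = 1000) -> Tuple[Point, float]:
    """Greedy ascent over a 3x3 neighbourhood at a fixed step size."""
    x, y = start
    best = fitness(x, y)
    for _ in range(max_iter):
        # Score the 8 neighbours surrounding the centre of the 3x3 kernel.
        scored = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                if (dx, dy) != (0, 0):
                    nx, ny = x + dx * step, y + dy * step
                    scored.append((fitness(nx, ny), (nx, ny)))
        value, (nx, ny) = max(scored)
        if value <= best:
            break  # the centre is a local maximum at this step size
        x, y, best = nx, ny, value
    return (x, y), best


def coarse_to_fine(fitness: Fitness, start: Point,
                   steps: Tuple[float, ...] = (8.0, 4.0, 2.0, 1.0, 0.5)
                   ) -> Tuple[Point, float]:
    """Run the 3x3 kernel search with progressively smaller steps (assumed schedule)."""
    point, value = start, fitness(*start)
    for step in steps:
        point, value = kernel_search(fitness, point, step)
    return point, value


def search_with_disturbance(fitness: Fitness, start: Point, n_particles: int = 10,
                            radius: float = 5.0, seed: int = 0) -> Tuple[Point, float]:
    """Hypothetical 'disturbed particles': re-search from random perturbations of the optimum."""
    rng = random.Random(seed)
    best_point, best_value = coarse_to_fine(fitness, start)
    for _ in range(n_particles):
        particle = (best_point[0] + rng.uniform(-radius, radius),
                    best_point[1] + rng.uniform(-radius, radius))
        point, value = coarse_to_fine(fitness, particle)
        if value > best_value:  # keep a particle only if it escapes to a better optimum
            best_point, best_value = point, value
    return best_point, best_value


def toy_fitness(x: float, y: float) -> float:
    """Hypothetical stand-in: a global peak at (10, 4) and a weaker local peak at (-2, -1)."""
    global_peak = math.exp(-((x - 10) ** 2 + (y - 4) ** 2) / 20)
    local_peak = 0.6 * math.exp(-((x + 2) ** 2 + (y + 1) ** 2) / 4)
    return global_peak + local_peak


if __name__ == "__main__":
    print(coarse_to_fine(toy_fitness, (0.0, 0.0)))
    print(search_with_disturbance(toy_fitness, (0.0, 0.0)))
```

The coarse steps cross the search space with few fitness evaluations, and the finer steps only refine near the current optimum. This is the intuition behind the cost reduction claimed above. By construction, the perturbed restarts can only keep or improve the best fitness value found.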