Unmanned aerial vehicles (UAVs) have recently come into wide use in many fields owing to their low cost and high flexibility. One of the most popular UAV applications is vehicle detection in aerial images, which plays an important role in traffic surveillance and urban planning. Although many deep-learning-based detectors have achieved state-of-the-art (SOTA) performance on natural images, the significant variation in object scale caused by changes in the altitude of the UAV platform makes it difficult for these detectors to localize vehicles precisely in aerial images. To improve detection performance for vehicles of different scales, we propose a novel detection algorithm consisting of three stages. In the first stage, to reduce the distortion of vehicles during image resizing and to preserve more of the information in aerial images, we use an image cropping strategy that divides the image into two patches. In the second stage, we combine the original image and the two patches into a batch and detect vehicles with a Convolutional Neural Network (CNN). For feature representation in our detector, we propose Scale-specific Prediction to strengthen the multi-scale features of vehicles with context information. In the final stage, to fuse detections and suppress false alarms, we propose an Outlier-Aware Non-Maximum Suppression. Extensive experiments demonstrate the superiority of the proposed algorithm over other SOTA solutions.
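
The data flow of the first and final stages can be illustrated with a minimal sketch. It is not the proposed method: the abstract does not say how the two patches are cut or how Outlier-Aware Non-Maximum Suppression treats outliers. The sketch therefore assumes a split along the longer image side and substitutes standard greedy NMS as a stand-in. The function names `crop_two_patches`, `shift_boxes`, and `nms` are hypothetical, and the CNN detector itself is omitted.

```python
import numpy as np

def crop_two_patches(image):
    """Split an H x W x C image into two halves along its longer side.

    Assumption: the abstract only says "two patches"; a non-overlapping
    split is used here purely for illustration.
    Returns a list of (patch, (x_offset, y_offset)) pairs.
    """
    h, w = image.shape[:2]
    if w >= h:
        mid = w // 2
        return [(image[:, :mid], (0, 0)), (image[:, mid:], (mid, 0))]
    mid = h // 2
    return [(image[:mid], (0, 0)), (image[mid:], (0, mid))]

def shift_boxes(boxes, offset):
    """Map [x1, y1, x2, y2] boxes from patch to full-image coordinates."""
    dx, dy = offset
    return boxes + np.array([dx, dy, dx, dy], dtype=boxes.dtype)

def nms(boxes, scores, iou_thresh=0.5):
    """Standard greedy NMS (a stand-in, NOT the Outlier-Aware variant)."""
    order = np.argsort(-scores)  # indices sorted by descending score
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = []
    while order.size > 0:
        i, rest = order[0], order[1:]
        keep.append(int(i))
        # Intersection of the top-scoring box with every remaining box
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        iou = inter / (areas[i] + areas[rest] - inter)
        order = rest[iou <= iou_thresh]  # drop boxes overlapping too much
    return keep
```

In use, detections from the original image and from each patch (after `shift_boxes`) would be concatenated and passed through a single fusion step, so that duplicates of the same vehicle found in both views collapse to one box.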