Vehicle Make and Model Recognition (VMMR) is a fundamental component of Intelligent Transportation Systems (ITS) and an enabler for many downstream tasks. Most past studies improve recognition performance by focusing on the top-level feature maps, but this practice limits the features the network can learn. Although the top-level feature maps are rich in global context information, they do not incorporate the fine-scale details embedded within the low-level feature maps. In this work, we bridge this gap by proposing a Coarse-to-Fine Context Aggregation (CFCA) module, which effectively integrates information from feature maps of various scales. In particular, the cross-scale features are generated by first refining the scale-specific components independently and then fusing them in a nonlinear manner through convolution. The resultant multi-scale feature maps are highly discriminative, as they contain both subtle local details and abstract global information. This is demonstrated by the strong classification performance of the proposed framework on five publicly available datasets, namely web-nature Comprehensive Cars (CompCars), Stanford Cars, Car-FG3K, surveillance-nature CompCars and Mohsin-VMMR. Moreover, the neurons exhibit high feature responses on the discriminative vehicle parts, which is consistent with the superior feature extraction ability of the CFCA module. The CFCA module also generalizes well to other networks, as it improves the performance of VGG16, Inceptionv3, ResNet50 and DenseNet169.
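
The refine-then-fuse idea described above can be illustrated with a minimal sketch in plain Python. It is not the CFCA module itself. The 1x1 kernels, the ReLU nonlinearity, the nearest-neighbour upsampling, the channel concatenation, the toy weights, and all names (`conv1x1`, `upsample_nearest`, `fuse_scales`, `params`) are assumptions made for illustration only. Feature maps are nested lists of shape [channels][height][width].

```python
# Illustrative sketch only: kernel sizes, ReLU, nearest-neighbour upsampling and
# channel concatenation are assumptions, not the published CFCA configuration.

def conv1x1(x, weight, bias):
    """1x1 convolution: x is [C_in][H][W], weight is [C_out][C_in], bias is [C_out]."""
    h, w = len(x[0]), len(x[0][0])
    return [
        [
            [b + sum(wk * x[k][i][j] for k, wk in enumerate(row)) for j in range(w)]
            for i in range(h)
        ]
        for row, b in zip(weight, bias)
    ]


def relu(x):
    """Element-wise ReLU, the nonlinearity assumed in this sketch."""
    return [[[max(0.0, v) for v in r] for r in ch] for ch in x]


def upsample_nearest(x, factor):
    """Repeat every pixel `factor` times along both height and width."""
    return [
        [[v for v in r for _ in range(factor)] for r in ch for _ in range(factor)]
        for ch in x
    ]


def fuse_scales(fine, coarse, params):
    """Refine each scale independently, then fuse them through a convolution."""
    # Step 1: scale-specific refinement, with separate weights per scale.
    fine_ref = relu(conv1x1(fine, *params["fine"]))
    coarse_ref = relu(conv1x1(coarse, *params["coarse"]))
    # Step 2: bring the coarse map to the spatial size of the fine map.
    factor = len(fine[0]) // len(coarse[0])
    coarse_up = upsample_nearest(coarse_ref, factor)
    # Step 3: concatenate along the channel axis and fuse nonlinearly.
    stacked = fine_ref + coarse_up
    return relu(conv1x1(stacked, *params["fuse"]))


# Toy inputs: a 1-channel 2x2 low-level map and a 1-channel 1x1 top-level map.
fine = [[[1.0, 2.0], [3.0, 4.0]]]
coarse = [[[10.0]]]
params = {
    "fine": ([[1.0]], [0.0]),         # hypothetical refinement weights
    "coarse": ([[0.5]], [0.0]),       # hypothetical refinement weights
    "fuse": ([[1.0, 1.0]], [-6.0]),   # hypothetical fusion weights
}
fused = fuse_scales(fine, coarse, params)
print(fused)  # [[[0.0, 1.0], [2.0, 3.0]]]
```

In this toy case each output pixel combines a local value from the low-level map with the single global value broadcast from the top-level map, which is the sense in which the fused map carries both kinds of information. A practical implementation would use a deep learning framework, learned weights, and more than two scales.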