In a world dependent on road-based transportation, understanding automobiles is essential. We propose an acoustic road vehicle characterization system, an integrated approach that uses sound captured by mobile devices to make vehicles and their condition more transparent and understandable to non-expert users. We develop and implement novel deep learning cascading architectures, which we define as conditional, multi-level networks that process raw audio to extract highly granular insights about a vehicle. To demonstrate the viability of cascading architectures, we build a multi-task convolutional neural network (CNN) that predicts vehicle attributes and cascades them to enhance misfire fault detection. We train and test these models on a synthesized dataset reflecting more than 40 hours of augmented audio. By cascading fuel type, engine configuration, cylinder count, and aspiration type attributes, our cascading CNN achieves 87.0% test set accuracy on misfire fault detection, margins of 8.0% and 1.7% over naïve and parallel CNN baselines, respectively. We also present experimental studies of acoustic features, data augmentation, and data reliability. Finally, we conclude with a discussion of broader implications, future directions, and application areas for this work.
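The cascading idea described above can be illustrated with a minimal sketch. It is not the CNN from this work: randomly initialized linear heads stand in for trained convolutional layers, and the feature dimension, the class counts per attribute, and every function name are hypothetical. The sketch shows only the data flow, in which each attribute prediction is appended to the shared features before the misfire head sees them.

```python
import math
import random

# Hypothetical class counts per attribute; the source does not list them.
ATTRIBUTE_CLASSES = {
    "fuel_type": 2,
    "engine_configuration": 3,
    "cylinder_count": 4,
    "aspiration_type": 2,
}
FEATURE_DIM = 8  # assumed size of the shared audio feature vector


def softmax(logits):
    """Convert logits to probabilities in a numerically stable way."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def make_head(in_dim, out_dim, rng):
    """Create an untrained linear head (a stand-in for learned layers)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
               for _ in range(out_dim)]
    biases = [0.0] * out_dim
    return weights, biases


def apply_head(head, x):
    """Apply a linear head followed by softmax."""
    weights, biases = head
    if len(x) != len(weights[0]):
        raise ValueError("input size does not match head input dimension")
    logits = [sum(w * v for w, v in zip(row, x)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)


def build_cascade(rng):
    """Build one head per attribute plus a misfire head on the cascaded input."""
    attr_heads = {name: make_head(FEATURE_DIM, n, rng)
                  for name, n in ATTRIBUTE_CLASSES.items()}
    cascade_dim = FEATURE_DIM + sum(ATTRIBUTE_CLASSES.values())
    misfire_head = make_head(cascade_dim, 2, rng)  # normal vs. misfire
    return attr_heads, misfire_head


def cascade_forward(features, attr_heads, misfire_head):
    """Predict attributes, then condition the misfire head on them."""
    attr_probs = {name: apply_head(head, features)
                  for name, head in attr_heads.items()}
    cascaded = list(features)
    for name in ATTRIBUTE_CLASSES:  # fixed order keeps the layout stable
        cascaded.extend(attr_probs[name])
    misfire_probs = apply_head(misfire_head, cascaded)
    return attr_probs, misfire_probs


if __name__ == "__main__":
    rng = random.Random(0)
    attr_heads, misfire_head = build_cascade(rng)
    features = [rng.uniform(-1.0, 1.0) for _ in range(FEATURE_DIM)]
    attr_probs, misfire_probs = cascade_forward(features, attr_heads, misfire_head)
    print(sorted(attr_probs), len(misfire_probs))
```

A parallel multi-task baseline, by contrast, would feed only `features` to the misfire head; the cascade differs in the concatenation step inside `cascade_forward`.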