The goal of fine-grained vehicle recognition is to identify the exact subtype of the vehicle from a given image. It plays an important role in the intelligent traffic surveillance system. Although fine-grained vehicle recognition has attracted more and more research interest, it remains as an open problem for vehicle images taken from arbitrary viewpoints. In this study, we present a one-stage deep multi-task learning framework for fine-grained vehicle recognition in traffic surveillance, which performs the fine-grained vehicle recognition and viewpoint estimation simultaneously. In the proposed framework, the fine-grained vehicle recognition is the main task which classifies images into different types, and the viewpoint estimation task is the auxiliary task to learn helpful viewpoint-aware features for improving the main task. We evaluate our method on two common used large-scale fine-grained vehicle recognition datasets, including BoxCars116k and CompCars. The experimental results testify that the proposed multitask framework can improve the accuracy by incorporating the viewpoint information of the vehicles. In comparison to the state-of-the-art, our approach goes beyond, only requiring 3D bounding box for the training phase, which is important for future inferences using the trained model.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.