Recent advances in large-scale image and video analysis have empowered the potential capabilities of visual surveillance systems. In particular, deep learning-based approaches bring in substantial benefits in solving certain computer vision problems such as fine-grained object recognition. Here, the authors mainly concentrate on classification and identification of maritime vessels and land vehicles, which are the key constituents of visual surveillance systems. Employing publicly available data sets for maritime vessels and land vehicles, the authors aim to improve visual recognition. Specifically, the authors focus on five tasks regarding visual recognition; coarse-grained classification, fine-grained classification, coarse-grained retrieval, fine-grained retrieval, and verification. To increase the performance in these tasks, the authors utilise a multi-task learning framework and present a novel loss function which simultaneously considers deep feature learning and classification by exploiting the available hierarchical labels of individual samples and the global statistics of distances between the data pairs. The authors observe that the proposed multi-task learning model improves the fine-grained recognition performance on MARVEL and Stanford Cars data sets, compared to training of a model targeting a single recognition task.