Abstract. It has been shown that the activations invoked by an image within the top layers of a large convolutional neural network provide a high-level descriptor of the visual content of the image. In this paper, we investigate the use of such descriptors (neural codes) for the image retrieval application. In experiments with several standard retrieval benchmarks, we establish that neural codes perform competitively even when the convolutional neural network has been trained for an unrelated classification task (e.g. Image-Net). We also evaluate the improvement in the retrieval performance of neural codes when the network is retrained on a dataset of images that are similar to images encountered at test time. We further evaluate the performance of compressed neural codes and show that a simple PCA compression provides very good short codes that give state-of-the-art accuracy on a number of datasets. In general, neural codes turn out to be much more resilient to such compression than other state-of-the-art descriptors. Finally, we show that discriminative dimensionality reduction trained on a dataset of pairs of matched photographs improves the performance of PCA-compressed neural codes even further. Overall, our quantitative experiments demonstrate the promise of neural codes as visual descriptors for image retrieval.
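The pipeline described above (top-layer activations as descriptors, PCA compression to short codes, similarity-based retrieval) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the 4096-D dimensionality, the 128-D code length, the random placeholder features, and the cosine-similarity ranking are all assumptions chosen for concreteness.

```python
# Hedged sketch: PCA-compressing CNN activations ("neural codes") for retrieval.
# The feature matrix here is random placeholder data standing in for activations
# extracted from a network's top layer; shapes are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Suppose we extracted 4096-D top-layer activations for a gallery of 1000 images.
codes = rng.standard_normal((1000, 4096)).astype(np.float32)

# L2-normalize descriptors, a common step before matching.
codes /= np.linalg.norm(codes, axis=1, keepdims=True)

# Fit PCA via SVD on centered data; keep the top 128 directions as "short codes".
mean = codes.mean(axis=0)
_, _, vt = np.linalg.svd(codes - mean, full_matrices=False)
components = vt[:128]

def compress(x):
    """Project descriptors onto the leading PCA directions and re-normalize."""
    z = (x - mean) @ components.T
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

short_codes = compress(codes)

# Retrieval: rank the gallery by cosine similarity (dot product of unit vectors).
query = compress(codes[:1])
ranking = np.argsort(-(short_codes @ query.T).ravel())
```

Because the query is the first gallery image itself, it ranks first; with real features, the learned discriminative projection the abstract mentions would replace the unsupervised PCA directions.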
Estimation of facial shape plays a central role in face transfer and animation. Accurate 3D face reconstruction, however, often relies on iterative and costly methods that prevent real-time applications. In this work we design a compact and fast CNN model that enables real-time face reconstruction on mobile devices. For this purpose, we first study more traditional but slow morphable face models and use them to automatically annotate a large set of images for CNN training. We then investigate a class of efficient MobileNet CNNs and adapt such models to the task of shape regression. Our evaluation on three datasets demonstrates significant improvements in the speed and the size of our model while maintaining state-of-the-art reconstruction accuracy.
ABSTRACT: We present a system for large-scale automatic traffic sign recognition and mapping and experimentally justify the design choices made for the different components of the system. Our system handles more than 140 classes of traffic signs and does not require labor-intensive labelling of a large amount of training data, since it is trained on synthetically generated images. We evaluated our system on a large dataset of Russian traffic signs and made this dataset publicly available to encourage future comparison.