Cross-depiction is the recognition, and synthesis, of objects whether they are photographed, painted, drawn, etc. It is a significant yet under-researched problem. Emulating the remarkable human ability to recognise and depict objects in an astonishingly wide variety of depictive forms is likely to advance both the foundations and the applications of computer vision. In this paper we motivate the cross-depiction problem, explain why it is difficult, and discuss some current approaches. Our main conclusions are (i) appearance-based recognition systems tend to be over-fitted to one depiction, (ii) models that explicitly encode spatial relations between parts are more robust, and (iii) recognition and non-photorealistic synthesis are related tasks.
We present a Bayesian approach to tactile object recognition that improves on the state of the art in using single-touch events in two ways: first, by improving recognition accuracy from about 90% to about 95% while using about half the number of touches; second, by reducing the number of touches needed for training from about 200 to about 60. In addition, we use a new tactile sensor that is less than one tenth of the cost of widely available sensors. The paper describes the sensor, the likelihood function used with the Naive Bayes classifier, and experiments on a set of ten real objects. We also provide preliminary results to test our approach for its ability to generalise to previously unencountered objects.
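As a rough illustration of how a Naive Bayes classifier can accumulate evidence over successive single-touch events, the sketch below updates a posterior over object labels one touch at a time. The abstract does not specify the likelihood function, so the Gaussian likelihood over a single scalar touch feature, the function names, and the per-object parameters are all assumptions made for illustration, not the paper's method.

```python
import math


def gaussian_log_likelihood(x, mean, std):
    """Log of a 1-D Gaussian density (assumed likelihood, not the paper's)."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def posterior_after_touches(touches, class_params, prior=None):
    """Naive Bayes posterior over object labels given independent touches.

    touches      -- list of scalar touch features (hypothetical feature)
    class_params -- dict: label -> (mean, std) of the feature for that object
    prior        -- dict: label -> prior probability (uniform if omitted)
    """
    labels = list(class_params)
    if prior is None:
        prior = {c: 1.0 / len(labels) for c in labels}

    # Work in log space so many touches do not underflow.
    log_post = {c: math.log(prior[c]) for c in labels}
    for x in touches:
        for c in labels:
            mean, std = class_params[c]
            log_post[c] += gaussian_log_likelihood(x, mean, std)

    # Normalise with the log-sum-exp trick.
    peak = max(log_post.values())
    unnorm = {c: math.exp(lp - peak) for c, lp in log_post.items()}
    total = sum(unnorm.values())
    return {c: u / total for c, u in unnorm.items()}


# Hypothetical per-object feature statistics learned from training touches.
class_params = {"mug": (1.0, 0.5), "ball": (3.0, 0.5)}
posterior = posterior_after_touches([1.2, 0.9, 1.4], class_params)
best_label = max(posterior, key=posterior.get)
```

Each additional touch multiplies in one more likelihood term, which is why, under the independence assumption, accuracy can be traded off against the number of touches.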
The cross-depiction problem is that of recognising visual objects regardless of whether they are photographed, painted, drawn, etc. It is a potentially significant yet under-researched problem. Emulating the remarkable human ability to recognise objects in an astonishingly wide variety of depictive forms is likely to advance both the foundations and the applications of Computer Vision. In this paper we benchmark classification, domain adaptation, and deep learning methods, and demonstrate that none performs consistently well on the cross-depiction problem. Given the current interest in deep learning, it is notable that such methods exhibit the same behaviour as all but one other method: they show a significant fall in performance over inhomogeneous databases compared to their peak performance, which is always over data comprising photographs only. Rather, we find that methods with strong models of spatial relations between parts tend to be more robust, and we therefore conclude that such information is important in modelling object classes regardless of appearance details.
This paper explores ways of combining vision and touch for the purpose of object recognition. In particular, it focuses on scenarios where there are few tactile training samples (as these are usually costly to obtain) and where vision is artificially impaired. Whilst machine vision is a widely studied field, and machine touch has received some attention recently, the fusion of both modalities remains a relatively unexplored area. It has been suggested that, in the human brain, there exist shared multi-sensorial representations of objects. This provides robustness when one or more senses are absent or unreliable. Modern robotics systems can benefit from multi-sensorial input, in particular in contexts where one or more of the sensors perform poorly. In this paper, a recently proposed tactile recognition model is extended by integrating a simple vision system in three different ways: vector concatenation (vision feature vector and tactile feature vector), object label posterior averaging, and object label posterior product. A comparison is drawn in terms of overall recognition accuracy and in terms of how quickly learning occurs (measured by the number of training samples). The conclusions reached are: (1) the most accurate system is “posterior product”, (2) multi-modal recognition has higher accuracy than either modality alone if all visual and tactile training data are pooled together, and (3) in the case of visual impairment, multi-modal recognition “learns faster”, i.e. requires fewer training samples to achieve the same accuracy as either modality alone.
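The three fusion strategies named in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the example feature vectors, and the example posteriors are hypothetical, and the sketch assumes each modality already yields a posterior over the same ordered list of object labels. It is not the paper's implementation.

```python
def normalise(weights):
    """Rescale non-negative weights so they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


def concatenate_features(vision_features, tactile_features):
    """Early fusion: one joint feature vector fed to a single classifier."""
    return list(vision_features) + list(tactile_features)


def fuse_posterior_average(p_vision, p_touch):
    """Late fusion: arithmetic mean of the per-label posteriors."""
    return [(v + t) / 2 for v, t in zip(p_vision, p_touch)]


def fuse_posterior_product(p_vision, p_touch):
    """Late fusion: element-wise product of posteriors, renormalised."""
    return normalise([v * t for v, t in zip(p_vision, p_touch)])


# Hypothetical posteriors over three object labels from each modality.
labels = ["cup", "ball", "box"]
p_vision = [0.5, 0.3, 0.2]
p_touch = [0.2, 0.6, 0.2]

joint_features = concatenate_features([0.1, 0.7], [0.4, 0.9, 0.3])
p_avg = fuse_posterior_average(p_vision, p_touch)
p_prod = fuse_posterior_product(p_vision, p_touch)

# Pick the most probable label under each late-fusion rule.
label_avg = labels[p_avg.index(max(p_avg))]
label_prod = labels[p_prod.index(max(p_prod))]
```

The product rule sharpens agreement between modalities and suppresses labels that either modality considers unlikely, whereas averaging is more forgiving when one modality is unreliable.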