CNNs have massively improved performance in object detection in photographs. However, research into object detection in artwork remains limited. We show state-of-the-art performance on a challenging dataset, People-Art, which contains people from photos, cartoons and 41 different artwork movements. We achieve this high performance by fine-tuning a CNN for this task, thereby also demonstrating that training CNNs on photos leads to overfitting to photos: only the first three or four layers transfer from photos to artwork. Although the CNN's performance is the highest yet, it remains below 60% AP, suggesting further work is needed on the cross-depiction problem. The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-46604-0_57
Comment: 14 pages, plus 3 pages of references; 7 figures; in ECCV 2016 Workshop
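To make the layer-transfer finding concrete, the sketch below shows one way to fine-tune a photo-pretrained CNN while freezing its earliest convolutional layers, reflecting the abstract's report that only the first three or four layers transfer from photos to artwork. The choice of VGG16, the exact layer split, and the two-class head are illustrative assumptions, not the authors' detection setup.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a CNN pretrained on photographs (ImageNet) -- an assumption for illustration.
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)

# Freeze the first four convolutional layers; fine-tune everything after them.
conv_seen = 0
for layer in model.features:
    if isinstance(layer, nn.Conv2d):
        conv_seen += 1
    trainable = conv_seen > 4  # layers up to and including conv4 stay frozen
    for param in layer.parameters():
        param.requires_grad = trainable

# Replace the classifier head for the artwork task (e.g. person vs. background).
model.classifier[6] = nn.Linear(model.classifier[6].in_features, 2)

# Optimize only the parameters that remain trainable.
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad),
    lr=1e-3,
    momentum=0.9,
)
```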
Conformal Prediction (CP) is a method that can be used to complement the bare predictions produced by any traditional machine learning algorithm with measures of confidence. CP gives good accuracy and confidence values, but unfortunately it is quite computationally inefficient. This inefficiency becomes prohibitive when CP is coupled with a method that requires long training times, such as neural networks. In this paper we use a modification of the original CP method, called Inductive Conformal Prediction (ICP), which allows us to construct a neural network confidence predictor without the massive computational overhead of CP. The method we propose accompanies its predictions with confidence measures that are useful in practice, while still preserving the computational efficiency of its underlying neural network.
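As a concrete illustration of the split-once idea behind ICP, here is a minimal sketch built around a generic scikit-learn neural network: the training data are divided into a proper training set and a calibration set, the network is trained a single time, and the calibration nonconformity scores are reused to produce a p-value for every candidate label of every test example. The nonconformity measure (one minus the predicted probability of the candidate label) and the hyperparameters are common choices assumed for illustration, not necessarily those of the paper.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

def icp_p_values(X_train, y_train, X_test, cal_fraction=0.2, seed=0):
    """Inductive Conformal Prediction p-values; assumes integer labels 0..K-1."""
    # Split once: proper training set for the network, calibration set for CP.
    X_prop, X_cal, y_prop, y_cal = train_test_split(
        X_train, y_train, test_size=cal_fraction, random_state=seed)
    y_cal = np.asarray(y_cal)

    net = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=seed)
    net.fit(X_prop, y_prop)  # trained only once, unlike transductive CP

    # Nonconformity of each calibration example: 1 - P(true label).
    cal_probs = net.predict_proba(X_cal)
    cal_scores = 1.0 - cal_probs[np.arange(len(y_cal)), y_cal]

    # p-value of (test example, candidate label) = fraction of calibration
    # examples that are at least as nonconforming (with plus-one smoothing).
    test_probs = net.predict_proba(X_test)
    p_values = np.empty_like(test_probs)
    for label in range(test_probs.shape[1]):
        test_scores = 1.0 - test_probs[:, label]
        p_values[:, label] = (
            (cal_scores[None, :] >= test_scores[:, None]).sum(axis=1) + 1
        ) / (len(cal_scores) + 1)
    return p_values
```

The prediction for a test example is the label with the largest p-value; in CP terms, its credibility is that largest p-value and its confidence is one minus the second-largest.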
This paper presents a codebook learning approach for image classification and retrieval. It corresponds to learning a weighted similarity metric such that the weighted similarity between same-labeled images is larger than that between differently labeled images by the largest possible margin. We formulate the learning problem as a convex quadratic program and adopt alternating optimization to solve it efficiently. Experiments on both synthetic and real datasets validate the approach. The codebook learning improves performance, in particular when the number of training examples is not sufficient for a large codebook.
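One way to read the margin condition in this abstract is as the convex quadratic program sketched below: non-negative per-dimension similarity weights are chosen so that, for each triplet, the weighted similarity of a same-labeled pair exceeds that of a differently labeled pair by a unit margin, with slack. The triplet enumeration and the direct solver call (rather than the paper's alternating optimization) are simplifying assumptions for illustration.

```python
import numpy as np
import cvxpy as cp

def learn_similarity_weights(X, y, C=1.0):
    """X: (n, d) image descriptors (e.g. codebook histograms); y: (n,) labels."""
    n, d = X.shape
    # Triplets (i, j, k): i and j share a label, i and k do not (small n only).
    triplets = [(i, j, k)
                for i in range(n) for j in range(n) for k in range(n)
                if i != j and y[i] == y[j] and y[i] != y[k]]

    w = cp.Variable(d, nonneg=True)               # per-dimension similarity weights
    xi = cp.Variable(len(triplets), nonneg=True)  # slack variables

    # Weighted similarity s_w(a, b) = sum_d w_d * a_d * b_d; require a unit
    # margin between the same-label and different-label similarities.
    constraints = [
        cp.sum(cp.multiply(X[i] * X[j] - X[i] * X[k], w)) >= 1 - xi[t]
        for t, (i, j, k) in enumerate(triplets)
    ]

    objective = cp.Minimize(cp.sum_squares(w) + C * cp.sum(xi))
    cp.Problem(objective, constraints).solve()
    return w.value
```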
Visual object classification and detection are major problems in contemporary computer vision. State-of-the-art algorithms allow thousands of visual objects to be learned and recognized, under a wide range of variations including lighting changes, occlusion, point of view and different object instances. Only a small fraction of the literature addresses the problem of variation in depictive styles (photographs, drawings, paintings, etc.). This is a challenging gap, but the ability to process images of all depictive styles, and not just photographs, has potential value across many applications. In this paper we model visual classes using a graph with multiple labels on each node; weights on arcs and nodes indicate relative importance (salience) to the object description. Visual class models can be learned from examples drawn from a database that contains photographs, drawings, paintings, etc. Experiments show that our representation is able to improve upon Deformable Part Models for detection and Bag of Words models for classification.
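The kind of class model this abstract describes can be pictured as a small attributed graph: each node carries several labels (one per depictive style it may match) plus a salience weight, and each arc carries a weight for how much the pairwise relation matters. The sketch below builds such a structure with networkx; the parts, labels, and weights are invented for illustration, not values learned by the authors' method.

```python
import networkx as nx

# A toy "person" class model: multi-label nodes with salience weights,
# weighted arcs for pairwise relations between parts.
person_model = nx.Graph()
person_model.add_node("head",
                      labels=["photo-head", "sketch-head", "painted-head"],
                      salience=0.9)
person_model.add_node("torso",
                      labels=["photo-torso", "sketch-torso"],
                      salience=0.7)
person_model.add_node("legs",
                      labels=["photo-legs", "sketch-legs"],
                      salience=0.4)
person_model.add_edge("head", "torso", salience=0.8)
person_model.add_edge("torso", "legs", salience=0.6)

# Example use: score a candidate detection by the salience of matched nodes
# plus the salience of arcs whose endpoints were both matched.
matched = {"head", "torso"}
score = sum(person_model.nodes[p]["salience"] for p in matched)
score += sum(data["salience"]
             for u, v, data in person_model.edges(data=True)
             if u in matched and v in matched)
```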