In this paper we investigate image classification under computational resource limits at test time. Two such settings are: 1. anytime classification, where the network's prediction for a test example is progressively updated, facilitating the output of a prediction at any time; and 2. budgeted batch classification, where a fixed amount of computation is available to classify a set of examples and can be spent unevenly across "easier" and "harder" inputs. In contrast to most prior work, such as the popular Viola and Jones algorithm, our approach is based on convolutional neural networks. We train multiple classifiers with varying resource demands, which we apply adaptively at test time. To maximally re-use computation between the classifiers, we incorporate them as early exits into a single deep convolutional neural network and inter-connect them with dense connectivity. To facilitate high-quality classification early on, we use a two-dimensional multi-scale network architecture that maintains coarse- and fine-level features throughout the network. Experiments on three image-classification tasks demonstrate that our framework substantially improves upon the existing state of the art in both settings.

However, the requirements of such competitions differ from those of real-world applications, which tend to incentivize resource-hungry models with high computational demands at inference time. For example, the COCO 2016 competition was won by a large ensemble of computationally intensive CNNs¹, a model likely far too computationally expensive for any resource-aware application. Although much smaller models would also obtain decent error rates, very large, computationally intensive models seem necessary to correctly classify the hard examples that make up the bulk of the remaining misclassifications of modern algorithms. To illustrate this point, Figure 1 shows two images of horses.
The left image depicts a horse in a canonical pose and is easy to classify, whereas the right image is taken from a rare viewpoint and likely lies in the tail of the data distribution. Computationally intensive models are needed to classify such tail examples correctly, but are wasteful when applied to canonical images such as the left one. In real-world applications, computation translates directly into power consumption, which should be minimized for environmental and economic reasons, and is a scarce commodity on mobile devices.

¹ http://image-net.org/challenges/talks/2016/GRMI-COCO-slidedeck.pdf
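The adaptive evaluation idea described above can be sketched in a few lines: each early-exit classifier produces logits, and evaluation stops at the first exit whose softmax confidence clears a threshold, so "easy" inputs spend less computation than "hard" ones. The sketch below is a minimal illustration, not the paper's implementation; the function name, the threshold value, and the hand-written logits standing in for real network outputs are all assumptions for demonstration purposes.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def early_exit_predict(exit_logits, threshold=0.9):
    """Return (predicted class, number of exits evaluated).

    exit_logits: logits from successive early-exit classifiers,
    ordered from cheapest to most expensive. Evaluation stops at
    the first exit whose top softmax probability >= threshold;
    otherwise the final (full-depth) classifier decides.
    """
    for i, logits in enumerate(exit_logits):
        probs = softmax(logits)
        conf = max(probs)
        if conf >= threshold:
            return probs.index(conf), i + 1
    probs = softmax(exit_logits[-1])
    return probs.index(max(probs)), len(exit_logits)

# Hypothetical logits: an "easy" input is confident at the first exit,
# a "hard" input only at the second (deeper) exit.
easy = [[4.0, 0.1, 0.1], [5.0, 0.1, 0.1]]
hard = [[1.0, 0.9, 0.8], [6.0, 0.1, 0.1]]

print(early_exit_predict(easy))  # stops after 1 exit
print(early_exit_predict(hard))  # needs both exits
```

Under a fixed batch budget, the saved computation on confident early exits is what allows the deeper, more expensive classifiers to be reserved for tail examples like the right-hand horse image.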