We address the problem of training Object Detection models using significantly less bounding box annotated images. For that, we take advantage of cheaper and more abundant image classification data. Our proposal consists in automatically generating artificial detection samples, with no need of expensive detection level supervision, using images with classification labels only. We also detail a pretraining initialization strategy for detection architectures using these artificially synthesized samples, before finetuning on real detection data, and experimentally show how this consistently leads to more data efficient models. With the proposed approach, we were able to effectively use only classification data to improve results on the harder and more supervision hungry object detection problem. We achieve results equivalent to those of the full data scenario using only a small fraction of the original detection data for Face, Bird, and Car detection.
First of all, to my family, for all the support received during this period. Next, I am forever grateful to my supervisor, Nina, who agreed to help me during these past three years, and to prof. Xiaoyi Jiang, who has greatly contributed to the ideas presented here.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.