In this article, we give a comprehensive overview of recent deep-learning-based object detection methods and their uses in augmented reality. The objective is to provide a thorough understanding of these algorithms and of how augmented reality functions and services can be improved by integrating them. We discuss in detail the characteristics of each approach and their influence on real-time detection performance. Experimental analyses are provided to compare the performance of the methods and to draw meaningful conclusions about their use in augmented reality. Two-stage detectors generally provide better detection accuracy, while single-stage detectors are significantly more time efficient and thus better suited to real-time object detection. Finally, we discuss several future directions to facilitate and stimulate research on object detection in augmented reality. Keywords: object detection, deep learning, convolutional neural network, augmented reality.
In this paper, we exploit the potential of convolutional neural networks (CNNs) in augmented reality. Our work combines existing approaches into a new method for aligning a virtual object with the real world in real time. The method consists of detecting 2D objects of one or more classes present in the real world with a CNN, then using the network output to compute the camera pose with the PnP algorithm in order to augment the detected object with additional information. The light weight of the MobileNet convolutional neural network, combined with the speed of the Single Shot MultiBox Detector (SSD) framework, makes it possible to analyze the acquired images in real time and to run on devices with limited resources and performance. We use a trained model that detects 20 different classes; the network receives as input an image sequence acquired in real time. For each detection, the output of the network provides the detected class as well as the coordinates of the corners of the bounding rectangle around the object. The coplanar coordinates of this rectangle are used to compute the camera pose and to align a 3D virtual object at the center of the bounding box surrounding the detected object. The results obtained in the experimental part show the relevance and robustness of the method. Keywords: augmented reality; camera pose estimation; convolutional neural network; MobileNet-SSD
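The detection-to-pose pipeline described in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the model file names, camera intrinsics, confidence threshold, and the assumed physical size of the planar region covered by the bounding box are placeholder values, and OpenCV's DNN module and solvePnP stand in for whatever inference and pose-estimation code the paper actually uses.

```python
# Sketch: MobileNet-SSD detection followed by PnP pose estimation from the
# four coplanar corners of the bounding box. All file names and numeric
# values below are illustrative assumptions, not values from the paper.
import cv2
import numpy as np

# Hypothetical 20-class MobileNet-SSD Caffe model (e.g. trained on PASCAL VOC).
net = cv2.dnn.readNetFromCaffe("MobileNetSSD_deploy.prototxt",
                               "MobileNetSSD_deploy.caffemodel")

# Assumed camera intrinsics; in practice these come from calibration.
fx = fy = 800.0
cx, cy = 320.0, 240.0
K = np.array([[fx, 0, cx], [0, fy, cy], [0, 0, 1]], dtype=np.float64)
dist = np.zeros(5)  # assume negligible lens distortion

# Assumed real-world width/height (metres) of the planar region covered by
# the bounding box; its four corners are treated as coplanar 3D points.
W, H = 0.30, 0.20
object_corners_3d = np.array([[-W / 2, -H / 2, 0],
                              [ W / 2, -H / 2, 0],
                              [ W / 2,  H / 2, 0],
                              [-W / 2,  H / 2, 0]], dtype=np.float64)

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    h, w = frame.shape[:2]

    # MobileNet-SSD expects 300x300 inputs with this scale/mean preprocessing.
    blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)),
                                 0.007843, (300, 300), 127.5)
    net.setInput(blob)
    detections = net.forward()  # shape (1, 1, N, 7)

    for i in range(detections.shape[2]):
        confidence = detections[0, 0, i, 2]
        if confidence < 0.5:
            continue
        # Bounding-box corners in pixel coordinates.
        x1, y1, x2, y2 = (detections[0, 0, i, 3:7] *
                          np.array([w, h, w, h])).astype(np.float64)
        image_corners_2d = np.array([[x1, y1], [x2, y1],
                                     [x2, y2], [x1, y2]], dtype=np.float64)

        # Recover the camera pose from the 2D-3D correspondences; a planar
        # PnP variant (IPPE) suits four coplanar points.
        ok_pnp, rvec, tvec = cv2.solvePnP(object_corners_3d, image_corners_2d,
                                          K, dist, flags=cv2.SOLVEPNP_IPPE)
        if ok_pnp:
            # Project the 3D origin (centre of the box) back into the image;
            # this is where a renderer would anchor the virtual object.
            centre, _ = cv2.projectPoints(np.zeros((1, 3)), rvec, tvec, K, dist)
            u, v = centre.ravel()
            cv2.circle(frame, (int(u), int(v)), 5, (0, 255, 0), -1)

    cv2.imshow("AR detection", frame)
    if cv2.waitKey(1) & 0xFF == 27:  # Esc to quit
        break

cap.release()
cv2.destroyAllWindows()
```

In a full AR application, the rotation and translation vectors returned by solvePnP would drive the virtual camera of a 3D renderer so that the augmented content stays registered with the detected object as it moves.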