Customization is an increasing trend in fashion product industry to reflect individual lifestyles. Previous studies have examined the idea of virtual footwear try-on in augmented reality (AR) using a depth camera. However, the depth camera restricts the deployment of this technology in practice. This research proposes to estimate the 6-DoF pose of a human foot from a color image using deep learning models to solve the problem. We construct a training dataset consisting of synthetic and real foot images that are automatically annotated. Three convolutional neural network models (DOPE, DOPE2, and YOLO6d) are trained with the dataset to predict the foot pose in real-time. The model performances are evaluated using metrics for accuracy, computational efficiency, and training time. A prototyping system implementing the best model demonstrates the feasibility of virtual footwear try-on using a RGB camera. Test results also indicate the necessity of real training data to bridge the reality gap in estimating the human foot pose.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.