Today's general-purpose deep convolutional neural networks (CNNs) for image classification and object detection are trained offline on large static datasets. Some applications, however, require training in real time on live video streams with a human in the loop. We refer to this class of problem as Time-ordered Online Training (ToOT); these problems require consideration not only of the quantity of incoming training data, but also of the human effort required to tag and use it. In this paper, we define training benefit, a metric that measures how effectively a training sequence uses each user interaction. We demonstrate and evaluate a system tailored to performing ToOT in the field, capable of training an image classifier on a live video stream with minimal input from a human operator. We show that by exploiting the time-ordered nature of the video stream through optical flow-based object tracking, we can increase the effectiveness of each human action by about a factor of eight.
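The abstract gives no implementation details, but the core idea of amplifying one human tag via optical flow-based tracking can be illustrated. Below is a minimal sketch, assuming OpenCV's Lucas-Kanade tracker stands in for the paper's tracking method; the function name `propagate_label`, the point counts, and the loss-of-track threshold are illustrative assumptions, not the authors' actual system.

```python
# Hedged sketch: propagate one human-supplied bounding box through a
# time-ordered frame sequence with Lucas-Kanade optical flow, so that a
# single tagging action yields many pseudo-labeled training examples.
import cv2
import numpy as np

def propagate_label(frames, box):
    """frames: list of BGR images; box: (x, y, w, h) tagged by the operator in frames[0]."""
    x, y, w, h = box
    prev_gray = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    mask = np.zeros_like(prev_gray)
    mask[y:y + h, x:x + w] = 255  # seed trackable points only inside the tagged region
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=50,
                                  qualityLevel=0.01, minDistance=5, mask=mask)
    labeled = [(frames[0], box)]
    if pts is None:
        return labeled
    for frame in frames[1:]:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
        nxt = nxt[status.flatten() == 1].reshape(-1, 1, 2)
        if len(nxt) < 5:  # track lost: stop rather than emit bad labels
            break
        # New box = bounding rectangle of the surviving tracked points.
        bx, by, bw, bh = cv2.boundingRect(nxt.astype(np.int32))
        labeled.append((frame, (bx, by, bw, bh)))
        prev_gray, pts = gray, nxt
    return labeled  # one human interaction -> len(labeled) training examples
```

Under this reading, the reported ~8x gain corresponds to each tag being propagated, on average, across about eight usable frames before the track degrades.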
State-of-the-art camera-based deep learning methods for inventory monitoring tend to fail to generalize across domains because of the high variance of scene settings. Large amounts of human labor are required to label data and parameterize the models, making real-world deployment impractical. In a third-party warehouse setting, supervised learning approaches are too costly, too inaccurate, or both, because human labor is needed to address a diverse set of environmental factors (e.g., lighting conditions, product motion, deployment limitations). We introduce PIWIMS, a realistic synthetic dataset generation technique that incorporates the physical constraints of the scene in real-world deployments, drastically reducing the need for human labeling. In contrast to other generative techniques, where the generative parameters are learned from a large sample of available data, this compositional approach defines the parameters from the physical characteristics of the particular task, requiring minimal human annotation. We demonstrate PIWIMS in a 4-month deployment in a real operating warehouse and show that, with only 32 manually labeled images per object, PIWIMS achieves up to 87% accuracy in inventory tracking: a 28% increase over traditional data augmentation techniques and a 31% error reduction relative to the third-party warehouse industry average. Furthermore, we demonstrate that PIWIMS generalizes across different camera angles and positions, achieving 85% accuracy in inventory tracking while varying the position and angle of the camera.
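The abstract describes the compositional approach only at a high level. The sketch below illustrates the general idea in Python/Pillow: pasting a labeled product cutout onto a real background with placement, scale, and lighting ranges fixed by physical knowledge of the scene rather than learned from data. The constraint values (shelf band, size range, brightness jitter) and the function name `synthesize` are illustrative assumptions, not the paper's parameters; it also assumes the cutout is smaller than the background at the maximum scale.

```python
# Hedged sketch: compose labeled synthetic training images by pasting a
# transparent product cutout onto a warehouse background, with generative
# parameters set from physical scene constraints instead of learned priors.
import random
from PIL import Image, ImageEnhance

def synthesize(background_path, cutout_path, n_images,
               shelf_band=(0.4, 0.8), size_range=(0.2, 0.5)):
    bg = Image.open(background_path).convert("RGB")
    cutout = Image.open(cutout_path).convert("RGBA")  # product crop with alpha
    samples = []
    for _ in range(n_images):
        img = bg.copy()
        s = random.uniform(*size_range)  # physically plausible apparent size
        w, h = max(1, int(cutout.width * s)), max(1, int(cutout.height * s))
        obj = cutout.resize((w, h))
        # Jitter brightness on the RGB channels only, preserving the alpha mask.
        alpha = obj.getchannel("A")
        obj = ImageEnhance.Brightness(obj.convert("RGB")).enhance(
            random.uniform(0.7, 1.3)).convert("RGBA")
        obj.putalpha(alpha)
        # Restrict placement to the vertical band where products can rest.
        x = random.randint(0, bg.width - w)
        y_lo = int(bg.height * shelf_band[0])
        y = random.randint(y_lo, max(y_lo, int(bg.height * shelf_band[1]) - h))
        img.paste(obj, (x, y), obj)          # alpha-composite onto background
        samples.append((img, (x, y, w, h)))  # image plus auto-generated box
    return samples
```

Each composited image carries its bounding box by construction, which is how such a pipeline can expand 32 manually labeled images per object into an arbitrarily large labeled training set.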