CMOS sensors employ a row-wise acquisition mechanism while imaging a scene, which can result in undesired motion artifacts known as rolling shutter (RS) distortions in the captured image. Existing single-image RS rectification methods attempt to account for these distortions either with algorithms tailored to a specific class of scenes, which require knowledge of intrinsic camera parameters, or with a learning-based framework that requires known ground-truth motion parameters. In this paper, we propose an end-to-end deep neural network for the challenging task of single-image RS rectification. Our network consists of a motion block, a trajectory module, a row block, an RS rectification module, and an RS regeneration module (used only during training). The motion block predicts the camera pose for every row of the input RS-distorted image, while the trajectory module fits the estimated motion parameters to a third-order polynomial. The row block predicts the camera motion to be associated with every pixel in the target, i.e., the RS-rectified image. Finally, the RS rectification module uses the motion trajectory and the output of the row block to warp the input RS image into a distortion-free image. For faster convergence during training, we additionally use an RS regeneration module that compares the input RS image with the ground-truth image distorted by the estimated motion parameters. The end-to-end formulation of our model does not constrain the estimated motion to ground-truth motion parameters, and therefore successfully rectifies RS images with complex real-life camera motion. Experiments on synthetic and real datasets show that our network outperforms prior art both qualitatively and quantitatively.
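To make the trajectory module concrete, the sketch below fits noisy per-row camera-motion estimates to a third-order polynomial in the normalized row index, which stands in for readout time. This is an illustrative reconstruction, not the paper's implementation: the function name fit_trajectory, the pose dimensionality D, and the use of ordinary least-squares polynomial fitting are all assumptions.

```python
import numpy as np

def fit_trajectory(raw_motion: np.ndarray) -> np.ndarray:
    """Smooth per-row motion estimates with a cubic polynomial.

    raw_motion: (H, D) array with one D-dimensional pose estimate per row.
    Returns an (H, D) array of smoothed estimates.
    """
    H, D = raw_motion.shape
    t = np.linspace(0.0, 1.0, H)  # normalized row index ~ readout time
    smoothed = np.empty_like(raw_motion)
    for d in range(D):
        coeffs = np.polyfit(t, raw_motion[:, d], deg=3)  # third-order fit
        smoothed[:, d] = np.polyval(coeffs, t)
    return smoothed

# Toy example: a noisy horizontal translation that grows with row index.
rng = np.random.default_rng(0)
H = 480
t = np.linspace(0.0, 1.0, H)
noisy = (5.0 * t**2 + 0.1 * rng.standard_normal(H)).reshape(H, 1)
print(fit_trajectory(noisy)[:5, 0])
```

Constraining the per-row poses to a smooth cubic curve reflects the physics of handheld camera shake over a single readout and regularizes the motion block's otherwise independent row-wise predictions.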
Production time of stroke gestures is a fundamental measure of user performance with Graphical User Interfaces. However, production time is an overall quantification of the user's gesture articulation process and therefore provides an incomplete picture of that process. Moreover, previous approaches treated stroke gestures as synchronous point sequences, whereas most gesture-driven applications have to deal with asynchronous point sequences. Furthermore, deep generative models of human handwriting ignore temporal information, thereby missing a key component of the user's gesture articulation process. To address these issues, we introduce DITTO, a sequence-to-sequence deep learning model that estimates the velocity profile of any stroke gesture using spatial information only, thus providing a fine-grained estimate of the moment-by-moment behavior of the user's articulation performance. We show that this unique capability makes DITTO remarkably accurate while handling gestures of any type: unistrokes, multistrokes, and multitouch gestures. Our model, code, and associated web application are available as open-source software.
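As a rough illustration of the sequence-to-sequence idea, the sketch below maps a gesture's (x, y) point sequence to a per-point velocity profile with a bidirectional recurrent encoder and a linear regression head. This is a minimal sketch under assumed design choices; the class name VelocityNet, the GRU encoder, and all layer sizes are illustrative and may differ from DITTO's actual architecture.

```python
import torch
import torch.nn as nn

class VelocityNet(nn.Module):
    """Map spatial point sequences to per-point velocity estimates."""

    def __init__(self, hidden: int = 64):
        super().__init__()
        self.rnn = nn.GRU(input_size=2, hidden_size=hidden,
                          batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)  # one velocity value per point

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (batch, seq_len, 2) gesture coordinates, spatial info only
        features, _ = self.rnn(points)
        return self.head(features).squeeze(-1)  # (batch, seq_len) velocities

# Usage: predict a velocity profile for one 100-point gesture.
model = VelocityNet()
gesture = torch.randn(1, 100, 2)
print(model(gesture).shape)  # torch.Size([1, 100])
```

Because the estimate is produced per point rather than as a single production-time scalar, such a model recovers the moment-by-moment articulation behavior that an overall timing measure misses.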