Scene text recognition (STR) is an important bridge between images and text and has attracted abundant research attention. While convolutional neural networks (CNNs) have achieved remarkable progress in this task, most existing works need an extra context modeling module to help the CNN capture global dependencies, mitigating its locality-based inductive bias and strengthening the relationships among text features. Recently, the transformer has emerged as a promising network for global context modeling through its self-attention mechanism, but one of its main shortcomings when applied to recognition is efficiency. We propose a 1-D split to address this complexity challenge and replace the CNN with a transformer encoder, reducing the need for a separate context modeling module. Furthermore, recent methods use a frozen initial embedding to guide the decoder in decoding features into text, which leads to a loss of accuracy. We instead propose a learnable initial embedding, learned from the transformer encoder, that adapts to different input images. Building on these components, we introduce a novel architecture for text recognition, named TRansformer-based text recognizer with Initial embedding Guidance (TRIG), composed of three stages: transformation, feature extraction, and prediction. Extensive experiments show that our approach achieves state-of-the-art results on text recognition benchmarks.
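The contrast between a frozen and a learnable initial embedding can be illustrated with a minimal sketch. The shapes, the mean-pooling, and the projection matrix W below are assumptions for illustration, not the paper's actual layers: the point is only that an embedding derived from the encoder output varies with the input image, while a frozen one does not.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # hypothetical feature dimension

# Encoder output for one input image: a sequence of T feature vectors.
encoder_out = rng.normal(size=(16, d))

# Frozen initial embedding: a constant vector, identical for every input.
frozen_init = np.zeros(d)

# Learnable initial embedding in the spirit of TRIG: computed from the
# encoder output, sketched here as mean-pooling plus a learned projection W.
W = rng.normal(size=(d, d))
learnable_init = encoder_out.mean(axis=0) @ W

# A different input image yields a different initial embedding.
other_init = rng.normal(size=(16, d)).mean(axis=0) @ W
print(np.allclose(learnable_init, other_init))  # False
```

Because the decoder's first query now depends on the image, it can steer decoding toward the specific features of that image rather than starting from the same fixed point every time.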
Existing video deblurring datasets and algorithms rest on the unrealistic presumption that a naturally blurred video is fully blurred. In this work, we define a more realistic, frame-averaging-based data degradation model that treats a naturally blurred video as a partially blurred frame sequence, and use it to build REBVIDS, a novel video deblurring dataset that closes the gap between naturally and synthetically blurred training data and addresses most shortcomings of existing datasets. We also present DeblurNet, a deep learning model for video deblurring trained in two phases. It consists of two main sub-modules: a Frame Selection Module and a Frame Deblurring Module. Compared with recent learning-based approaches, its sub-modules have simpler network structures with fewer training parameters, are easier to train, and offer faster inference. Since naturally blurred videos are only partially blurred, the Frame Selection Module selects the blurred frames in a video sequence and forwards them to the Frame Deblurring Module, which restores them and recombines them, in their original order, with their initially sharp neighbor frames into a newly restored sequence. Extensive experimental results on several benchmarks demonstrate that DeblurNet performs favorably against the state of the art, both quantitatively and qualitatively. DeblurNet can trade off speed, computational cost, and restoration quality. Besides restoring blurred video frames with the necessary edges and details, its small size and integrated frame selection mechanism let it speed up inference by over ten times compared to existing approaches.
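The select-then-deblur pipeline described above can be sketched as follows. The `is_blurred` and `deblur` functions are hypothetical stand-ins for the two learned sub-modules (a binary classifier and a restoration network); the sketch shows only the control flow: sharp frames bypass the deblurring module, and restored frames return to their original positions.

```python
def is_blurred(frame):
    # Stand-in for the Frame Selection Module: a binary blurred/sharp decision.
    return frame["blur"] > 0.5

def deblur(frame):
    # Stand-in for the Frame Deblurring Module: restores a blurred frame.
    return {**frame, "blur": 0.0, "restored": True}

def deblur_video(frames):
    # Only frames flagged as blurred pass through the deblurring module;
    # sharp frames are kept as-is, so frame order is preserved and inference
    # cost scales with the blurred fraction of the sequence.
    return [deblur(f) if is_blurred(f) else f for f in frames]

video = [{"id": i, "blur": b} for i, b in enumerate([0.1, 0.9, 0.2, 0.8])]
restored = deblur_video(video)
print([f.get("restored", False) for f in restored])  # [False, True, False, True]
```

Skipping already-sharp frames is what enables the reported inference speed-up: in a partially blurred video, most frames never enter the expensive restoration network.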
The project dataset and code will be released soon at: https://github.com/nahliabdelwahed/Speed-up-videodeblurring

INDEX TERMS: Video deblurring, image deblurring, video frame classification, inference run-time, deep learning, two-stage training, CNN, GANs.