Detecting and recognizing text in natural scene videos and images has attracted growing attention from computer vision researchers because of applications such as robotic navigation and traffic sign detection. Optical Character Recognition (OCR) technology is also applied to detect and recognize text on license plates, with commercial uses that include locating stolen cars, calculating parking fees, invoicing tolls, and controlling access to security zones, as well as detecting fraud and securing data transactions in the banking industry. Considerable effort is required when scene text videos suffer from low contrast, motion blur, and arbitrary orientations. Current text detection and recognition approaches are largely limited to static images with horizontal or near-horizontal text. Detecting and recognizing text in videos is more challenging because of the dynamic nature of the data: blur caused by defocus and motion, illumination changes, arbitrarily shaped text, and occlusion. To overcome these challenges, we propose DEFUSE (Deep Fused), which combines DeepEAST (Deep Efficient and Accurate Scene Text Detector) with a Keras OCR model. This two-stage technique first detects text regions and then deciphers them into a machine-readable format. The proposed method was evaluated on three video datasets: ICDAR 2015, Road Text 1K, and our own video dataset. The results proved more effective in terms of precision, recall, and F1-score.
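The abstract does not publish DeepEAST's architecture or the exact recognition configuration, so the following is only a minimal sketch of the detect-then-recognize pipeline it describes, assuming the publicly released OpenCV EAST model ("frozen_east_text_detection.pb") as a stand-in for DeepEAST and the keras-ocr package for the recognition stage; for video, the same steps would run per frame.

```python
# Sketch of a two-stage detect-then-recognize pipeline (assumptions above).
import cv2
import numpy as np
import keras_ocr

EAST_OUTPUTS = ["feature_fusion/Conv_7/Sigmoid",  # text/no-text scores
                "feature_fusion/concat_3"]        # box geometry

def decode_east(scores, geometry, min_conf=0.5):
    """Convert raw EAST outputs into axis-aligned boxes plus confidences."""
    rects, confs = [], []
    rows, cols = scores.shape[2:4]
    for y in range(rows):
        for x in range(cols):
            score = scores[0, 0, y, x]
            if score < min_conf:
                continue
            ox, oy = x * 4.0, y * 4.0          # each cell covers 4x4 pixels
            angle = geometry[0, 4, y, x]
            cos, sin = np.cos(angle), np.sin(angle)
            h = geometry[0, 0, y, x] + geometry[0, 2, y, x]  # top + bottom
            w = geometry[0, 1, y, x] + geometry[0, 3, y, x]  # right + left
            ex = int(ox + cos * geometry[0, 1, y, x] + sin * geometry[0, 2, y, x])
            ey = int(oy - sin * geometry[0, 1, y, x] + cos * geometry[0, 2, y, x])
            rects.append([ex - int(w), ey - int(h), int(w), int(h)])
            confs.append(float(score))
    return rects, confs

frame = cv2.imread("frame.jpg")                 # one video frame
orig_h, orig_w = frame.shape[:2]
side = 320                                      # EAST needs multiples of 32
blob = cv2.dnn.blobFromImage(frame, 1.0, (side, side),
                             (123.68, 116.78, 103.94), swapRB=True, crop=False)

# Stage 1: EAST text detection via OpenCV's DNN module.
net = cv2.dnn.readNet("frozen_east_text_detection.pb")
net.setInput(blob)
scores, geometry = net.forward(EAST_OUTPUTS)
rects, confs = decode_east(scores, geometry)
keep = cv2.dnn.NMSBoxes(rects, confs, 0.5, 0.4)  # non-maximum suppression

# Stage 2: recognize each surviving region with keras-ocr's recognizer.
recognizer = keras_ocr.recognition.Recognizer()  # downloads pretrained weights
rx, ry = orig_w / side, orig_h / side            # map boxes back to full size
for i in np.array(keep).flatten():
    x, y, w, h = rects[i]
    x0, y0 = max(int(x * rx), 0), max(int(y * ry), 0)
    x1, y1 = int((x + w) * rx), int((y + h) * ry)
    crop = cv2.cvtColor(frame[y0:y1, x0:x1], cv2.COLOR_BGR2RGB)
    print(recognizer.recognize(crop))            # machine-readable text
```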
Images and videos with text content are a direct source of information, and there is now a strong need to analyze such data intelligently. A growing number of researchers are focusing on text identification, making it an active topic in machine vision research. This has opened the way for several real-time applications, such as text detection, localization, and tracking, to become more prevalent in text analysis systems. This survey examines how textual information can be extracted. First, it presents reliable datasets for text identification in images and videos. Second, it details the various text formats found in both images and video. Third, it describes the process flow for extracting information from text and the existing machine learning and deep learning techniques used to train models. Fourth, it explains the evaluation metrics used to validate those models. Finally, it covers the applications and difficulties of text extraction across a wide range of fields, where the difficulties center on the most frequent real-world challenges, such as capture techniques, lighting, and environmental conditions. The text inside images and videos carries a massive quantity of facts and statistics, yet such data is not easy to access. This exploratory view provides easier and more accurate mathematical modeling and evaluation techniques for retrieving text from images and video into an accessible form.
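To make the evaluation step concrete, here is a minimal sketch (not taken from the survey) of the precision, recall, and F1 metrics both abstracts refer to, assuming the common convention that a predicted box counts as a true positive when its IoU with an unmatched ground-truth box is at least 0.5.

```python
# Precision/recall/F1 for text detection under an IoU >= 0.5 matching
# convention (an assumed, commonly used threshold); boxes are (x0, y0, x1, y1).
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def precision_recall_f1(pred, truth, thresh=0.5):
    """Greedily match predictions to ground truth, then score."""
    matched, tp = set(), 0
    for p in pred:
        best = max(((iou(p, t), i) for i, t in enumerate(truth)
                    if i not in matched), default=(0.0, -1))
        if best[0] >= thresh:
            matched.add(best[1])
            tp += 1
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(truth) if truth else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Example: one of two predictions overlaps a ground-truth box well enough,
# so precision = recall = F1 = 0.5.
print(precision_recall_f1([(0, 0, 10, 10), (50, 50, 60, 60)],
                          [(1, 1, 11, 11), (80, 80, 90, 90)]))
```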