In this work, we introduce WAFFNet, an attention-centric feature fusion architecture tailored for word-level multi-lingual scene text script identification. Motivated by the limitations of approaches that rely exclusively on either hand-crafted features or deep learning, our approach combines statistical and deep features to bridge the gap. At the core of WAFFNet, we fuse Local Binary Pattern (LBP) features, a prominent descriptor of low-level texture, with high-dimensional, semantically rich convolutional features. This fusion is augmented by a spatial attention mechanism that emphasizes semantically critical regions of the input image. To handle class imbalance in the multi-class setting, we employ a weighted objective function, which also regularizes the learning process. WAFFNet is trained end to end, and transfer learning is used to speed up convergence and improve performance. Because regional Indian languages are under-represented in current datasets, we curated IIITG-STLI2023, a comprehensive dataset covering English alongside six under-represented Indian languages: Hindi, Kannada, Malayalam, Telugu, Bengali, and Manipuri. Rigorous evaluation on IIITG-STLI2023, as well as on the established MLe2e and SIW-13 datasets, shows that WAFFNet outperforms both traditional feature-engineering approaches and state-of-the-art deep learning frameworks. Thus, the proposed WAFFNet framework offers a robust and effective solution for language identification in scene text images.
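As an illustration of the two non-deep ingredients named above, the following minimal Python sketch computes a basic LBP texture histogram and an inverse-frequency weighted cross-entropy. The abstract does not state which LBP variant (radius, number of neighbours, uniform patterns) or which class-weighting scheme WAFFNet uses, so the radius-1, 8-neighbour LBP and the weights w_c = N / (C * n_c) are assumptions; the CNN branch, the spatial attention module, and the fusion step are not shown.

```python
import math
from collections import Counter

# Assumed LBP variant: 8 neighbours at radius 1, visited clockwise from the
# top-left pixel, as (dy, dx) offsets.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(img):
    """Basic 3x3 LBP on a 2-D list of grey levels.

    Bit k of a pixel's code is set when neighbour k is >= the centre value.
    Border pixels (no full neighbourhood) are skipped.
    """
    h, w = len(img), len(img[0])
    codes = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            centre = img[y][x]
            code = 0
            for k, (dy, dx) in enumerate(OFFSETS):
                if img[y + dy][x + dx] >= centre:
                    code |= 1 << k
            codes.append(code)
    return codes


def lbp_histogram(img):
    """Normalised 256-bin histogram of LBP codes: the texture feature vector."""
    codes = lbp_codes(img)
    hist = [0.0] * 256
    for code in codes:
        hist[code] += 1
    n = len(codes)
    return [v / n for v in hist] if n else hist


def inverse_frequency_weights(labels):
    """Assumed weighting: w_c = N / (C * n_c), so rarer scripts weigh more."""
    counts = Counter(labels)
    n, c = len(labels), len(counts)
    return {cls: n / (c * cnt) for cls, cnt in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p[y] over a batch.

    probs: list of per-sample probability lists; labels: int class indices;
    weights: dict class -> weight. Dividing by the summed target weights
    keeps the loss scale independent of the weighting.
    """
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    return total / sum(weights[y] for y in labels)
```

In a fusion model of this kind, the 256-bin histogram would typically be concatenated with the convolutional feature vector before classification. Dividing by the summed target weights mirrors the `reduction='mean'` behaviour of PyTorch's `CrossEntropyLoss` when a `weight` tensor is supplied.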
The document image classification method presented in this paper focuses on six categories of Malayalam palm leaf manuscripts. The proposed approach consists of three phases: dataset analysis, in which a bag-of-words repository is built; text recognition; and classification using a voting approach. During the dataset analysis phase, the palm leaf manuscripts are first pre-processed and subjectively analysed to create the bag-of-words repository. Next, the textual components of the manuscripts are extracted and recognized using Tesseract 4 OCR (with default and self-adapted training sets) and a deep learning model. In the third phase, the bag-of-words approach categorizes each palm leaf manuscript from the OCR-recognized textual components through a voting process. Experiments evaluated the proposed approach with and without voting, varying the size of the bag of words, with default and self-adapted training datasets for Tesseract OCR and with the deep learning model. The results show that the bag-of-words technique with Tesseract OCR performs comparably with and without voting: overall document classification accuracy is 83% without voting and 84.5% with voting, with an F-score of 0.90 in both cases. Overall, trial-wise experiments with the bag of words show that the proposed approach is highly generalizable, offering a reliable way to classify deteriorated Malayalam handwritten palm leaf manuscripts.
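The voting step described above can be sketched in a few lines of Python. This is a minimal illustration under assumptions: the abstract does not specify the voting rule, so here each OCR-recognized word casts one vote for every category whose bag-of-words repository contains it, and the category with the most votes wins. The function name, the category names, and the words in the usage example are hypothetical placeholders, and the OCR stage itself is not shown.

```python
from collections import Counter


def classify_by_voting(ocr_words, repositories):
    """Assign a document to a category by bag-of-words voting (assumed rule).

    ocr_words: words recognised by OCR from one manuscript image.
    repositories: dict category -> set of words (the bag-of-words repository).
    Each recognised word casts one vote for every category whose repository
    contains it. Returns (best_category, votes); best_category is None when no
    word matches any repository. Ties go to the alphabetically first category.
    """
    votes = Counter()
    for word in ocr_words:
        for category, vocabulary in repositories.items():
            if word in vocabulary:
                votes[category] += 1
    if not votes:
        return None, votes
    best = min(votes, key=lambda cat: (-votes[cat], cat))
    return best, votes


# Hypothetical repositories and OCR output, for illustration only.
repositories = {"cat_A": {"w1", "w2", "w3"}, "cat_B": {"w3", "w4"}}
label, votes = classify_by_voting(["w1", "w3", "w4", "w2"], repositories)
# label == "cat_A" (3 votes against 2 for "cat_B")
```

Weighting each vote by word frequency or normalising by repository size would be reasonable alternatives; which rule the authors used is not stated in the abstract.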