EMBiL: An English-Manipuri Bi-lingual Benchmark for Scene Text Detection and Language Identification

Naosekpam, Veronica; Islam, Mushtaq; Chourasia, Amul; Sahu, Nilkanta

doi:10.1007/978-3-031-44237-7_7

Cited by 2 publications

(1 citation statement)

References 19 publications


“…Videos primarily feature two text types: scene text [2–6] and captions [7]. Billboards, trucks, t‐shirts, streets, and buildings commonly display scene text, whereas captions, comprising subtitles and overlay text, undergo manual video edits.…”

Section: Introduction (mentioning)

confidence: 99%

Video text rediscovery: Predicting and tracking text across complex scenes

Naosekpam, Sahu

2024

Computational Intelligence

Self Cite


Dynamic text in scene videos provides valuable insights and semantic cues crucial for video applications. However, the movement of this text presents unique challenges, such as blur, shifts, and occlusion. While efficient at tracking text, state-of-the-art systems often struggle when text becomes obscured or scenes become complicated. This study introduces a novel method for detecting and tracking video text, specifically designed to predict the location of obscured or occluded text in subsequent frames using a tracking-by-detection paradigm. Our approach begins with a primary detector to identify text within individual frames, thus enhancing tracking accuracy. Using the Kalman filter, the Munkres algorithm, and deep visual features, we establish connections between text instances across frames. The technique rests on the idea that, when text goes missing in a frame due to obstructions, its previous speed and location can be used to predict its next position. Experiments conducted on the ICDAR2013 Video and ICDAR2015 Video datasets confirm our method's efficacy, matching or surpassing established methods in performance.


A Hybrid Scene Text Script Identification Network for Regional Indian Languages

Naosekpam, Sahu

2024

ACM Trans. Asian Low-Resour. Lang. Inf. Process.


In this work, we introduce WAFFNet, an attention-centric feature fusion architecture tailored for word-level multi-lingual scene text script identification. Motivated by the limitations of traditional approaches that rely exclusively on either feature-based methods or deep learning strategies, our approach combines statistical and deep features to bridge the gap. At the core of WAFFNet, we combine the Local Binary Pattern, a prominent descriptor capturing low-level texture features, with high-dimensional, semantically rich convolutional features. This fusion is augmented by a spatial attention mechanism, ensuring targeted emphasis on semantically critical regions of the input image. To address the class imbalance problem in multi-class classification scenarios, we employed a weighted objective function, which also regularizes the learning process. The architectural integrity of WAFFNet is preserved through an end-to-end training paradigm, leveraging transfer learning to expedite convergence and optimize performance metrics. Considering the under-representation of regional Indian languages in current datasets, we curated IIITG-STLI2023, a comprehensive dataset covering English alongside six under-represented Indian languages: Hindi, Kannada, Malayalam, Telugu, Bengali, and Manipuri. Rigorous evaluation on IIITG-STLI2023, as well as on the established MLe2e and SIW-13 datasets, underscores WAFFNet’s superiority over both traditional feature-engineering approaches and state-of-the-art deep learning frameworks. Thus, the proposed WAFFNet framework offers a robust and effective solution for language identification in scene text images.
