Transfer Learning for Scene Text Recognition in Indian Languages

Gunna, Sanjana; Saluja, Rohit; Jawahar, C. V.

doi:10.1007/978-3-030-86198-8_14

Cited by 8 publications

(16 citation statements)

References 30 publications

“…Another synthetic dataset widely in use for English language is SynthText [13]. On the Indian language side, there have been some works like [12,23] which use synthetic datasets for a total of 6 Indian Languages. However, like real dataset scenario, a comprehensive synthetic dataset for all 12 major Indian languages is absent.…”

Section: Related Work (mentioning)

confidence: 99%

“…The synthetic dataset proposed has more than 3 Million word images per language. For benchmarking STR performance, we have followed the same procedure as [12], using 2 Million word images for training the network and 0.5 Million for validation and testing. According to the Census 2011 report on Indian languages [10], India has 22 major or scheduled languages with a significant volume of writing.…”

Section: Related Work (mentioning)

confidence: 99%

“…According to the 2011 census report [10], the included languages cover 98% of the subcontinent's spoken language. IndicSTR12 is an extension of IIIT-ILST [23] and [12], which cover Telugu, Malayalam, Hindi, Gujarati, and Tamil, respectively. There has been no addition of images for any of the mentioned languages, except for Gujarati, which had less than 1000 word-images.…”

Section: Related Work (mentioning)

confidence: 99%

“…Another model, STAR-Net [20], which extracts more robust features from word-images and performs an initial distortion correction, is also used for benchmarking. This model has been taken up to maintain consistency with the previous works on Indic STR [12,23]. PARSeq: PARSeq is a transformer-based model which is trained using Permutation Language Modeling (PLM).…”

Section: Models (mentioning)

confidence: 99%

“…We propose a real dataset (Fig. 1 (left)) for 12 major Indian languages, namely Assamese, Bengali, Odia, Marathi, Hindi, Kannada, Urdu, Telugu, Malayalam, Tamil, Gujarati and Punjabi, wherein Malayalam, Telugu, Hindi, and Tamil word-images have been taken from [23] and [12]. Since the number of word instances proposed by [12] for Gujarati was fewer than 1000 word-image instances, we augment the proposed Gujarati instances to achieve numbers comparable to other languages in the proposed dataset.…”

Section: Introduction (mentioning)

confidence: 99%

IndicSTR12: A Dataset for Indic Scene Text Recognition

Lunia, Mondal, Jawahar

2023

Lecture Notes in Computer Science

Self Cite

The importance of Scene Text Recognition (STR) in today's increasingly digital world cannot be overstated. Given the significance of STR, data-intensive deep learning approaches that auto-learn feature mappings have primarily driven the development of STR solutions. Several benchmark datasets and substantial work on deep learning models are available for Latin languages to meet this need. For Indian languages, which are syntactically and semantically more complex and are spoken and read by 1.3 billion people, less work and fewer datasets are available. This paper aims to address the Indian space's lack of a comprehensive dataset by proposing the largest and most comprehensive real dataset, Indic-STR12, and benchmarking STR performance on 12 major Indian languages. A few works have addressed the same issue, but to the best of our knowledge, they focused on a small number of Indian languages. The size and complexity of the proposed dataset are comparable to those of existing Latin contemporaries, while its multilingualism will catalyse the development of robust text detection and recognition models. It was created specifically for a group of related languages with different scripts. The dataset contains over 27000 word-images gathered from various natural scenes, with over 1000 word-images for each language. Unlike previous datasets, the images cover a broader range of realistic conditions, including blur, illumination changes, occlusion, non-iconic texts, low resolution, perspective text, etc. Along with the new dataset, we provide a high-performing baseline on three models: PARSeq (Latin SOTA), CRNN, and STARNet.
