2022
DOI: 10.48550/arxiv.2205.02543
|View full text |Cite
Preprint
|
Sign up to set email alerts
|

OCR Synthetic Benchmark Dataset for Indic Languages

Abstract: We present the largest publicly available synthetic OCR benchmark dataset for Indic languages. The collection contains a total of 90k images and their ground truth for 23 Indic languages. OCR model validation in Indic languages require a good amount of diverse data to be processed in order to create a robust and reliable model. Generating such a huge amount of data would be difficult otherwise but with synthetic data, it becomes far easier. It can be of great importance to fields like Computer Vision or Image … Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...

Citation Types

0
0
0

Publication Types

Select...

Relationship

0
0

Authors

Journals

citations
Cited by 0 publications
references
References 2 publications
0
0
0
Order By: Relevance

No citations

Set email alert for when this publication receives citations?