Proceedings of the 22nd Conference on Computational Natural Language Learning 2018
DOI: 10.18653/v1/k18-1034

Upcycle Your OCR: Reusing OCRs for Post-OCR Text Correction in Romanised Sanskrit

Abstract: We propose a post-OCR text correction approach for digitising texts in Romanised Sanskrit. Owing to the lack of resources, our approach uses OCR models trained for other languages written in the Roman script. No dataset is currently available for Romanised Sanskrit OCR, so we bootstrap a dataset of 430 images, scanned in two different settings, together with their corresponding ground truth. For training, we synthetically generate training images for both settings. We find that the use of a copying mechanism (Gu et al.…
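The core idea in the abstract, letting the decoder copy characters that the OCR already got right instead of regenerating them, can be sketched roughly as below. This is a minimal, illustrative pointer-style copy/generate gate in PyTorch, not the paper's exact architecture or the full CopyNet formulation of Gu et al.; all tensor shapes, layer names, and the `copy_augmented_step` helper are assumptions made for this example.

```python
# Minimal sketch of one decoding step with a copy/generate gate.
# Illustrative only; shapes, names, and the gating form are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def copy_augmented_step(dec_state, enc_states, src_token_ids, vocab_logits,
                        attn_proj, gate_proj):
    """Mix a generation distribution over the vocabulary with a copy
    distribution over the source (OCR) tokens.

    dec_state:     (hidden,)          current decoder hidden state
    enc_states:    (src_len, hidden)  encoder outputs for the OCR string
    src_token_ids: (src_len,)         vocabulary ids of the source tokens
    vocab_logits:  (vocab_size,)      decoder's generation logits
    """
    # Dot-product attention over source positions gives the copy distribution.
    scores = enc_states @ attn_proj(dec_state)        # (src_len,)
    copy_dist = F.softmax(scores, dim=0)

    # Ordinary generation distribution over the output vocabulary.
    gen_dist = F.softmax(vocab_logits, dim=0)

    # Scalar gate: how much probability mass goes to copying vs. generating.
    p_copy = torch.sigmoid(gate_proj(dec_state)).squeeze()

    # Scatter the copy probabilities into vocabulary space and mix.
    mixed = (1.0 - p_copy) * gen_dist
    mixed = mixed.index_add(0, src_token_ids, p_copy * copy_dist)
    return mixed                                       # (vocab_size,)

# Toy usage with random tensors, purely to show the expected shapes.
hidden, vocab, src_len = 64, 100, 12
dist = copy_augmented_step(
    torch.randn(hidden), torch.randn(src_len, hidden),
    torch.randint(0, vocab, (src_len,)), torch.randn(vocab),
    nn.Linear(hidden, hidden), nn.Linear(hidden, 1))
```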

Cited by 6 publications (5 citation statements)
References 21 publications
“…• COPY: This system is the base architecture with a copy mechanism as described in Section 5.2. The single-source variant of this model is used for OCR post-correction on Romanized Sanskrit in Krishna et al. (2018).…”
Section: Methods (mentioning, confidence: 99%)
“…There has been little work on lower-resourced languages. Kolak and Resnik (2005) present a probabilistic edit-distance-based post-correction model applied to Cebuano and Igbo, and Krishna et al. (2018) show improvements on Romanized Sanskrit OCR by adding a copy mechanism to a neural sequence-to-sequence model.…”
Section: Related Work (mentioning, confidence: 99%)
“…ITN is a monotone sequence transduction task where the input and output sequences typically have considerable lexical overlap and generally follow monotonicity in their alignments (Schnober et al., 2016; Krishna et al., 2018). Here, we formulate the task in three different setups.…”
Section: ITN Models (mentioning, confidence: 99%)
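The lexical overlap that makes such tasks well suited to copy mechanisms is easy to see on a concrete input/output pair. The strings below are an invented example, and difflib's matcher is used only as a rough proxy for a monotone character alignment.

```python
# Rough check of character-level overlap between a noisy OCR line and its
# correction; the example pair is invented, not taken from the paper's data.
from difflib import SequenceMatcher

ocr  = "dharrna-ksetre samavetaa yuyutsavah"   # hypothetical OCR output
gold = "dharma-ksetre samavetaa yuyutsavah"    # hypothetical correction

matcher = SequenceMatcher(a=ocr, b=gold)
print(f"overlap ratio: {matcher.ratio():.2f}")  # close to 1.0: most
                                                # characters can be copied
for tag, i1, i2, j1, j2 in matcher.get_opcodes():
    if tag != "equal":
        print(tag, repr(ocr[i1:i2]), "->", repr(gold[j1:j2]))
```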