2022
DOI: 10.14429/djlit.42.4.17742
|View full text |Cite
|
Sign up to set email alerts
|

Optical Character Recognition for Printed Tamizhi Documents using Deep Neural Networks

Abstract: Tamizhi (Tamil-Brahmi) script is one of the oldest scripts in India from which most of the modern Indian scripts are evolved. The ancient historical documents are generally preserved as digitised texts using Optical Character Recognition (OCR) technique. But the development of OCR for Tamizhi documents is highly challenging as many characters have similar shapes and structures with very small variations. In specific, for Tamizhi script it is very difficult to build an OCR as many characters are combined… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
1
1

Citation Types

0
2
0

Year Published

2023
2023
2024
2024

Publication Types

Select...
3
1

Relationship

0
4

Authors

Journals

citations
Cited by 4 publications
(3 citation statements)
references
References 0 publications
0
2
0
Order By: Relevance
“…Their use, which is not limited to any object, makes them a popular part of computer vision. OCR is often discussed in research on the detection of real objects in the form of objects or text, as is the case with automatic detection of license plate numbers [30], [31], object detection in image documents [32], handwriting detection [33], identification of ancient manuscripts [34] and many more.…”
Section: Related Workmentioning
confidence: 99%
“…Their use, which is not limited to any object, makes them a popular part of computer vision. OCR is often discussed in research on the detection of real objects in the form of objects or text, as is the case with automatic detection of license plate numbers [30], [31], object detection in image documents [32], handwriting detection [33], identification of ancient manuscripts [34] and many more.…”
Section: Related Workmentioning
confidence: 99%
“…In addition, in [13], Sundanese writing inscribed on palm leaves was recognised using a three-layer CNN with a 73% recognition accuracy. Next, in [14], was employed for recognition, and Tesseract training was performed using a deep neural network architect. Then, CNN long-short-term memory (LSTM) networks were configured and trained for the language model of the Tamizhi script, leading to an OCR accuracy of 91.21%.…”
Section: Literature Reviewmentioning
confidence: 99%
“…Around 198 contemporary scripts in Central and South Asia are believed to have evolved from the Brahmi script. [2] In the Modern Tamil script stage (16th century CE to present), the Tamil characters got its current shape which are used in contemporary Tamil literature and everyday communication.…”
mentioning
confidence: 99%