2018 IEEE 13th International Conference on Industrial and Information Systems (ICIIS) 2018
DOI: 10.1109/iciinfs.2018.8721372

Document Segmentation and Language Translation Using Tesseract-OCR

citations
Cited by 21 publications
(6 citation statements)
references
References 3 publications
“…the JSON response the OCR image function processimage().json. The aim of Tesseract was to recognize white-on-black text, which helps to analyze and recognize a particular character [10]. The data cleaning process gets more complex when data comes from heterogeneous sources; this problem has been solved by data cleaning and data transformation. Data cleaning modules, such as removing columns with little data, removing unnecessary rows, and identifying some values and filling them in, are implemented in Python code with the help of a machine learning algorithm. Most of the data went missing after cleaning; to overcome this, pyspellchecker, Text-blob and Auto-correct were used [1]. These are all open-source packages that check and correct spelling; Text-blob also allows the use of a custom database and provides a simple API for diving into natural language processing (NLP) jobs…”
Section: A Image Scanning
mentioning
confidence: 99%
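The spelling-correction step mentioned in the statement above can be illustrated with a minimal sketch. It does not use pyspellchecker, Text-blob or Auto-correct; it swaps in the standard-library difflib module as a stand-in, and the vocabulary, the function name correct_word and the 0.8 cutoff are all hypothetical choices made for illustration only.

```python
import difflib

# Hypothetical custom vocabulary; a real system would load a much larger word list.
VOCABULARY = ["receive", "response", "recognize", "character", "image"]


def correct_word(word, vocabulary=VOCABULARY, cutoff=0.8):
    """Return the closest vocabulary word, or the input unchanged if none is close.

    Assumption: similarity is difflib's sequence-matching ratio, not the
    edit-distance or frequency models used by dedicated spelling packages.
    """
    lowered = word.lower()
    if lowered in vocabulary:
        return lowered
    matches = difflib.get_close_matches(lowered, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


def correct_text(text, vocabulary=VOCABULARY):
    """Correct each whitespace-separated token of OCR output independently."""
    return " ".join(correct_word(token, vocabulary) for token in text.split())
```

For example, correct_text("recieve the imagee") maps both misspelled tokens to their vocabulary entries and leaves "the" untouched because nothing in the small vocabulary is close enough to it.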
“…After the business card image has gone through preprocessing, it is passed on to the OCR process, which uses Tesseract [9]. The OCR process in this study uses the pytesseract library and follows the approach in [10] and [11]. Before going through OCR, the preprocessed image is split into parts according to the contours present, as in Figure 27.…”
Section: Character Recognition
unclassified
“…Tesseract is a tool that recognizes and reads the text present in images, as in Fig. (3), where the algorithm is applied to the image of the energy meter to extract the meter reading. This process is done in four stages: the first stage converts the image to grayscale to reduce the image details; the second stage converts the image to a binary image to reduce and remove the background details; and the third stage is the corrosion (erosion) process, which reduces and eliminates noise in order to isolate the number from the background and so make the numbers easier to distinguish [9][10].…”
Section: Raspberry Pi3 Module B
mentioning
confidence: 99%
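The preprocessing stages named in the statement above (grayscale conversion, binarization, erosion) can be sketched in plain Python as follows. This is an illustrative sketch only, not the citing authors' code and not part of Tesseract: the function names, the fixed threshold of 128, the 3x3 erosion window and the choice of bright pixels as foreground are all assumptions.

```python
def to_grayscale(rgb):
    """Stage 1: map rows of (r, g, b) tuples to 0-255 gray levels (luma weights)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb]


def binarize(gray, threshold=128):
    """Stage 2: 1 for pixels at or above the threshold (assumed foreground), else 0."""
    return [[1 if value >= threshold else 0 for value in row] for row in gray]


def erode(binary):
    """Stage 3: 3x3 erosion. A pixel stays 1 only if it and its 8 neighbours are 1.

    Pixels outside the image are treated as 0, so border pixels are always removed.
    """
    height = len(binary)
    width = len(binary[0]) if height else 0

    def at(y, x):
        return binary[y][x] if 0 <= y < height and 0 <= x < width else 0

    return [[1 if all(at(y + dy, x + dx)
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)) else 0
             for x in range(width)]
            for y in range(height)]


def preprocess(rgb, threshold=128):
    """Run the three stages in order and return the cleaned binary image."""
    return erode(binarize(to_grayscale(rgb), threshold))
```

Erosion removes isolated bright specks because a single noisy pixel never has a full 3x3 neighbourhood of foreground pixels, while the interior of a solid stroke survives. The resulting binary image would then be handed to the OCR engine.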