Extraction of information from bill receipts using optical character recognition

Kumar, V. Likith; Kaware, Pratyush; Singh, Pradhuman; Sonkusare, Reena; Kumar, Siddhant

doi:10.1109/icosec49089.2020.9215246

Cited by 12 publications

(6 citation statements)

References 13 publications

“…Haraj and Raissouni produced an average of 95.77% character accuracy using tesseract and opencv library over 4 sample images in 2015 [17]. Those research [14,15,16,17] only used relatively small samples (less than 50 documents), while our study used more documents (8,562 documents in 6 Categories and two document structures). Previous research [14,15,16,17], which also employed the Tesseract library, only used string matching to measure the OCR.…”

Section: Related Work (mentioning)

confidence: 84%

“…Patel and friends produced 70% accuracy using 20 sample images in 2012 [14]. Kumar and friends produced 97% accuracy for small scanned bill documents and 83% accuracy for small scanned bill documents using Tesseract OCR on 25 scanned bills in 2020 [15]. Akinbade and friends produced 81.9% character accuracy and 69.7% word accuracy on 11 sample images in 2020 [16].…”

Section: Related Work (mentioning)

confidence: 99%
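
The statement above reports character accuracy and word accuracy without saying how they were computed. One common definition, assumed here and not confirmed by the cited papers, is one minus the edit distance between the OCR output and a ground-truth transcription, divided by the length of the ground truth. The function names and sample strings below are illustrative only.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute, free if equal
            ))
        prev = curr
    return prev[-1]


def accuracy(reference, hypothesis):
    """1 - errors / len(reference), clamped at 0 (assumed definition)."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    errors = levenshtein(reference, hypothesis)
    return max(0.0, 1.0 - errors / len(reference))


def character_accuracy(ref_text, ocr_text):
    # Compare the two strings character by character.
    return accuracy(ref_text, ocr_text)


def word_accuracy(ref_text, ocr_text):
    # Compare whitespace-separated tokens instead of characters.
    return accuracy(ref_text.split(), ocr_text.split())


if __name__ == "__main__":
    truth = "TOTAL 12.50"   # hypothetical ground-truth line from a bill
    ocr = "T0TAL 12.50"     # hypothetical OCR output with one misread letter
    print(character_accuracy(truth, ocr), word_accuracy(truth, ocr))
```

A single misread character costs little at the character level but invalidates the whole word, which is one reason word accuracy is typically lower than character accuracy on the same images.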

“…Those research [14,15,16,17] only used relatively small samples (less than 50 documents), while our study used more documents (8,562 documents in 6 Categories and two document structures). Previous research [14,15,16,17], which also employed the Tesseract library, only used string matching to measure the OCR. On the other hand, our study used four measurements, i.e., conversion time, NER time, string match accuracy as precision, and the number of entities acquired as recall.…”

Section: Related Work (mentioning)

confidence: 99%
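
The statement above describes string match accuracy used as precision and the number of entities acquired used as recall. A minimal sketch of one possible reading follows; the exact formulas are an assumption, and the function name, field names, and values are hypothetical rather than taken from the citing study.

```python
def score_extraction(expected, extracted):
    """Return (precision, recall) for one document.

    Assumed reading of the two measures:
      precision = exactly matching values / entities acquired
      recall    = entities acquired / entities expected
    """
    # An entity counts as acquired if it is an expected field with a non-empty value.
    acquired = {k: v for k, v in extracted.items() if k in expected and v}
    matched = sum(1 for k, v in acquired.items() if v == expected[k])
    precision = matched / len(acquired) if acquired else 0.0
    recall = len(acquired) / len(expected) if expected else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical ground truth and NER output for a single document.
    expected = {
        "name": "A. Example",
        "employee_id": "12345",
        "document_number": "DOC-1",
        "date": "2020-01-01",
    }
    extracted = {
        "name": "A. Example",
        "employee_id": "12845",  # OCR misread one digit
        "date": "2020-01-01",
    }
    print(score_extraction(expected, extracted))
```

Under this reading an engine can score well on precision while missing entities altogether, which matches the trade-off the citing study reports for Tesseract.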

“…An offline desktop-based application called Foxit, an online-based application called PDF2GO, and an open-source OCR library called Tesseract were used to convert all documents. Patel et al in 2012 [14] produce 70% accuracy, Kumar et al in 2020 [15] produce 97% accuracy and, Akinbade [16] produce 81.9% accuracy using the Tesseract library on scanned documents.…”

Section: B Ocr Engines Preprocessing (mentioning)

confidence: 99%

Optical Character Recognition Engines Performance Comparison in Information Extraction

Ramdhani; Budi; Purwandari

2021

IJACSA

Named Entity Recognition (NER) is often used to acquire important information from text documents as part of the Information Extraction (IE) process. However, the quality of the text documents affects the accuracy of the data obtained, especially for documents acquired through Optical Character Recognition (OCR), which never reaches 100% accuracy. This research examines which OCR engine performs best for IE using NER by comparing three OCR engines (Foxit, PDF2GO, Tesseract) over 8,562 government human resources documents across six document categories, two document structures, and four measurements. Several essential entities, such as name, employee ID, document number, document publishing date, employee rank, and family member's name, were extracted automatically from the documents. NER processing was done in the Python programming language, and preprocessing was done separately for Foxit, PDF2GO, and Tesseract. In summary, each OCR engine has its benefits and drawbacks; for example, Tesseract has better NER extraction and conversion time with better accuracy but falls short in the number of entities acquired.

“…According to most of the available research papers, pre-processing images using some tool is the first step in extracting information. When digitizing old documents or even handwritten notes, the text quality present in the document is crucial in determining the extracted text accuracy [11]. The challenge in this step is identifying the most suitable pre-processing algorithms to improve the quality of the type of images we are dealing with and the different combinations in which these algorithms are used.…”

Section: B Text Extraction and Analysis Of Bills (mentioning)

confidence: 99%
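
The statement above treats image pre-processing as the first step before text extraction but does not name specific algorithms. As one illustrative candidate (an assumption, not the method of the cited or citing paper), the sketch below binarizes a grayscale image with Otsu's threshold in plain Python. The image is a hypothetical nested list of 0 to 255 intensities, and the function names are illustrative.

```python
def otsu_threshold(pixels):
    """Return the threshold t (0..255) that maximizes between-class variance.

    Pixels <= t form one class and pixels > t the other.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the lower class
    sum0 = 0  # intensity sum of the lower class
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break             # upper class empty from here on
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(image, threshold):
    """Map each pixel to 255 if it is above the threshold, else 0."""
    return [[255 if p > threshold else 0 for p in row] for row in image]


if __name__ == "__main__":
    # Hypothetical 2x3 grayscale patch: dark ink on a light background.
    image = [[10, 12, 200],
             [11, 210, 205]]
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    print(t, binarize(image, t))
```

In practice a step like this would usually be combined with others, such as deskewing or noise removal, and which combination works best depends on the kind of images being processed.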

Money Empire: Intelligent Assistant for Personal Finance Management

Balathas

2022

IJRASET

Poverty and debt are burning issues globally, especially in third-world nations such as Sri Lanka. The major contributing factor to poverty is poor personal financial planning, which leaves citizens unable to make ends meet and forces them to live mediocre lifestyles or even resort to debt or crime. Personal financial planning, namely setting financial goals, tracking expenses, and meeting set budgets, is one of the most crucial practices for overcoming this situation. In Sri Lanka, very few people keep track of their expenses, and a majority of those who do follow manual methods, which are error-prone and easily abandoned due to increased stress and complexity. This paper proposes a solution to automate the flow of personal finance management, requiring minimal effort from the user. The application detects SMS alerts received regarding bank transactions and extracts transaction details. Users can easily upload their expense bills, from which expense details are fetched and processed. The app also allows users to split their bills with friends or family and to track and settle their incoming and outgoing debts. Finally, the user can visualize their income and expense analysis at a glance and make better decisions to meet their financial objectives, with all these features in one single application.
