Information Extraction System for Invoices and Receipts

Tan, QiuXing Michelle; Cao, Qi; Seow, Chee Kiat; Yau, Peter Chunyu

doi:10.1007/978-981-99-4752-2_7

Cited by 3 publications

(1 citation statement)

References 14 publications


“…QiuXing Michelle Tan et al. [5]. The rapid expansion of document digitization, encompassing paper-based invoices and receipts, has underscored the need for accurate and efficient information processing methods. However, manual data extraction by humans has become impractical due to its laborious and time-consuming nature.…”

Section: Shreeshiv Patel Et Al [4] (mentioning)

confidence: 99%

Streamlining Invoice Management: Leveraging OCR and NLP for Efficiency

2024

IRJMETS


Invoices serve as proof of purchase and contain important information, including the date, description, quantity, and price of goods or services, as well as the terms of payment. Companies must process invoices quickly and accurately to maintain proper financial records. A key task in document image analysis is text extraction. The text extraction process includes detection, localization, segmentation, and enhancement of the text in a given input image. Through this process, data can be extracted from everyday printed bills and invoices. Optical Character Recognition (OCR) technology provides full alphanumeric recognition of printed or handwritten characters in images. First, OpenCV is used to detect the bill or invoice in the image and filter out unnecessary noise. The intermediate image is then passed to the Tesseract OCR engine for further processing. To address this complexity, various techniques have been developed, such as OCR for digitizing paper invoices and natural language processing (NLP) techniques for extracting relevant information from the text. Neural networks are also frequently used for document classification tasks.
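The step that follows OCR in the abstract above, pulling relevant fields out of the recognized text, can be illustrated with a minimal sketch. It is not the cited authors' method: it assumes the OCR engine has already produced plain text, and the field names, regular expressions, and sample invoice text are all hypothetical choices made for illustration.

```python
import re
from typing import Dict, Optional

# Hypothetical field patterns; real invoices vary widely and would need
# patterns (or a learned model) tuned to the documents at hand.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"invoice\s*(?:no\.?|number|#)\s*[:\-]?\s*([A-Z0-9\-]+)", re.IGNORECASE
    ),
    "date": re.compile(
        r"\bdate\s*[:\-]?\s*(\d{1,2}[/-]\d{1,2}[/-]\d{2,4})", re.IGNORECASE
    ),
    # \b keeps "Subtotal" from being matched as "total".
    "total": re.compile(
        r"\btotal\s*[:\-]?\s*\$?\s*(\d+(?:[.,]\d{2})?)", re.IGNORECASE
    ),
}


def extract_fields(ocr_text: str) -> Dict[str, Optional[str]]:
    """Return the first match for each field, or None when nothing matches."""
    fields: Dict[str, Optional[str]] = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        fields[name] = match.group(1) if match else None
    return fields


# Made-up text standing in for the output of an OCR engine.
SAMPLE_OCR_TEXT = """ACME Supplies
Invoice No: INV-1042
Date: 12/03/2024
Subtotal: 90.00
Total: $ 99.00
"""

if __name__ == "__main__":
    print(extract_fields(SAMPLE_OCR_TEXT))
```

Rule-based patterns like these are easy to inspect but brittle against layout changes and OCR errors, which is one reason the literature also turns to NLP models and neural networks.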


Information Extraction Using RPA and Generative AI from Unstructured Documents: A Case of Invoices

Sowjanya, Vijaya Chamundeeswari

2024

IFIP Advances in Information and Communication Technology


Configurable Customized Information Extraction and Processing Pipeline

Kim, Lai, Khan et al.

2024

Int. J. Patt. Recogn. Artif. Intell.


Extracting information from scanned business documents, while a necessary commercial task, continues to be mostly done manually, requiring significant human effort. Current solutions for automated document information extraction still have limited capabilities with regard to user-required customizability and extraction of dataset-specific information, leaving the area a very active field of research. In this paper, we propose modifications and improvements to our previously developed custom pipeline for extracting and tabulating key-value pairs from commercial invoice documents. Our design changes and additions adapt the pipeline to a wider variety of document types and use cases, primarily through the implementation of dataset-specific configuration files that promote customizability, along with new technical modules that address both general and dataset-specific complexities. We compare our pipeline’s performance against current machine learning and commercial solutions on a real-world dataset, and demonstrate that it is able to extract a wider variety of fields while maintaining competitive or greater accuracies compared to the alternative solutions.
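The idea of dataset-specific configuration driving key-value extraction can be sketched as follows. This is only an illustration under stated assumptions, not the pipeline described in the abstract: the JSON layout, the dataset name `vendor_a`, the field names, and the label aliases are all hypothetical, and the input is assumed to be OCR text already split into "label: value" lines.

```python
import json
from typing import Dict, List

# Hypothetical configuration: for each dataset, map an output field to the
# labels that may introduce it on the page (lowercase).
CONFIG_JSON = """
{
  "vendor_a": {
    "invoice_id": ["invoice no", "inv #"],
    "amount_due": ["total due", "balance"]
  }
}
"""


def extract_key_values(
    lines: List[str], field_aliases: Dict[str, List[str]]
) -> Dict[str, str]:
    """Collect the first value found for each configured field."""
    result: Dict[str, str] = {}
    for line in lines:
        if ":" not in line:
            continue  # not a "label: value" line
        label, value = line.split(":", 1)
        label = label.strip().lower()
        for field, aliases in field_aliases.items():
            if label in aliases and field not in result:
                result[field] = value.strip()
    return result


if __name__ == "__main__":
    config = json.loads(CONFIG_JSON)
    sample_lines = ["Inv #: 7781", "Balance: 120.50", "Notes: thanks"]
    print(extract_key_values(sample_lines, config["vendor_a"]))
```

Keeping the aliases in a configuration file rather than in code means a new document type can be supported by editing data only, which is the kind of customizability the abstract emphasizes.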

