Invoices serve as proof of purchase and record important information, including the date, description, quantity, and price of goods or services, as well as the terms of payment. Companies must process invoices quickly and accurately to maintain proper financial records. A key task in document image analysis is text extraction, which comprises the detection, localization, segmentation, and enhancement of text in a given input image. Applied to everyday printed bills and invoices, it allows their data to be extracted automatically. Optical Character Recognition (OCR) is a technology that recognizes printed or handwritten alphanumeric characters in images. OpenCV is first used to detect the bill or invoice in the image and to filter out unnecessary noise. The resulting intermediate image is then passed to the Tesseract OCR engine for further processing. To address the complexity of invoice processing, various techniques have been developed, such as OCR for digitizing paper invoices and natural language processing (NLP) techniques for extracting relevant information from the recognized text. Neural networks are also frequently used for document classification tasks.
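The pipeline described above (noise filtering with OpenCV, recognition with Tesseract, then extraction of relevant fields from the text) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation used here: the median blur and Otsu threshold are just one common choice of noise filtering, the `pytesseract` wrapper is assumed as the interface to Tesseract, and simple regular expressions stand in for the NLP stage. The function names, field names, and label patterns (`Invoice No`, `Date`, `Total`) are hypothetical, and the step that locates the invoice within the photo is omitted.

```python
import re

# Hypothetical label patterns; real invoices vary and would need more rules.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"Invoice\s*(?:No|Number|#)\.?\s*[:\-]?\s*([A-Z0-9\-]+)", re.IGNORECASE
    ),
    "date": re.compile(r"\bDate\s*[:\-]?\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE),
    "total": re.compile(r"\bTotal\s*[:\-]?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE),
}


def ocr_invoice(image_path):
    """Denoise and binarize an invoice photo, then run Tesseract on it."""
    # Imported here so extract_fields() works without the OCR dependencies.
    import cv2
    import pytesseract

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(f"could not read image: {image_path}")
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    denoised = cv2.medianBlur(gray, 3)  # suppress salt-and-pepper noise
    # Otsu's method picks the binarization threshold automatically.
    _, binary = cv2.threshold(
        denoised, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU
    )
    return pytesseract.image_to_string(binary)


def extract_fields(text):
    """Return the first match for each field, or None when it is absent."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else None
    return fields


if __name__ == "__main__":
    # Made-up sample text standing in for OCR output.
    sample = "INVOICE\nInvoice No: INV-1042\nDate: 2023-03-15\nTotal: $1,250.00\n"
    print(extract_fields(sample))
```

In practice the two stages would be chained as `extract_fields(ocr_invoice(path))`. Keeping them separate lets the field-extraction rules be tested on plain text without an image or an installed OCR engine.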