Aurélie Joseph scite author profile

This paper focuses on extracting information from key fields of invoices using two different methods based on sequence labeling. Invoices are semi-structured documents in which data can be located based on context. Common information extraction systems are model-driven, relying on heuristics and lists of trigger words curated by domain experts. Their performance is generally high on the documents they have been trained for, but processing new templates often requires new manual annotations, which are tedious and time-consuming to produce. Recent work on deep learning applied to business documents has claimed gains in both time and performance. While these systems do not need manual curation, they nevertheless require a large amount of data to achieve good results. In this paper, we present a series of experiments using neural network approaches to study the trade-off between data requirements and performance when extracting information from key fields of invoices (such as dates, document numbers, types, and amounts). The main contribution of this paper is a system that achieves competitive results using a small amount of data, compared to state-of-the-art systems that need to be trained on large datasets, which are costly and impractical to produce in real-world applications.

Machine Learning vs Deterministic Rule-Based System for Document Stream Segmentation

Hamdi

Voerman

Coustaty

et al. 2017

InDUS: Incremental Document Understanding System Focus on Document Classification

d'Andecy

Joseph

Ogier

2018

Feature Selection for Document Flow Segmentation

Hamdi

Coustaty

Joseph

et al. 2018

Generating consensus fuzzy cognitive maps

Information Extraction from Invoices
