Noticing the rich visual information contained in VRDs, several methods [6,16,26,32] exploit 2D layout information to complement textual content. For further improvement, mainstream studies [2,21,24,30,38,48,50] typically employ a shallow fusion of text, image, and layout features to capture contextual dependencies. Recently, several pre-training models [28,45,46] have been proposed to jointly learn a deep cross-modal fusion on large-scale data, outperforming their counterparts on document understanding.