2020 25th International Conference on Pattern Recognition (ICPR), 2021
DOI: 10.1109/icpr48806.2021.9412927
PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks

Cited by 99 publications (40 citation statements) · References 23 publications
“…We believe that this work and the work of others [5,23,35,36] demonstrates the value of combining structured models with deep learning. We also believe it demonstrates the value of dynamically constructing deep networks based on a dynamic program.…”
Section: Discussion
confidence: 60%
“…[35] uses the pretraining of BERT and integrates image features to produce contextualized embeddings for bounding boxes. [36] uses graph convolutions to learn edge weights between every pair of nodes, with a Bidirectional-LSTM and CRF as decoder. These three prior works [5,35,36] predict the class of bounding boxes sequentially, then post-process to group them into records.…”
Section: 2-D Parsing Has Been Applied To Images
confidence: 99%
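The citing paper describes this work's graph module as learning an edge weight between every pair of nodes via graph convolutions, before a BiLSTM-CRF decoder. A minimal numpy sketch of that pairwise edge-weight learning step follows; the bilinear scoring form, the dimensions, and all variable names are illustrative assumptions, not the paper's exact formulation, and the BiLSTM-CRF decoding stage is omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
N, d = 4, 8                        # 4 text-segment nodes, 8-dim features (assumed sizes)
H = rng.normal(size=(N, d))        # node embeddings (hypothetical stand-in)
Wa = rng.normal(size=(d, d))       # learned bilinear scoring matrix (assumption)
Wg = rng.normal(size=(d, d))       # graph-convolution weight matrix

# Learn a soft edge weight between every pair of nodes (fully connected graph):
# score each ordered pair, then row-normalise into a soft adjacency.
scores = H @ Wa @ H.T              # (N, N) pairwise compatibility scores
A = softmax(scores, axis=-1)       # each row sums to 1

# One graph-convolution layer: each node aggregates all others, weighted by A.
H_new = np.tanh(A @ H @ Wg)        # updated node representations, shape (N, d)
print(H_new.shape)                 # (4, 8)
```

In the full model these updated node representations would feed the sequence decoder; here the sketch only shows how per-pair weights can be learned rather than fixed in advance.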
See 1 more Smart Citation
“…Grid-based methods (Katti et al, 2018;Denk and Reisswig, 2019;Lin et al, 2021) were proposed for 2D document representation where text pixels were encoded using character or word embeddings and classified into specific field types, using a convolutional neural network. GNN-based approaches (Liu et al, 2019a;Yu et al, 2021;Tang et al, 2021) adopted multi-modal features of text segments as nodes to model the document graph, and used graph neural networks to propagate information between neighboring nodes to attain a richer representation.…”
Section: Related Work
confidence: 99%
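The GNN-based approaches described above treat text segments as nodes carrying multi-modal features and propagate information between neighbouring nodes. A toy sketch of one such propagation round, assuming concatenated text and layout features and a hand-built chain adjacency (all dimensions and the degree-normalised GCN update are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 3                                   # three text segments in a document
text_feat = rng.normal(size=(N, 6))     # e.g. pooled word embeddings (assumed dim)
layout_feat = rng.normal(size=(N, 4))   # e.g. normalised bounding-box coordinates
H = np.concatenate([text_feat, layout_feat], axis=1)  # multi-modal node features (N, 10)

# Adjacency with self-loops: connect spatially neighbouring segments (here a chain).
A = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1]], dtype=float)
D_inv = np.diag(1.0 / A.sum(axis=1))    # degree normalisation
W = rng.normal(size=(10, 10))           # learned layer weight (random stand-in)

# One propagation round: each node mixes in its neighbours' features,
# yielding a richer, context-aware representation per segment.
H_new = np.maximum(0.0, D_inv @ A @ H @ W)  # ReLU(normalised-adjacency GCN layer)
print(H_new.shape)                          # (3, 10)
```

Stacking several such rounds lets information flow between segments that are not directly adjacent, which is the "richer representation" the quoted passage refers to.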
“…Noticing the rich visual information contained in VRDs, several methods [6,16,26,32] exploit 2D layout information to provide complementary cues for the textual content. For further improvement, mainstream research [2,21,24,30,38,48,50] usually employs a shallow fusion of text, image, and layout to capture contextual dependencies. Recently, several pre-training models [28,45,46] have been proposed to jointly learn a deep fusion of these modalities on large-scale data, and they outperform their counterparts on document understanding.…”
Section: Introduction
confidence: 99%