Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval 2020
DOI: 10.1145/3397271.3401442
Robust Layout-aware IE for Visually Rich Documents with Pre-trained Language Models

Cited by 39 publications (10 citation statements); references 17 publications.
“…In this section, we introduce the datasets for pre-training and evaluation. Two datasets, RVL-CDIP [38] and DocBank [21], are utilized for pre-training. We conduct evaluations on three downstream tasks: 1) reading order detection on ReadingBank [37]; 2) table structure recognition on SciTSR [6], ICDAR-2013 [11] and ICDAR-2019 [10], in which we follow the Setup-B setting of [24], where the input is an image along with layouts and contents; 3) key information extraction on FUNSD [18] and CORD [29], in which we focus on their entity linking tasks, which rely on analyzing pairwise relations between entities.…”
Section: Datasets and Evaluation Protocol
confidence: 99%
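The entity linking task mentioned above scores pairwise relations between entities. A minimal sketch of that idea, assuming pooled per-entity embeddings and a bilinear scoring form (an illustrative choice, not the cited papers' exact model):

```python
import numpy as np

def pairwise_link_scores(entities: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Score every ordered entity pair (i, j) for a candidate link.

    Hypothetical sketch: each entity has a pooled embedding, and the
    bilinear form e_i^T W e_j gives the logit that entity i links to
    entity j (e.g. a question field pointing at its answer field).
    """
    return entities @ W @ entities.T  # (N, N) matrix of link logits

rng = np.random.default_rng(0)
entities = rng.normal(size=(4, 8))   # 4 entities, 8-dim embeddings
W = rng.normal(size=(8, 8))          # learned bilinear weights (random here)
scores = pairwise_link_scores(entities, W)
print(scores.shape)                  # logit for each (head, tail) pair
```

In practice the highest-scoring tail per head (or pairs above a threshold) would be emitted as links; the embeddings and `W` would come from the trained document encoder.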
“…In this section, we introduce the datasets for pretraining and evaluation. Two datasets, RVL-CDIP [38] and DocBank [21] are utilized for pre-training. We conduct evaluations on three downstream tasks: 1) Reading order detection task on ReadingBank [37], 2) Table structure recognition task on SciTSR [6], ICDAR-2013 [11] and ICDAR-2019 [10], in which we follow the Setup-B setting in [24] where input by image along with layouts and contents, 3) Key information extraction task on FUNSD [18] and CORD [29], in which we focus on their entity linking tasks that rely on analyzing pairwise relation between entities.…”
Section: Datasets and Evaluation Protocolmentioning
confidence: 99%
“…However, traditional NER models organize text in one dimension according to the reading order and are unsuitable for VRDs with complex layouts. Recent studies [29,38,42,45,46,48,50] have recognized the significance of segment-level features and incorporate a segment embedding to inject higher-level semantics. Although such methods, e.g. PICK [48] and TRIE [50], construct contextual features that involve segment clues, they revert to token-level labeling with NER-based schemes.…”
Section: Related Work
confidence: 99%
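The token-level labeling the passage refers to is the standard BIO scheme, where per-token tags are decoded back into entity spans. A small self-contained sketch of that decoding step (label names are made up for illustration):

```python
def bio_to_spans(tags):
    """Decode token-level BIO tags into (label, start, end) entity spans.

    Illustrates the NER-style token labeling the passage describes:
    "B-X" opens an entity of type X, "I-X" continues it, "O" is outside.
    End index is exclusive.
    """
    spans, start, label = [], None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            if start is not None:          # close any open span
                spans.append((label, start, i))
            start, label = i, tag[2:]
        elif tag.startswith("I-") and label == tag[2:]:
            continue                        # extend the current span
        else:                               # "O" or inconsistent "I-"
            if start is not None:
                spans.append((label, start, i))
            start, label = None, None
    if start is not None:                   # flush a span ending at EOS
        spans.append((label, start, len(tags)))
    return spans

tags = ["B-HEADER", "I-HEADER", "O", "B-ANSWER", "I-ANSWER"]
print(bio_to_spans(tags))  # [('HEADER', 0, 2), ('ANSWER', 3, 5)]
```

Segment-aware methods keep this decoding but enrich each token's representation with an embedding shared across its layout segment.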
“…While the image modality was introduced only at the finetuning stage in LayoutLM, later models [28,14,35] include visual descriptors from convolutional layers directly into the token representations used for pre-training. These recent works mainly focus on adding new pre-training objectives complementing MVLM to more effectively mix the text, layout and image modalities when learning the document representations, for example the topic-modeling and document shuffling tasks of [28], the Sequence Positional Relationship Classification (SPRC) objective [34], the text-image alignment and matching tasks leveraged in [35] and the 2D area-masking strategy from [14]. Moreover, [35,14] both modify the computation of the self-attention scores to better encompass the relative positional relationships among the tokens of the document.…”
Section: Related Work on Information Extraction (IE)
confidence: 99%
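The modification of self-attention scores mentioned for [35, 14] can be sketched as adding learned biases, indexed by each token pair's bucketed relative x- and y-distances on the page, to the usual scaled dot-product logits. The function and shapes below are assumptions for illustration, not the papers' exact formulation:

```python
import numpy as np

def attention_with_layout_bias(q, k, x_bias, y_bias, dx, dy):
    """Self-attention weights with relative 2-D position biases (sketch).

    q, k:           (N, d) query/key vectors for N tokens
    x_bias, y_bias: (B,) learned bias per relative-distance bucket
    dx, dy:         (N, N) integer bucket index for each token pair
    """
    d = q.shape[-1]
    logits = (q @ k.T) / np.sqrt(d)            # content scores
    logits = logits + x_bias[dx] + y_bias[dy]  # add layout-aware biases
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)   # softmax over keys

N, d, buckets = 3, 4, 5
rng = np.random.default_rng(1)
q, k = rng.normal(size=(N, d)), rng.normal(size=(N, d))
x_bias, y_bias = rng.normal(size=buckets), rng.normal(size=buckets)
dx = rng.integers(0, buckets, size=(N, N))  # bucketed relative x-distances
dy = rng.integers(0, buckets, size=(N, N))
attn = attention_with_layout_bias(q, k, x_bias, y_bias, dx, dy)
print(attn.sum(axis=-1))  # each row of attention weights sums to 1
```

Because the biases depend only on relative spatial offsets, two tokens in the same cell or on the same line get a consistent boost regardless of where the cell sits on the page.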