Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining 2020
DOI: 10.1145/3394486.3403172

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

Abstract: Automatic information extraction from identity documents is a fundamental task in digital processes such as onboarding, product requests, and identity validation. The information extraction process consists of identifying, locating, classifying, and recognizing the text of the key fields that an identity document contains. For identity documents, the key fields include names, last names, document number, and dates. The information extraction problem has been traditionally sol…

Cited by 489 publications (397 citation statements)
References 65 publications
“…The second promising direction is the multimodal processing of the graphical objects. In the case of graphical page object detection, multimodal processing, in the simplest form, is the processing of image information and text information together [62,63]. An example of such a case is when a figure is categorized as a table and vice versa; the text information can be beneficial.…”
Section: Future Work (mentioning)
confidence: 99%
“…Extracting pre-defined and commonly occurring named entities from invoice-like documents (using text and box coordinates) has been the main focus of some prior works (Katti et al., 2018; Liu et al., 2019; Denk and Reisswig, 2019; Majumder et al., 2020). Text and document layouts have been used for learning BERT-like (Devlin et al., 2019) representations through pre-training and then combined with image features for information extraction from documents (Xu et al., 2020; Garncarek et al., 2020). However, our work focuses on extracting a much more generic, diverse, complex, dense, and hierarchical document structure from Forms.…”
Section: Related Work (mentioning)
confidence: 99%
“…LayoutLM (Xu et al., 2019) is a BERT-like transformer model modified to generate layout-aware contextualized word embeddings. In place of BERT's single positional embedding, LayoutLM adds positional embeddings for the x- and y-coordinates of a bounding box around the token.…”
Section: Systems (mentioning)
confidence: 99%
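
The idea in the statement above, adding x- and y-coordinate embeddings of a token's bounding box to its word embedding, can be sketched roughly as follows. This is a minimal NumPy illustration and not the LayoutLM implementation: the table sizes, the random initialization, the assumption that coordinates are integers in [0, 1000], and the function name `layout_aware_embeddings` are all hypothetical choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB_SIZE = 100   # hypothetical vocabulary size
MAX_COORD = 1001   # assumption: coordinates are integers in [0, 1000]
HIDDEN = 16        # hypothetical embedding width

# Randomly initialized lookup tables standing in for learned parameters.
token_table = rng.normal(size=(VOCAB_SIZE, HIDDEN))
x_table = rng.normal(size=(MAX_COORD, HIDDEN))
y_table = rng.normal(size=(MAX_COORD, HIDDEN))


def layout_aware_embeddings(token_ids, boxes):
    """Sum each token embedding with embeddings of its box coordinates.

    token_ids: sequence of n integer token ids.
    boxes: n rows of (x0, y0, x1, y1), the upper-left and lower-right
           corners of the bounding box around each token.
    """
    token_ids = np.asarray(token_ids)
    x0, y0, x1, y1 = np.asarray(boxes).T
    # x-coordinates share one table and y-coordinates share another.
    return (token_table[token_ids]
            + x_table[x0] + y_table[y0]
            + x_table[x1] + y_table[y1])


# The same token id at two different page positions.
token_ids = [5, 5]
boxes = [[10, 20, 110, 40], [10, 500, 110, 520]]
emb = layout_aware_embeddings(token_ids, boxes)
print(emb.shape)  # (2, 16)
```

In this sketch the two rows of `emb` differ even though the token id is the same, because the boxes differ. That is the sense in which the resulting embeddings are layout-aware.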
“…We therefore expect that a hybrid document representation that combines layout and text information should outperform a text-only representation when clustering documents by type. LayoutLM (Xu et al., 2019) is such a hybrid system and achieves state-of-the-art performance for document-type classification, outperforming text-only baselines. We therefore hypothesized that LayoutLM would also outperform these baselines for document-type clustering.…”
Section: Introduction (mentioning)
confidence: 99%