2022
DOI: 10.48550/arxiv.2202.13669
Preprint

LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding

Abstract: Structured document understanding has attracted considerable attention and made significant progress recently, owing to its crucial role in intelligent document processing. However, most existing related models can only deal with the document data of specific language(s) (typically English) included in the pre-training collection, which is extremely limited. To address this issue, we propose a simple yet effective Language-independent Layout Transformer (LiLT) for structured document understanding. LiLT can be…


Cited by 5 publications (6 citation statements)
References 20 publications
“…LiLT (Wang et al., 2022) is a multimodal model which takes both text and bounding boxes as input. The entire framework represents a parallel dual-stream Transformer that concurrently processes two streams of information: one for text and the other for layout.…”
Section: Bibliography Detector (mentioning)
confidence: 99%
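To make the dual-stream design described above concrete, here is a minimal, hypothetical PyTorch sketch: token ids and bounding boxes are embedded separately and processed by two parallel Transformer encoders, one per stream. All class, parameter, and dimension names are illustrative assumptions, not LiLT's actual implementation (which additionally couples the two streams at the attention level).

```python
# Hypothetical sketch of a parallel dual-stream layout Transformer.
# Names and dimensions are illustrative; this is NOT the official LiLT code.
import torch
import torch.nn as nn

class DualStreamLayoutEncoder(nn.Module):
    def __init__(self, vocab_size=50265, text_dim=768, layout_dim=192,
                 num_layers=4, max_coord=1001):
        super().__init__()
        # Text stream: ordinary token embeddings.
        self.token_emb = nn.Embedding(vocab_size, text_dim)
        # Layout stream: each bounding box (x0, y0, x1, y1) is embedded
        # coordinate-wise and summed, as in LayoutLM-family models.
        self.coord_emb = nn.ModuleList(
            [nn.Embedding(max_coord, layout_dim) for _ in range(4)])
        self.text_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=text_dim, nhead=12, batch_first=True),
            num_layers=num_layers)
        self.layout_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=layout_dim, nhead=6, batch_first=True),
            num_layers=num_layers)

    def forward(self, input_ids, bbox):
        # input_ids: (batch, seq_len); bbox: (batch, seq_len, 4), coords in [0, 1000]
        text_h = self.text_encoder(self.token_emb(input_ids))
        layout_x = sum(emb(bbox[..., i]) for i, emb in enumerate(self.coord_emb))
        layout_h = self.layout_encoder(layout_x)
        return text_h, layout_h  # two parallel streams of hidden states

ids = torch.randint(0, 50265, (2, 16))
boxes = torch.randint(0, 1001, (2, 16, 4))
text_h, layout_h = DualStreamLayoutEncoder()(ids, boxes)
print(text_h.shape, layout_h.shape)  # (2, 16, 768) and (2, 16, 192)
```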
“…However, the license of LayoutLMv3 prohibits it from being used in industry. A good alternative for industrial use cases is the Language-independent Layout Transformer (LiLT), a multimodal model which overcomes the language barrier by decoupling and learning the layout knowledge from monolingual structured documents before generalizing it to multilingual ones (Wang et al., 2022).…”
Section: Introduction (mentioning)
confidence: 99%
“…Then the concatenated feature vectors are fed into a multi-modal Transformer encoder-decoder to generate the bounding boxes, with a [CLS] special token prepended. In order to fully exploit the bounding-box information, we use a layout-enhanced RoBERTa model (Wang, Jin, and Ding 2022) instead of the vanilla RoBERTa, which can output the original language hidden states and the layout hidden states separately.…”
Section: Answer Location Module (mentioning)
confidence: 99%
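As a rough illustration of the fusion step this statement describes, the following hypothetical snippet concatenates the language and layout hidden states per token, prepends a learned [CLS] vector, and passes the sequence to a generic encoder-decoder. All names and dimensions are assumptions for illustration, not the cited paper's code.

```python
# Hypothetical fusion of language and layout hidden states for an
# answer-location-style module; names and dimensions are illustrative only.
import torch
import torch.nn as nn

text_dim, layout_dim, model_dim = 768, 192, 512
batch, seq_len = 2, 16

cls_token = nn.Parameter(torch.randn(1, 1, text_dim + layout_dim))
proj = nn.Linear(text_dim + layout_dim, model_dim)              # project fused features
seq2seq = nn.Transformer(d_model=model_dim, batch_first=True)   # generic encoder-decoder

text_h = torch.randn(batch, seq_len, text_dim)      # language hidden states
layout_h = torch.randn(batch, seq_len, layout_dim)  # layout hidden states

fused = torch.cat([text_h, layout_h], dim=-1)                       # (B, L, 960)
fused = torch.cat([cls_token.expand(batch, -1, -1), fused], dim=1)  # prepend [CLS]
memory_in = proj(fused)                                             # (B, L+1, 512)

# Decoder queries that the encoder-decoder would turn into box predictions.
queries = torch.randn(batch, 4, model_dim)
decoded = seq2seq(memory_in, queries)   # (B, 4, 512)
print(decoded.shape)
```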
“…Much of today's Information Extraction (IE) is done using probability-based token-classification models such as BERT (Devlin et al., 2018), RoBERTa (Liu et al., 2019), LayoutLM (Xu et al., 2020a,b; Huang et al., 2022) or LiLT (Wang et al., 2022). These models aim for the best results by stacking increasingly large numbers of parameters, which comes at the cost of increased computational requirements and training complexity.…”
Section: Introduction (mentioning)
confidence: 99%
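For context on how such token-classification IE models are typically applied, below is a short sketch using the Hugging Face transformers LiLT integration. The checkpoint id, label count, and bounding-box preprocessing are assumptions to verify against the library's documentation rather than the cited papers' exact setups.

```python
# Sketch: token classification with a LiLT checkpoint via Hugging Face transformers.
# Checkpoint id and bbox handling are assumptions; verify before relying on them.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

ckpt = "SCUT-DLVCLab/lilt-roberta-en-base"   # assumed public LiLT checkpoint
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForTokenClassification.from_pretrained(ckpt, num_labels=5)

words = ["Invoice", "No.", "12345", "Date", "2022-02-28"]
# One box per word, coordinates normalized to the 0-1000 range expected by
# LayoutLM-family models: (x0, y0, x1, y1).
word_boxes = [[80, 40, 180, 60], [185, 40, 220, 60], [225, 40, 300, 60],
              [80, 80, 140, 100], [145, 80, 260, 100]]

enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
# Propagate each word's box to all of its sub-word tokens (special tokens get zeros).
bbox = [[0, 0, 0, 0] if i is None else word_boxes[i]
        for i in enc.word_ids(batch_index=0)]
enc["bbox"] = torch.tensor([bbox])

with torch.no_grad():
    logits = model(**enc).logits   # (1, seq_len, num_labels)
print(logits.argmax(-1))           # predicted label id per token
```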