From Dataset Recycling to Multi-Property Extraction and Beyond

Dwojak, Tomasz; Pietruszka, Michał; Borchmann, Łukasz; Chłędowski, Jakub; Graliński, Filip

doi:10.18653/v1/2020.conll-1.52

Cited by 4 publications

(5 citation statements)

References 19 publications


“…Sequence labeling models can be trained in all cases where the token-level annotation is available or can be easily obtained. Limitations of this approach are strikingly visible on tasks framed in either key information extraction or property extraction paradigms [20,9]. Here, no annotated spans are available, and only property-value pairs are assigned to the document.…”

Section: Limitations Of Sequence Labeling (mentioning)

confidence: 99%

“…Finally, the property extraction paradigm does not assume the requested value appeared in the article in any form since it is sufficient for it to be inferable from the content, as in document classification or non-extractive question answering [9].…”

Section: Limitations Of Sequence Labeling (mentioning)

confidence: 99%

“…The QA program of unifying NLP frames all the problems as triplets of question, context and answer [30,40,27] or item, property name and answer [17]. Although this does not necessarily lead to the use of encoder-decoder models, several successful solutions relied on variants of Transformer architecture [54,35,9,45]. The T5 is a prominent example of large-scale Transformers achieving state-of-the-art results on varied NLP benchmarks [45].…”

Section: Related Work (mentioning)

confidence: 99%


Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer

Powalski, Borchmann, Jurkiewicz et al. 2021

Preprint

Self Cite


We address the challenging problem of Natural Language Comprehension beyond plain-text documents by introducing the TILT neural network architecture which simultaneously learns layout information, visual features, and textual semantics. Contrary to previous approaches, we rely on a decoder capable of unifying a variety of problems involving natural language. The layout is represented as an attention bias and complemented with contextualized visual information, while the core of our model is a pretrained encoder-decoder Transformer. Our novel approach achieves state-of-the-art results in extracting information from documents and answering questions which demand layout understanding (DocVQA, CORD, WikiOps, SROIE). At the same time, we simplify the process by employing an end-to-end model.


Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer

Powalski, Borchmann, Jurkiewicz et al. 2021

Preprint

Self Cite


“…The WikiReading dataset [8] (and its variant WikiReading Recycled [6]) is a large-scale natural language understanding task. Here, the main goal is to predict textual values from the structured knowledge base, Wikidata, by reading the text of the corresponding Wikipedia articles.…”

Section: Information Extraction From One-dimensional Documents (mentioning)

confidence: 99%

“…This disparity is still large and makes a robust evaluation difficult. Recently, researchers have started to fill the gap by creating datasets in the KIE domain such as scanned receipts: SROIE [18], form understanding [11], NIST Structured Forms Reference Set of Binary Images (SFRS) or Visual Question Answering dataset DocVQA [15].…”

Section: Introduction (mentioning)

confidence: 99%

Kleister: Key Information Extraction Datasets Involving Long Documents with Complex Layouts

Stanisławek, Graliński, Wróblewska et al. 2021

Preprint

Self Cite


The relevance of the Key Information Extraction (KIE) task is increasingly important in natural language processing problems. But there are still only a few well-defined problems that serve as benchmarks for solutions in this area. To bridge this gap, we introduce two new datasets (Kleister NDA and Kleister Charity). They involve a mix of scanned and born-digital long formal English-language documents. In these datasets, an NLP system is expected to find or infer various types of entities by employing both textual and structural layout features. The Kleister Charity dataset consists of 2,788 annual financial reports of charity organizations, with 61,643 unique pages and 21,612 entities to extract. The Kleister NDA dataset has 540 Non-disclosure Agreements, with 3,229 unique pages and 2,160 entities to extract. We provide several state-of-the-art baseline systems from the KIE domain (Flair, BERT, RoBERTa, LayoutLM, LAMBERT), which show that our datasets pose a strong challenge to existing models. The best model achieved an 81.77 % and an 83.57 % F1-score on respectively the Kleister NDA and the Kleister Charity datasets. We share the datasets to encourage progress on more in-depth and complex information extraction tasks.


Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer

Powalski, Borchmann, Jurkiewicz et al. 2021

Lecture Notes in Computer Science
