Proceedings of the 24th Conference on Computational Natural Language Learning 2020
DOI: 10.18653/v1/2020.conll-1.52

From Dataset Recycling to Multi-Property Extraction and Beyond

Abstract: This paper investigates various Transformer architectures on the WikiReading Information Extraction and Machine Reading Comprehension dataset. The proposed dual-source model outperforms the current state-of-the-art by a large margin. Next, we introduce WikiReading Recycled, a newly developed public dataset, and the task of multiple-property extraction. It uses the same data as WikiReading but does not inherit its predecessor's identified disadvantages. In addition, we provide a human-annotated test set with diagno…

Citations: cited by 4 publications (5 citation statements)
References: 19 publications
“…Sequence labeling models can be trained in all cases where the token-level annotation is available or can be easily obtained. Limitations of this approach are strikingly visible on tasks framed in either key information extraction or property extraction paradigms [20,9]. Here, no annotated spans are available, and only property-value pairs are assigned to the document.…”
Section: Limitations Of Sequence Labeling (mentioning)
confidence: 99%
“…Finally, the property extraction paradigm does not assume the requested value appeared in the article in any form since it is sufficient for it to be inferable from the content, as in document classification or non-extractive question answering [9].…”
Section: Limitations Of Sequence Labeling (mentioning)
confidence: 99%
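The contrast drawn in the two statements above can be illustrated with a minimal Python sketch. It assumes a hypothetical helper, `bio_tags_for_value`, and made-up example data; neither comes from the cited papers. The helper tries to turn a property-value pair into token-level BIO tags by exact matching, which works only when the value occurs verbatim in the text.

```python
from typing import List, Optional


def bio_tags_for_value(tokens: List[str], value: str, label: str) -> Optional[List[str]]:
    """Return BIO tags for the first verbatim occurrence of `value`, or None.

    Hypothetical helper for illustration only: it shows why property-value
    pairs without annotated spans cannot always be converted into
    sequence-labeling training data.
    """
    value_tokens = value.split()
    n = len(value_tokens)
    if n == 0:
        return None
    # Slide a window over the document and look for an exact token match.
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] == value_tokens:
            tags = ["O"] * len(tokens)
            tags[start] = f"B-{label}"
            for i in range(start + 1, start + n):
                tags[i] = f"I-{label}"
            return tags
    # The value is not present verbatim, so no span can be tagged.
    return None


# Illustrative (assumed) document and property-value pairs.
tokens = "Marie Curie was born in Warsaw in 1867".split()

# The value appears in the text, so token-level tags can be derived.
extractive = bio_tags_for_value(tokens, "Warsaw", "place_of_birth")

# The value must be inferred from the content, so no tags can be derived.
inferred = bio_tags_for_value(tokens, "Poland", "country_of_birth")
```

In this sketch `extractive` holds a tag sequence while `inferred` is `None`, which is the case the property extraction paradigm must handle and a pure sequence labeler cannot.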
“…The WikiReading dataset [8] (and its variant WikiReading Recycled [6]) is a large-scale natural language understanding task. Here, the main goal is to predict textual values from the structured knowledge base, Wikidata, by reading the text of the corresponding Wikipedia articles.…”
Section: Information Extraction From One-dimensional Documents (mentioning)
confidence: 99%
“…This disparity is still large and makes a robust evaluation difficult. Recently, researchers have started to fill the gap by creating datasets in the KIE domain such as scanned receipts: SROIE [18], form understanding [11], NIST Structured Forms Reference Set of Binary Images (SFRS) or Visual Question Answering dataset DocVQA [15].…”
Section: Introduction (mentioning)
confidence: 99%