Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2022
DOI: 10.18653/v1/2022.acl-long.180

Text-to-Table: A New Way of Information Extraction

Abstract: We study a new problem setting of information extraction (IE), referred to as text-to-table. In text-to-table, given a text, one creates a table or several tables expressing the main content of the text, while the model is learned from text-table pair data. The problem setting differs from those of the existing methods for IE. First, the extraction can be carried out from long texts to large tables with complex structures. Second, the extraction is entirely data-driven, and there is no need to explicitly defi…

Cited by 14 publications
(23 citation statements)
References 40 publications
“…The input text is a report of a basketball game. In the existing dominant model (Wu et al, 2022), the output table is serialized into a token sequence during training by concatenating all rows in a top-down order. Here, ⟨s⟩ token is used to separate the cells of each row, ⟨n⟩ token is utilized to separate rows, and ⟨ ⟩ token means an empty cell.…”
Section: Data Cells
Citation type: mentioning
confidence: 99%
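
The serialization described in this statement can be sketched roughly as follows. This is a minimal illustration, not the implementation of Wu et al. (2022): the ASCII spellings `<s>`, `<n>`, and `<empty>` are stand-ins assumed here for the ⟨s⟩, ⟨n⟩, and empty-cell tokens, the function names are hypothetical, and the table values are invented toy data.

```python
# Assumed ASCII stand-ins for the special tokens named in the quoted text.
SEP = "<s>"        # separates the cells of a row
ROW = "<n>"        # separates rows
EMPTY = "<empty>"  # marks an empty cell (hypothetical spelling)


def serialize_table(rows):
    """Concatenate rows top-down into one flat token sequence."""
    tokens = []
    for i, row in enumerate(rows):
        if i > 0:
            tokens.append(ROW)
        for j, cell in enumerate(row):
            if j > 0:
                tokens.append(SEP)
            # A cell contributes its whitespace-split words, or the empty marker.
            tokens.extend(cell.split() if cell else [EMPTY])
    return tokens


def deserialize_table(tokens):
    """Invert serialize_table for a non-empty token sequence."""
    rows, row, cell = [], [], []

    def close_cell():
        row.append("" if cell == [EMPTY] else " ".join(cell))
        cell.clear()

    for tok in tokens:
        if tok == SEP:
            close_cell()
        elif tok == ROW:
            close_cell()
            rows.append(list(row))
            row.clear()
        else:
            cell.append(tok)
    close_cell()
    rows.append(list(row))
    return rows


# Toy table (invented values): header row first, then one row per team.
table = [["", "Points", "Rebounds"], ["Hawks", "95", ""], ["Magic", "88", "40"]]
sequence = serialize_table(table)
restored = deserialize_table(sequence)
```

Because rows are emitted in a fixed top-down order, a model trained on such sequences has to learn one particular row ordering, which is the property the next statement contrasts against.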
“…Here, ⟨s⟩ token is used to separate the cells of each row, ⟨n⟩ token is utilized to separate rows, and ⟨ ⟩ token means an empty cell. Unlike Wu et al (2022), in this work, we model the generation of each table as a table header and then a set of table body rows. Note that these rows can be further decomposed into the first column and data cells wrapped in the red box.…”
Section: Data Cells
Citation type: mentioning
confidence: 99%
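
One way to picture the decomposition mentioned in this statement is the sketch below. It only illustrates the data layout (a header plus an unordered set of body rows, each split into a first-column cell and its data cells); it does not reproduce the citing paper's model. The names `BodyRow` and `decompose` are hypothetical, and the table values are invented toy data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BodyRow:
    """One body row, split into its first-column cell and its data cells."""
    first_column: str
    data_cells: tuple


def decompose(table):
    """Split a table into (header, set of body rows).

    The body is returned as a frozenset, so the order in which rows
    appear in the input does not affect the result.
    """
    header, *body = table
    rows = frozenset(BodyRow(row[0], tuple(row[1:])) for row in body)
    return tuple(header), rows


# Toy table (invented values).
table = [["", "Points", "Rebounds"], ["Hawks", "95", ""], ["Magic", "88", "40"]]
header, rows = decompose(table)

# Swapping the two body rows yields the same decomposition.
shuffled = [table[0], table[2], table[1]]
same = decompose(shuffled) == (header, rows)
```

Treating the body as a set rather than a sequence is the design point here: two tables that differ only in row order compare equal.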